Document length in Lucene 4.0

From what I've read in the Lucene 4.0 documentation, the library now stores some statistics in order to compute different scoring models, BM25 among them. Is there a way, besides fetching a document, to fetch its length too?

asked Mar 9 '12 at 15:03

What do you mean by the length of a document? The number of bytes, code points, or fields?

It is the number of terms, the same length that is used to compute BM25. I know this statistic exists in Lucene 4, since the BM25 computation wouldn't be possible otherwise, but I don't know how to fetch it.

1 Answer

You can store whatever you want from FieldInvertState into the 'norm', and it doesn't have to be an 8-bit float either.

The default is a lossy encoding of the length. If you want the exact length, you could choose to use a short (16 bits) per document, or something else, instead.

See Similarity.computeNorm
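
To make the lossy-versus-exact trade-off concrete, here is a minimal, Lucene-independent sketch in plain Java. It is only an illustration under stated assumptions: the class and method names are hypothetical, and the one-byte format (4-bit shift plus 4-bit mantissa) is made up for this example and is not Lucene's actual norm encoding. In a real Similarity, the input would be the field length taken from FieldInvertState inside computeNorm.

```java
// Lucene-independent sketch: two ways to pack a field's term count into a norm value.
// All names are hypothetical; the one-byte format below is NOT Lucene's own encoding.
final class NormEncodingSketch {
    // Largest length the illustrative one-byte format can represent (mantissa 15, shift 15).
    static final int MAX_LOSSY_LENGTH = 15 << 15;

    private NormEncodingSketch() {}

    // Lossy: high nibble holds a shift, low nibble a 4-bit mantissa.
    // Low-order bits of the length are dropped, so decoding rounds down.
    static byte encodeLossy(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("length must be >= 0");
        }
        int mantissa = Math.min(length, MAX_LOSSY_LENGTH);
        int shift = 0;
        while (mantissa > 15) {
            mantissa >>>= 1;
            shift++;
        }
        return (byte) ((shift << 4) | mantissa);
    }

    static int decodeLossy(byte encoded) {
        int bits = encoded & 0xFF; // treat the byte as unsigned
        return (bits & 0x0F) << (bits >>> 4);
    }

    // Exact: a 16-bit short keeps the true length up to Short.MAX_VALUE;
    // longer fields are clamped to that maximum.
    static short encodeExact(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("length must be >= 0");
        }
        return (short) Math.min(length, Short.MAX_VALUE);
    }

    static int decodeExact(short encoded) {
        return encoded;
    }

    public static void main(String[] args) {
        int length = 300;
        System.out.println("lossy: " + decodeLossy(encodeLossy(length))); // prints "lossy: 288"
        System.out.println("exact: " + decodeExact(encodeExact(length))); // prints "exact: 300"
    }
}
```

With one byte per document, a 300-term field comes back as 288 in this toy format, while the 16-bit variant returns 300 at the cost of twice the storage per document.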

answered Mar 9 '12 at 16:03

Can you give me an example of how to retrieve the doc length? I don't quite understand your reply; being a bit more specific would certainly help me. "See Similarity.computeNorm": see where? I'm using Lucene version 4.0 - Nik Kovac

Should I compute this norm at index time and store it as a field in the index, or can I retrieve the length without having to store anything? - Nik Kovac
