283 points rrampage | 2 comments
1. MPSimmons No.42195353
Does anyone know whether the average document length used in the document length normalization is the median? It seems like it would need to be in order to properly down-weight excessively long documents; otherwise those long documents would skew the average, right?
2. softwaredoug No.42195523
It’s the mean. At least in Lucene. Using median would be an interesting experiment.

Do you know of a search dataset with very large document length differences? MS MARCO, for example, is pretty consistent in length.
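
For anyone who wants to try the mean vs. median experiment, here is a rough sketch. It assumes the usual BM25-style length normalization factor, 1 - b + b * (dl / avgdl), with b = 0.75; the `length_norm` helper and the toy corpus of document lengths are made up for illustration and are not Lucene's code.

```python
from statistics import mean, median


def length_norm(doc_len, avg_len, b=0.75):
    """BM25-style length normalization factor: 1 - b + b * (dl / avgdl).

    Larger values mean a stronger length penalty on term frequency.
    """
    return 1.0 - b + b * (doc_len / avg_len)


# Hypothetical corpus: nine short documents plus one very long outlier
# (lengths in tokens). These numbers are invented for the example.
doc_lengths = [100, 110, 90, 105, 95, 100, 120, 80, 100, 10000]

mean_len = mean(doc_lengths)      # pulled upward by the single outlier
median_len = median(doc_lengths)  # stays at the typical document length

for dl in (100, 10000):
    print(
        f"dl={dl:>5}  "
        f"norm(mean)={length_norm(dl, mean_len):.3f}  "
        f"norm(median)={length_norm(dl, median_len):.3f}"
    )
```

On this toy data the outlier drags the mean far above the typical length, so a typical document gets a factor well below 1 under the mean and exactly 1 under the median, while the outlier is penalized much more heavily under the median. Whether that helps ranking quality on a real collection is a separate question that would need relevance judgments to answer.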