> You do not need to.
Ranking means deciding which document (A or B) is better to return to the user for a given query.
Not writing a traditional "forward" algorithm (hand-written scoring rules) to rank these documents implies one of the following:
- You write a "backward" algorithm (ML, regression, statistics, whatever you want to call it).
- You don't use algorithms to solve it. An army of humans chooses the rankings in real time.
- You don't rank documents at all.
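A "backward" approach typically looks like this: instead of choosing the weights yourself, you collect human judgments of the form "for this query, document X is better than document Y" and fit weights that reproduce them. Below is a minimal sketch that assumes a linear model trained with a pairwise logistic loss. This is one common learning-to-rank setup, not the only one, and the feature layout and judgments are hypothetical.

```python
import math

# Hypothetical features per document: [exact_keyword_matches, misspellings].
# Each judgment is (preferred_doc_features, other_doc_features).

def score(weights, features):
    """Linear relevance score: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))

def learn_weights(judgments, n_features, lr=0.1, epochs=200):
    """Fit weights so preferred documents score higher (pairwise logistic loss).

    Loss per pair is log(1 + exp(-margin)), where margin = score(better - worse).
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in judgments:
            diff = [b - w for b, w in zip(better, worse)]
            margin = score(weights, diff)
            # Negative gradient of the loss w.r.t. weights is sigmoid(-margin) * diff.
            step = 1.0 / (1.0 + math.exp(margin))
            weights = [w + lr * step * d for w, d in zip(weights, diff)]
    return weights

# Hypothetical judgments made by humans, not by a hand-written rule.
JUDGMENTS = [
    ([3, 1], [1, 1]),  # more exact matches preferred
    ([2, 0], [2, 5]),  # fewer misspellings preferred
    ([1, 0], [0, 0]),
]
weights = learn_weights(JUDGMENTS, n_features=2)
print(weights)
```

The weights come out of the judgments rather than out of an engineer's intuition. When the model disagrees with you, you fix the data (add or correct judgments), not the formula.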
> Counting how many links are pointing to each document is sufficient if you know how long that link existed
- Link-counting (e.g. PageRank) is query-independent evidence. If that's sufficient for you, you'll always return the same set of documents to each user, regardless of what they typed into the search box.
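Here is a minimal sketch of why query-independent evidence alone cannot rank for a query. It uses the in-link counts for documents A and B given further down; the function name is hypothetical.

```python
def rank_by_links(query: str, in_links: dict[str, int]) -> list[str]:
    # The query is accepted but never consulted: only in-link counts matter.
    return sorted(in_links, key=in_links.get, reverse=True)

# In-link counts for documents A and B from the example below.
IN_LINKS = {"A": 4, "B": 2}

print(rank_by_links("cheap flights", IN_LINKS))    # ['A', 'B']
print(rank_by_links("python tutorial", IN_LINKS))  # ['A', 'B']
```

Whatever the user types, the ordering is identical, because nothing in the score depends on the query.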
At best you've just added two more ranking factors to the mix:
- document A
  - query-independent evidence (qie):
    - length: 2 KB
    - misspellings: 14
    - age: 18 months
    - in-links (added): 4
    - in-link spamminess (added): 2.31E4
  - query-dependent evidence (qde):
    - matches 2 of your keywords exactly
    - matches a synonym of another of your keywords
- document B
  - query-independent evidence (qie):
    - length: 3 KB
    - misspellings: 7
    - age: 5 months
    - in-links (added): 2
    - in-link spamminess (added): 2.54E3
  - query-dependent evidence (qde):
    - matches 1 of your keywords exactly
    - matches 2 keywords by synonym
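To see the problem with a hand-written ("forward") scorer, here is a minimal sketch that assumes a simple linear combination of the features listed above. Both weightings are hypothetical and equally plausible-looking. The sketch also assumes that a higher in-link spamminess value is worse, which the listing does not state.

```python
# Features of documents A and B, taken from the listing above.
DOC_A = {"length_kb": 2, "misspellings": 14, "age_months": 18, "in_links": 4,
         "in_link_spamminess": 2.31e4, "exact_matches": 2, "synonym_matches": 1}
DOC_B = {"length_kb": 3, "misspellings": 7, "age_months": 5, "in_links": 2,
         "in_link_spamminess": 2.54e3, "exact_matches": 1, "synonym_matches": 2}

def score(doc, weights):
    # Hand-written linear scoring: features missing from `weights` contribute 0.
    return sum(weights.get(name, 0.0) * value for name, value in doc.items())

# Hypothetical weighting 1: keyword matches and links matter most.
RELEVANCE_FIRST = {"exact_matches": 3.0, "synonym_matches": 1.0,
                   "misspellings": -0.1, "in_links": 0.5}
# Hypothetical weighting 2: quality and freshness matter most.
# Assumption: higher in-link spamminess is worse, so it gets a negative weight.
QUALITY_FIRST = {"exact_matches": 1.0, "synonym_matches": 1.0,
                 "misspellings": -0.3, "age_months": -0.1,
                 "in_link_spamminess": -1e-4}

def better(doc_a, doc_b, weights):
    return "A" if score(doc_a, weights) > score(doc_b, weights) else "B"

print(better(DOC_A, DOC_B, RELEVANCE_FIRST))  # A
print(better(DOC_A, DOC_B, QUALITY_FIRST))    # B
```

The code runs either way. What it cannot tell you is which weighting is correct, and that is exactly the decision the scorer's author has to justify.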
So I ask again:
- Which document matches your query better, A or B?
- How did you decide that, in a way that not only lets you program a non-ML algorithm to do the scoring, but also leaves you certain enough of your decision to fix the algorithm when it disagrees with you (in other words, so that it stays debuggable and understandable by human search engineers)?