Language Support for Marginalia Search

This is never going to work. The author apparently rejects AI in search for the sake of "simplicity", but this sort of thing

> Sentences are stemmed and POS-tagged. Sentences, with stemming and POS-tag data is fed into keyword extraction algorithms

IS AI; it's just old-fashioned, bad AI. What he's trying will never work well, for the same reason rule-based machine translation never worked well: there are just too many rules and exceptions. Simplicity is great when you can have it, but with human language, simplicity was never on the table.
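
To make the "too many exceptions" point concrete, here's a toy suffix-stripping stemmer. This is purely my own sketch: the `naive_stem` function and its suffix list are made up for illustration and have nothing to do with Marginalia's actual code, which presumably uses a much better stemmer. The failure mode is the same in kind, though.

```python
# Toy suffix-stripping stemmer. Hypothetical illustration only;
# this is NOT how Marginalia (or any real stemmer) is implemented.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Regular inflections collapse to one stem, as intended.
print(naive_stem("walked"), naive_stem("walking"))

# Irregular inflections do not: "ran" and "running" end up with different stems.
print(naive_stem("ran"), naive_stem("running"))

# And some words get conflated that shouldn't be: "news" becomes "new".
print(naive_stem("news"), naive_stem("new"))
```

Every one of these can be patched with another rule or an exception list, and then the patch needs its own exceptions, and then you do it all again for the next language.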

He's going to have to bite the bullet and use document embedding models sooner or later.