
176 points Bogdanp | 1 comment
vintermann No.45668278
This is never going to work. The author is apparently against AI in search in favor of "simplicity", but this sort of thing

> Sentences are stemmed and POS-tagged. Sentences, with stemming and POS-tag data is fed into keyword extraction algorithms

IS AI; it's just old-fashioned, bad AI. What he's trying will never work well, for the same reason rule-based machine translation never worked well: there are just too many rules and exceptions. Simplicity is great when you can have it, but with human language, simplicity was never on the table.

He's going to have to bite the bullet and use document embedding models sooner or later.

replies(1): >>45668988 #
1. marginalia_nu No.45668988
This code is just for helping identify document topics; it literally doesn't need to be perfect. Embedding a billion documents on a server that has no GPU is neither practical nor something that yields good results.
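
For a rough sense of scale on the "billion documents, no GPU" point, here is a back-of-envelope sketch. The throughput figures are purely hypothetical placeholders, not measurements of any model or of the server discussed here; only the billion-document count comes from the comment above.

```python
# Back-of-envelope: wall-clock time to embed a corpus at a given throughput.
# The docs-per-second rates below are ASSUMED for illustration only.

SECONDS_PER_DAY = 86_400


def embedding_days(num_docs: int, docs_per_second: float) -> float:
    """Days needed to embed num_docs at a sustained docs_per_second."""
    return num_docs / docs_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    corpus = 1_000_000_000  # "a billion documents", per the comment
    for rate in (10, 100, 1000):  # hypothetical sustained throughputs
        days = embedding_days(corpus, rate)
        print(f"{rate:>5} docs/s -> {days:,.1f} days")
```

Whatever the real rate is, the calculation is the same: divide the corpus size by sustained throughput, and remember that the whole job repeats on every re-crawl or model change.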