https://arxiv.org/abs/cs/0312018
I was always asking "Why is RSS failing? Why do failing RSS readers keep using the same failing interface that keeps failing?" and thought that text classification was ready in 2004 for content-based recommendation, then I wrote
https://ontology2.com/essays/ClassifyingHackerNewsArticles/
a few years ago, after Twitter went south I felt like I had to do something, so I did. Even though my old logistic regression classifier works well, I have one based on MiniLM that outperforms it, and the same embedding makes short work of classification be it "cluster together articles about Ukraine, sports, deep learning, etc." over the last four months or "cluster together the four articles written about the same event in the last four days".
I am looking towards applying it to: images, sorting 5000+ search results on a topic, workflow systems (would this article be interesting to my wife, my son, hacker news?), and commercially interesting problems (is this person a good sales prospect?)