
1901 points l2silver | 1 comments

Maybe you've created your own AR program for wearables that shows the definition of a word when you highlight it IRL, or you've built a personal calendar app for your family to display on a monitor in the kitchen. Whatever it is, I'd love to hear it.
PaulHoule ◴[] No.35729958[source]
Smart RSS reader that, right now, ingests about 1000 articles a day and picks out 300 for me to skim. Since I helped write this paper

https://arxiv.org/abs/cs/0312018

I was always asking "Why is RSS failing? Why do failing RSS readers keep using the same failing interface that keeps failing?" and thought that text classification was ready in 2004 for content-based recommendation, then I wrote

https://ontology2.com/essays/ClassifyingHackerNewsArticles/

a few years ago. After Twitter went south I felt like I had to do something, so I did. Even though my old logistic regression classifier works well, I have one based on MiniLM that outperforms it, and the same embedding makes short work of clustering, be it "cluster together articles about Ukraine, sports, deep learning, etc." over the last four months or "cluster together the four articles written about the same event in the last four days".
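To make the two clustering cases concrete, here is a minimal sketch, not the actual system. It assumes each article already has an embedding vector (the toy 3-d vectors below are made up; real MiniLM embeddings have hundreds of dimensions) and it uses plain single-link clustering on cosine similarity, which is my assumption rather than anything described above. The function names and thresholds are hypothetical. A loose threshold groups articles by topic, a tight one groups reports of the same event.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold):
    """Single-link clustering: two articles share a cluster when a chain of
    pairs with cosine similarity >= threshold connects them."""
    parent = list(range(len(vectors)))

    def find(i):
        # Union-find root lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up embeddings standing in for real model output.
EMBEDDINGS = [
    [1.0, 0.0, 0.0],   # 0: match report
    [0.98, 0.2, 0.0],  # 1: another report on the same match
    [0.7, 0.7, 0.0],   # 2: different football story
    [0.0, 0.0, 1.0],   # 3: deep learning paper
    [0.0, 0.5, 0.85],  # 4: different deep learning paper
]

topics = cluster(EMBEDDINGS, threshold=0.6)   # loose: group by topic
events = cluster(EMBEDDINGS, threshold=0.95)  # tight: group by event
print(topics)
print(events)
```

With these toy vectors the loose threshold yields two topic groups and the tight one only pairs up articles 0 and 1. On real data the thresholds would need tuning, and the pairwise loop is quadratic, so a few months of articles would probably want an approximate nearest-neighbour index instead.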

I am looking towards applying it to: images, sorting 5000+ search results on a topic, workflow systems (would this article be interesting to my wife, my son, Hacker News?), and commercially interesting problems (is this person a good sales prospect?).

replies(10): >>35730396 #>>35730409 #>>35737702 #>>35738576 #>>35739040 #>>35739911 #>>35744103 #>>35750477 #>>35757291 #>>35762145 #
internetter ◴[] No.35730409[source]
Do you have public source code for this? Looks great.
replies(1): >>35730844 #
PaulHoule ◴[] No.35730844[source]
It's something I'm thinking about.

The system right now is highly reliable and I have no fear of doing a live demo of it, but live demos come off as strange because my feed is an idiosyncratic, personal mix of arXiv abstracts, Guardian articles about association football, etc. (Oddly, when I started this project I loved the NFL and hated the Premier League. When I started doing feature engineering to figure out "Why does it perform so well for arXiv papers and so poorly for sports?" I started studying football articles in detail and thinking "How would I feel if my team got relegated?" and "Wow, that game went 1-0 and it was an own goal", and the next thing I knew I was hanging on every goal in every game Arsenal and Man City play -- it changed me.)

It's not even that hard for me to swap algorithms in and out, but it should be easier. For instance, I mostly like the scikit-learn system for model selection, but there are some cases like SVC-P where I want to bypass it, and I am not so sure how to comfortably fit fine-tuned transformer models into the system.
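One way the "swap algorithms in and out" problem could be approached is a small common interface that every model satisfies, so that anything (a linear model, a fine-tuned transformer behind a wrapper) can go through the same selection loop. The sketch below is an assumption about how that might look, not the system described here. It deliberately avoids scikit-learn to stay self-contained, and every name in it (`Model`, `KeywordModel`, `select_model`, the toy data) is hypothetical.

```python
from collections import Counter
from typing import Protocol, Sequence


class Model(Protocol):
    """Hypothetical minimal interface every swappable classifier satisfies."""

    def fit(self, texts: Sequence[str], labels: Sequence[int]) -> None: ...
    def predict(self, texts: Sequence[str]) -> list[int]: ...


class MajorityModel:
    """Baseline: always predicts the most common training label."""

    def fit(self, texts, labels):
        self.label = Counter(labels).most_common(1)[0][0]

    def predict(self, texts):
        return [self.label for _ in texts]


class KeywordModel:
    """Toy stand-in for a real classifier: each word scores +1 per liked
    article and -1 per skipped article it appeared in during training."""

    def fit(self, texts, labels):
        self.weights = Counter()
        for text, label in zip(texts, labels):
            for word in text.lower().split():
                self.weights[word] += 1 if label == 1 else -1

    def predict(self, texts):
        return [
            1 if sum(self.weights[w] for w in t.lower().split()) > 0 else 0
            for t in texts
        ]


def holdout_accuracy(model, train, test):
    """Fit on the training split, return accuracy on the held-out split."""
    model.fit(*train)
    texts, labels = test
    predictions = model.predict(texts)
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def select_model(candidates, train, test):
    """Score every candidate the same way and return the best name and all scores."""
    scores = {name: holdout_accuracy(m, train, test) for name, m in candidates.items()}
    return max(scores, key=scores.get), scores


# Made-up data: label 1 = worth skimming, 0 = skip.
TRAIN = (
    [
        "transformer embeddings for text classification",
        "arsenal win with late goal",
        "fine tuning a language model",
        "city draw after own goal",
        "premier league relegation battle",
    ],
    [1, 0, 1, 0, 0],
)
TEST = (["new embeddings model for text", "late own goal decides match"], [1, 0])

best, scores = select_model(
    {"majority": MajorityModel(), "keyword": KeywordModel()}, TRAIN, TEST
)
print(best, scores)
```

The point of the design is that `select_model` only needs `fit` and `predict`, so a model that bypasses the usual machinery just needs a thin wrapper exposing those two methods. A single holdout split is used here for brevity; cross-validation would be the more careful choice.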

Another problem with it is that it depends on AWS Lambda and Superfeedr for ingestion. It costs me less than $5 a month to run and about 10 cents per feed, but (1) that's not cost-effective if I want to add a few hundred blogs like

https://www.righto.com/

and (2) I know many people hate AWS and other cloud services.

If somebody were interested in contributing some elbow grease that would help the case for open source. Alternatively, a hosted demo of some kind would also be possible, but I'm not ready to put my time and money into it. Contact me if you're interested in finding out more.

replies(1): >>35739028 #
1. rolisz ◴[] No.35739028{3}[source]
> If somebody were interested in contributing some elbow grease that would help the case for open source,

Sent you an email! I've been wanting such an ML powered RSS reader for quite some time. I'd love to help make it open source if possible.