Ask HN: Most interesting tech you built for just yourself?

1901 points l2silver | 1 comments | 27 Apr 23 15:04 UTC

Maybe you've created your own AR program for wearables that shows the definition of a word when you highlight it IRL, or you've built a personal calendar app for your family to display on a monitor in the kitchen. Whatever it is, I'd love to hear it.


PaulHoule ◴[27 Apr 23 15:54 UTC] No.35729958[source]▶

>>35729232 (OP) #

Smart RSS reader that, right now, ingests about 1000 articles a day and picks out 300 for me to skim. Since I helped write this paper

https://arxiv.org/abs/cs/0312018

I was always asking "Why is RSS failing? Why do failing RSS readers keep using the same failing interface that keeps failing?" and thought that text classification was ready in 2004 for content-based recommendation, then I wrote

https://ontology2.com/essays/ClassifyingHackerNewsArticles/

a few years ago. After Twitter went south I felt like I had to do something, so I did. Even though my old logistic regression classifier works well, I have one based on MiniLM that outperforms it, and the same embedding makes short work of classification, be it "cluster together articles about Ukraine, sports, deep learning, etc." over the last four months or "cluster together the four articles written about the same event in the last four days".
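A minimal sketch of how embedding-based grouping like this can work, not the actual code behind the reader described above. It assumes each article has already been turned into a vector (here toy 3-dimensional lists stand in for real MiniLM embeddings), and the function names and threshold values are hypothetical. It uses greedy single-pass clustering on cosine similarity, which is only one of many possible techniques.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_by_similarity(embeddings, threshold=0.8):
    """Greedy single-pass clustering (hypothetical helper).

    Each vector joins the first cluster whose seed vector is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_vector, [article indices])
    for idx, vec in enumerate(embeddings):
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((vec, [idx]))
    return [members for _, members in clusters]


# Toy vectors standing in for real sentence embeddings (assumption).
toy_embeddings = [
    [0.9, 0.1, 0.0],  # article 0
    [0.8, 0.2, 0.1],  # article 1, close to article 0
    [0.0, 0.1, 0.9],  # article 2, a different topic
    [0.1, 0.0, 1.0],  # article 3, close to article 2
]
groups = cluster_by_similarity(toy_embeddings, threshold=0.9)
print(groups)
```

Under these assumptions, a looser threshold would tend to produce broad topic groups and a tighter one would tend to isolate near-duplicate coverage of a single event, though the right values depend entirely on the embedding model and the data.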

I am looking towards applying it to: images, sorting 5000+ search results on a topic, workflow systems (would this article be interesting to my wife, my son, Hacker News?), and commercially interesting problems (is this person a good sales prospect?).

replies(10): >>35730396 #>>35730409 #>>35737702 #>>35738576 #>>35739040 #>>35739911 #>>35744103 #>>35750477 #>>35757291 #>>35762145 #

internetter ◴[27 Apr 23 16:20 UTC] No.35730409[source]▶

>>35729958 #

Do you have public source code for this? Looks great.

replies(1): >>35730844 #

PaulHoule ◴[27 Apr 23 16:44 UTC] No.35730844[source]▶

>>35730409 #

It's something I'm thinking about.

The system right now is highly reliable and I have no fear of doing a live demo of it, but live demos come off as strange because my feed is a mix of arXiv abstracts, Guardian articles about association football, etc., so it looks idiosyncratic and personal. (Oddly, when I started this project I loved the NFL and hated the Premier League. When I started doing feature engineering on "Why does it perform so well for arXiv papers and so poorly for sports?" I started studying football articles in detail and thinking "How would I feel if my team got relegated?" and "Wow, that game went 1-0 and it was an own goal", and the next thing I knew I was hanging on every goal in every game Arsenal and Man City play. It changed me.)

It's not even that hard for me to swap algorithms in and out, but it should be easier. For instance, I mostly like the scikit-learn system for model selection, but there are some cases like SVC-P where I want to bypass it, and I am not so sure how to comfortably fit fine-tuned transformer models into the system.
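One way to make algorithms easier to swap is to have every model satisfy the same small interface, so the selection step never needs to know what is behind it. The sketch below is purely illustrative: `ArticleScorer`, `KeywordScorer`, and `pick_top` are hypothetical names, not part of scikit-learn or of the system described above, and the toy keyword model only stands in for a real classifier or a fine-tuned transformer.

```python
from typing import Protocol, Sequence


class ArticleScorer(Protocol):
    """Hypothetical interface: anything with fit/score can be plugged in."""

    def fit(self, texts: Sequence[str], labels: Sequence[int]) -> None: ...

    def score(self, text: str) -> float: ...


class KeywordScorer:
    """Toy model: share of a text's words seen only in liked articles."""

    def __init__(self) -> None:
        self.liked_words = set()

    def fit(self, texts, labels):
        liked, disliked = set(), set()
        for text, label in zip(texts, labels):
            (liked if label else disliked).update(text.lower().split())
        # Keep only words that never appeared in a disliked article.
        self.liked_words = liked - disliked

    def score(self, text):
        words = text.lower().split()
        if not words:
            return 0.0
        return sum(w in self.liked_words for w in words) / len(words)


def pick_top(scorer: ArticleScorer, articles, k):
    """Return the k highest-scoring articles, best first."""
    return sorted(articles, key=scorer.score, reverse=True)[:k]


scorer = KeywordScorer()
scorer.fit(["transformer embeddings for text", "celebrity gossip roundup"], [1, 0])
top = pick_top(
    scorer,
    ["new text embeddings paper", "gossip about celebrity chefs", "weather"],
    k=1,
)
print(top)
```

Any other model, whether a wrapper around a scikit-learn estimator or a transformer, could be passed to `pick_top` as long as it exposes the same two methods, which is the design choice this sketch is meant to show.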

Another problem with it is that it depends on AWS Lambda and Superfeedr for ingestion. It costs me less than $5 a month to run and about 10 cents per feed, but (1) that's not cost-effective if I want to add a few hundred blogs like

https://www.righto.com/

and (2) I know many people hate AWS and other cloud services.

If somebody were interested in contributing some elbow grease, that would help the case for open source. Alternatively, a hosted demo of some kind would also be possible, but I'm not ready to put my time and money into it. Contact me if you're interested in finding out more.

replies(1): >>35739028 #

1. rolisz ◴[28 Apr 23 08:57 UTC] No.35739028{3}[source]▶

>>35730844 #

> If somebody were interested in contributing some elbow grease that would help the case for open source,

Sent you an email! I've been wanting such an ML-powered RSS reader for quite some time. I'd love to help make it open source if possible.
