
1901 points l2silver | 1 comments

Maybe you've created your own AR program for wearables that shows the definition of a word when you highlight it IRL, or you've built a personal calendar app for your family to display on a monitor in the kitchen. Whatever it is, I'd love to hear it.
PaulHoule ◴[] No.35729958[source]
Smart RSS reader that, right now, ingests about 1000 articles a day and picks out 300 for me to skim. Since I helped write this paper

https://arxiv.org/abs/cs/0312018

I was always asking "Why is RSS failing? Why do failing RSS readers keep using the same failing interface that keeps failing?" and thought that text classification was ready in 2004 for content-based recommendation, then I wrote

https://ontology2.com/essays/ClassifyingHackerNewsArticles/

a few years ago. After Twitter went south I felt like I had to do something, so I did. Even though my old logistic regression classifier works well, I have one based on MiniLM that outperforms it, and the same embedding makes short work of clustering, be it "cluster together articles about Ukraine, sports, deep learning, etc." over the last four months or "cluster together the four articles written about the same event in the last four days".
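To make the clustering idea concrete, here is a minimal sketch, not the actual system described above. It assumes each article has already been turned into an embedding vector (for example by a MiniLM sentence-embedding model); the toy 3-dimensional vectors, the function names, and the greedy "first similar cluster wins" rule are all assumptions chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def greedy_clusters(vectors, threshold=0.8):
    """Group vector indices: each vector joins the first cluster whose
    first member (its seed) is at least `threshold` similar, otherwise
    it starts a new cluster. Hypothetical helper, not a library API."""
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            seed = vectors[cluster[0]]
            if cosine(seed, vec) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Made-up embeddings standing in for real model output.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.95, 0.05],
]
groups = greedy_clusters(toy_embeddings, threshold=0.8)
```

In a sketch like this, a looser threshold would tend to group articles by broad topic, while a stricter one would tend to group only near-duplicates such as several write-ups of the same event.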

I am looking towards applying it to: images, sorting 5000+ search results on a topic, workflow systems (would this article be interesting to my wife, my son, hacker news?), and commercially interesting problems (is this person a good sales prospect?)

replies(10): >>35730396 #>>35730409 #>>35737702 #>>35738576 #>>35739040 #>>35739911 #>>35744103 #>>35750477 #>>35757291 #>>35762145 #
6510 ◴[] No.35738576[source]
I too have a (private) RSS "laboratory" project!

It isn't the elegant machinery you describe here as I'm quite unfamiliar with the technique you describe.

If I'm actively using it, the feed list grows to about 35,000-40,000, at which point I find as many new feeds as I lose old ones.

I maintain a dozen categories of badwords; if any of those appear in a headline, the article is removed.
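A headline filter of this kind could look roughly like the sketch below. The category names, the terms, and the function names are hypothetical; the real lists are not given here.

```python
import re

# Hypothetical categories and terms, for illustration only.
BADWORDS = {
    "people": ["musk", "trump"],
    "sports": ["football", "premier league"],
}

def compile_badwords(categories):
    """Build one case-insensitive, whole-word regex per category."""
    patterns = {}
    for name, terms in categories.items():
        alternation = "|".join(re.escape(term) for term in terms)
        patterns[name] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns

def matching_categories(headline, patterns):
    """Return the names of all categories with a term in the headline."""
    return [name for name, pattern in patterns.items() if pattern.search(headline)]

def filter_headlines(headlines, patterns):
    """Keep only headlines that match no badword category."""
    return [h for h in headlines if not matching_categories(h, patterns)]

patterns = compile_badwords(BADWORDS)
kept = filter_headlines(
    [
        "Musk buys another company",
        "How lichens colonize bare rock",
        "Football final tonight",
    ],
    patterns,
)
```

Keeping the terms grouped by category makes it easy to switch a whole category on or off, or to report which category caused a headline to be dropped.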

With many subscriptions things look quite different: higher-frequency publishers start dominating the top of the newest list. The faster they publish, the higher the standards I hold them to.
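One way to express "the faster they publish, the higher the bar" is a threshold that rises with posting rate. This is only a sketch under loud assumptions: it presumes each article already has some quality score between 0 and 1 (nothing above says such a score exists), and the logarithmic formula and its parameters are invented for illustration.

```python
import math

def required_score(posts_per_day, base=0.5, step=0.1, cap=0.95):
    """Minimum score an article needs, rising by `step` for each
    tenfold increase in (1 + posts_per_day), never exceeding `cap`.
    Hypothetical formula, not taken from the thread."""
    return min(cap, base + step * math.log10(1.0 + posts_per_day))

def keep(article_score, posts_per_day):
    """True if the article clears the bar for its publisher's rate."""
    return article_score >= required_score(posts_per_day)

# A quiet blog and a high-volume news site, same article score.
quiet_blog_kept = keep(0.55, posts_per_day=0)
busy_site_kept = keep(0.55, posts_per_day=99)
```

The logarithm keeps the bar from climbing too steeply: a site posting 100 times a day faces a somewhat higher threshold than one posting 10 times a day, not ten times higher.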

What is quite amazing is that some really terrible news websites use long titles that are highly descriptive. I have a good few of those; they get to stay around because the badword filter purges so much that I hardly ever see them. For every 2000 bad ones, Business Insider has a great article. It's a terrible website but their use of descriptive words in article titles is the best in the world.

The key insight imho is that the internet is much more of an echo chamber than people think.

As soon as you get rid of Musk and a few hundred other people, a few hundred companies, a dozen countries and a few thousand other topics you are left with a world of infinite other subjects. People are writing about stuff no one else ever thought of.

If everyone in the world is reading and writing about FOO it is absolutely amazing to get rid of FOO. There is no such thing as an important football match. (joking sorry)

Everyone is praising normality but you should really wonder who creates these norms. Whether they are good or bad people is beside the point. Musk says one interesting thing per day, I'm sure. For every 100 000 topics inserted into the collective we choose one; then, by the tens of millions, we talk about it. Every day is Musk day.

It doesn't matter how hard you resist participating, eventually you will learn that SpaceX launched a rocket. There is no avoiding it.

Autonomy is something fucking amazing. I imagine millions of articles are published per day, 99% of them things said before. What part should I want to read? The 1% with the most traffic?

You should get on the train to nowhere just like everyone else - they say. Stop wandering around on your own, you should get on the train just like me!

I'm not usually telling anyone not to get on the train. If people want to discuss "rss is dead" for the ten thousandth time, let them. They think they chose the topic themselves.

There are 13 billion years of history, 6000 sq km of earth, 7.9 billion people alive, 100 billion dead, 8.7 million species of plants and animals, 350 thousand chemical compounds, 130 million books since the printing press, 100 billion stars in the Milky Way alone. What to spend my time on? The Trump investigations? Really?

I'm sorry for not being very technical.

replies(3): >>35740673 #>>35748650 #>>35758836 #
1. PcChip ◴[] No.35740673[source]
Interested in your filters, or a link to your results!