
564 points nimbusega | 1 comments
nimbusega ◴[] No.42067000[source]
I made this to experiment with embeddings and explore how different ways of displaying information affect your perception.

It gets the top 100 stories, sends their HTML to GPT-4 to extract the main content (plain HTML parsing was not producing good enough results) and then gets an embedding using the title and content.

Likes/dislikes are stored in local storage and compared against all stories using cosine similarity to find the most relevant stories.
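For anyone curious what that comparison might look like, here is a minimal sketch in Python. It is not the site's actual code: the function names, the dict-of-embeddings layout, and the scoring rule (mean similarity to liked stories minus mean similarity to disliked ones) are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def mean_similarity(embedding, story_ids, stories):
    """Average similarity of one embedding to a set of stories."""
    if not story_ids:
        return 0.0
    total = sum(cosine_similarity(embedding, stories[sid]) for sid in story_ids)
    return total / len(story_ids)

def rank_stories(stories, liked, disliked):
    """Rank unrated stories, most relevant first.

    stories:  dict of story id -> embedding (list of floats)
    liked:    ids the reader liked
    disliked: ids the reader disliked
    The liked-minus-disliked score is an assumed rule, not the site's.
    """
    rated = set(liked) | set(disliked)
    scores = {}
    for story_id, embedding in stories.items():
        if story_id in rated:
            continue
        scores[story_id] = (
            mean_similarity(embedding, liked, stories)
            - mean_similarity(embedding, disliked, stories)
        )
    return sorted(scores, key=scores.get, reverse=True)
```

With real embeddings the vectors would have hundreds or thousands of dimensions, but for 100 stories a plain loop like this is cheap enough to rerun every time a like or dislike changes.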

It costs about $10/day to run. I was thinking of offering additional value for a small subscription. Maybe more pages of the newspaper, full story content/comments, a weekly digest or ePub export or something?

replies(4): >>42067307 #>>42067813 #>>42072116 #>>42072371 #
ketzo ◴[] No.42067307[source]
I think some of the highest value from HN comes from the comments, and it's much harder to find the "best" ones, since they might be in threads you might not have otherwise read.

Not sure if it's a "premium feature" so to speak, but would be very cool to extend this to comments generally.

replies(2): >>42067598 #>>42077536 #
nimbusega ◴[] No.42067598[source]
Definitely, comments are usually better than the article. I thought of a 'Letters to the Editors' section that shows top comments (https://news.ycombinator.com/bestcomments) and references the parent story, but it might not be as useful without the context.

Maybe 'See Comments' here could load the comments on the same page, in a newspaper-like style?

replies(1): >>42073635 #
1. genewitch ◴[] No.42073635[source]
AI should be able to do "good enough" sentiment analysis which, combined with the "votes", should quickly find agree/disagree and the quality of the comment - which should not be based merely on the number of complex words, or the length.

I certainly suspect that you could take the 4chan and reddit datasets, combine them with HN's, and build a LoRA that ranks the 4chan and reddit stuff lower and the good HN stuff higher. Essentially, subtract all reddit and 4chan style comments from the set of HN comments' weights. Training SD LoRAs was pretty quick but I haven't looked into LLM LoRAs. Regardless, the LLM with the "HN minus 4chan & reddit" LoRA can do sentiment analysis and use the votes; just feed it CSV or JSON: votes, user, comment. I guess you could do votes/age as a cleanup, too.
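The votes/age cleanup could be sketched roughly like this in Python. The `votes`, `user` and `comment` fields are the ones mentioned above; the `age_hours` field, the function names and the one-hour floor are assumptions added for illustration.

```python
import json

def votes_per_hour(record, min_age_hours=1.0):
    """Votes divided by age, with a floor so brand-new comments don't explode."""
    age = max(record["age_hours"], min_age_hours)
    return record["votes"] / age

def rank_comments(raw_json):
    """Parse a JSON array of comment records and sort by votes/age, best first."""
    records = json.loads(raw_json)
    return sorted(records, key=votes_per_hour, reverse=True)
```

The floor on age is a design choice: without it a comment with two votes after one minute would outrank everything else.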

All this to say I still wouldn't read or use it. I'm not a fan of robots entertaining me.