
302 points | simonw | 1 comment
luke-stanley No.41877416
I'm glad this worked for Simon, but I would probably prefer a User Script that watches for DOM text changes and streams them to a small local web server, which appends the URL, the changed text, and a timestamp to a JSONL file. Since I already have something doing this, it lets me back up things I'm looking at in real time, like streaming LLM generations, and it relies only on normal browser technology. I should probably share my code since it's quite useful.

I'm a bit uncomfortable relying on an LLM to transcribe something when there is a stream of text that could be captured robustly, with the real data, versus well-trained but indirect token magic. A middle ground might be grounded extraction with evidence chains: timestamps, screenshots, the cropped regions it's sourcing from, and spelled-out reasoning. There's the extraction/retrieval step, and then a kind of data normalisation. Of course, it's nice that he's got something that just works in two or three steps, and it's good that the technology is getting quite reliable and cheap a lot of the time, but still, we could do better.
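(Not the commenter's actual code, which isn't shown here, but a minimal sketch of the userscript half of that idea, written in TypeScript to be compiled to plain JS for a Tampermonkey/Violentmonkey-style script. The endpoint, port, and field names are assumptions for illustration.)

    // Hypothetical userscript body: watch the page for text changes and
    // stream each one to a local collector. ENDPOINT and the TextChange
    // shape are assumed, not anyone's real setup.
    const ENDPOINT = "http://127.0.0.1:8787/log";

    interface TextChange {
      url: string;
      text: string;
      timestamp: string;
    }

    function record(text: string): void {
      const change: TextChange = {
        url: location.href,
        text,
        timestamp: new Date().toISOString(),
      };
      // Fire-and-forget; the collector appends each body as one JSONL line.
      fetch(ENDPOINT, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(change),
      }).catch(() => { /* collector not running: drop silently */ });
    }

    const observer = new MutationObserver((mutations) => {
      for (const m of mutations) {
        // Streaming LLM UIs typically mutate text nodes in place...
        if (m.type === "characterData" && m.target.textContent) {
          record(m.target.textContent);
        }
        // ...or append new nodes as tokens arrive.
        for (const node of Array.from(m.addedNodes)) {
          const t = node.textContent?.trim();
          if (t) record(t);
        }
      }
    });

    observer.observe(document.body, {
      subtree: true,
      childList: true,
      characterData: true,
    });

In practice a cross-origin fetch to localhost needs the collector to send permissive CORS headers, or the script could use GM_xmlhttpRequest instead; the server side is then just "append each POST body plus a newline to a file".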
replies(3): >>41880801 >>41890864 >>41895929
simonw No.41880801
I did something a bit like that recently to scrape tweets out of Twitter: https://til.simonwillison.net/twitter/collecting-replies