
302 points simonw | 5 comments
1. luke-stanley No.41877416
I'm glad this worked for Simon, but I would probably prefer a user script that scrapes DOM text changes and streams them to a small local web server, which appends them to a JSONL file recording the URL, the text change, and a timestamp. I already have something doing this: it lets me back up what I'm looking at in real time, like streaming LLM generations, and it relies only on normal browser technology. I should probably share my code since it's quite useful.

I'm a bit uncomfortable relying on an LLM to transcribe something when there's a stream of real text that could be captured robustly, versus well-trained but indirect token magic. A middle ground might be grounded extraction with evidence chains: timestamps, screenshots, the cropped regions it's sourcing from, and spelled-out reasoning. There's the extraction/retrieval step, and then a kind of data normalisation. Of course, it's nice that he has something that just works in two or three steps, and it's good the technology is getting reliable and cheap much of the time, but we could still do better.
replies(3): >>41880801 >>41890864 >>41895929
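The pipeline described above (a userscript watching DOM text changes and POSTing each one to a local collector) can be sketched roughly as follows. The port, endpoint path, and record fields here are illustrative assumptions, not taken from anyone's actual code:

```javascript
// Sketch of a userscript that watches DOM text changes and posts them
// to a hypothetical local collector as JSONL records.

// Build one JSONL record: URL, changed text, and a timestamp.
function makeRecord(url, text) {
  return JSON.stringify({ url, text, ts: new Date().toISOString() });
}

// Browser-only wiring, guarded so the snippet is inert outside a page.
if (typeof MutationObserver !== "undefined" && typeof document !== "undefined") {
  const observer = new MutationObserver((mutations) => {
    for (const m of mutations) {
      const text = m.target.textContent || "";
      if (!text.trim()) continue;
      // Fire-and-forget POST to the assumed local collector endpoint.
      fetch("http://localhost:8765/append", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: makeRecord(location.href, text),
      }).catch(() => {}); // ignore collector downtime
    }
  });
  observer.observe(document.body, {
    childList: true,
    characterData: true,
    subtree: true,
  });
}
```

In a real userscript you would likely debounce the observer and filter to the elements you care about, since `subtree: true` on `document.body` fires very frequently.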
2. simonw No.41880801
I did something a bit like that recently to scrape tweets out of Twitter: https://til.simonwillison.net/twitter/collecting-replies
3. ranger_danger No.41890864
The userscript idea is great; I can think of several uses for it, such as text-to-speech for live comments. Do you know of any examples of projects already doing this?
replies(1): >>41893644
4. ian_hn No.41893644
Things like this? https://greasyfork.org/en/scripts?q=speech
5. luke-stanley No.41895929
I put my userscript and Python server script here: https://gist.github.com/lukestanley/c3a37ab61a45e72b74995a5c... It tries to save the output as Markdown. I'm sure it could be much better in many ways, but it works well enough for me right now.