
302 points simonw | 5 comments
1. etewiah No.41869821
You've got me thinking. Would this work for real estate data? A lot of sites make it quite hard to grab their raw data. Also, perhaps it could gain some insights from the photos...
replies(4): >>41870309 #>>41874563 #>>41891567 #>>41893599 #
2. simonw No.41870309
I'm certain it would. That would be a really fun experiment to run!
3. jerpint No.41874563
This could also work for social media, which can be hard to scrape.
4. TechDebtDevin No.41891567
Been scraping real estate data off every major real estate site for a while. They practically give away their data; there's zero reason to take on the added cost of LLMs.

Sure, you could do this, and it would work, but you'd spend about 100,000x what I do with a $10 Hetzner VPS and a small amount of proxy bandwidth.

5. bambax No.41893599
It's crazy to think we live in a world where video-to-LLM OCR is simpler (and cheaper?) than plain old HTML parsing. Maybe someone will rebuild the Twitter API like this?!?