
615 points by __rito__ | 3 comments

Related from yesterday: Show HN: Gemini Pro 3 imagines the HN front page 10 years from now - https://news.ycombinator.com/item?id=46205632
Rperry2174 No.46223267
One thing this really highlights to me is how often the "boring" takes end up being the most accurate. The provocative, high-energy threads are usually the ones that age the worst.

If an LLM were acting as a kind of historian revisiting today’s debates with future context, I’d bet it would see the same pattern again and again: the sober, incremental claims quietly hold up, while the hyperconfident ones collapse.

Something like "Lithium-ion battery pack prices fall to $108/kWh" is classic cost-curve progress. Boring, steady, and historically extremely reliable over long horizons. Probably one of the most likely headlines today to age correctly, even if it gets little attention.
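
For what it's worth, cost-curve claims like that are easy to sanity-check with Wright's law: cost falls by a roughly fixed fraction each time cumulative production doubles. A minimal sketch, where the 18% learning rate is a placeholder I made up, not a sourced battery figure:

    import math

    def projected_cost(cost_now, cum_now, cum_future, learning_rate=0.18):
        # learning_rate = fractional cost drop per doubling of cumulative
        # production; 0.18 is an illustrative placeholder, not a sourced figure
        doublings = math.log2(cum_future / cum_now)
        return cost_now * (1 - learning_rate) ** doublings

    # e.g. two more doublings of cumulative output from $108/kWh:
    print(round(projected_cost(108, 1, 4), 2))  # 72.62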

On the flip side, stuff like "New benchmark shows top LLMs struggle in real mental health care" feels like high-risk framing. Benchmarks rotate constantly, and “struggle” headlines almost always age badly as models jump whole generations.

I bet there are many "boring but right" takes we overlook today, and I wonder if there's a practical way to surface them before hindsight does.
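
One crude way to do it: get people to attach explicit probabilities to claims now, resolve them later, and rank with a proper scoring rule like the Brier score (lower is better). A toy sketch, with the claims, probabilities, and outcomes all invented for illustration:

    def brier(p, outcome):
        # outcome is 1 if the claim came true, 0 otherwise
        return (p - outcome) ** 2

    predictions = [
        # (claim, assigned probability, eventual outcome) -- invented
        ("battery pack prices keep falling", 0.95, 1),
        ("this LLM benchmark result holds up", 0.70, 0),
    ]

    for claim, p, outcome in sorted(predictions, key=lambda t: brier(t[1], t[2])):
        print(f"{brier(p, outcome):.3f}  {claim}")

The hard part isn't the scoring, it's getting anyone to write down probabilities before hindsight arrives.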

replies(8): >>46223430 >>46223589 >>46224230 >>46225719 >>46226198 >>46226204 >>46226759 >>46227922
yunwal No.46223589
"Boring but right" generally means that this prediction is already priced in to our current understanding of the world though. Anyone can reliably predict "the sun will rise tomorrow", but I'm not giving them high marks for that.
replies(3): >>46223658 >>46223835 >>46223965
onraglanroad No.46223835
I'm giving them higher marks than the people who say it won't.

LLMs have seen huge improvements over the last 3 years. Are you going to make the bet that they will continue to make similarly huge improvements, taking them well past human ability, or do you think they'll plateau?

The former is the boring, linear prediction.

replies(5): >>46223883 >>46225283 >>46225577 >>46225588 >>46242593
yunwal No.46223883
> Are you going to make the bet that they will continue to make similarly huge improvements

Sure yeah why not

> taking them well past human ability,

At what? They're already better than me at reciting historical facts. You'd need some actual prediction here for me to give you "prescience".

replies(4): >>46224023 >>46224067 >>46225640 >>46226931
onraglanroad No.46224023
At every intellectual task.

They're already better than you at reciting historical facts. I'd guess they're probably better at composing poems (they're not great but far better than the average person).

Or do you agree with me? I'm not looking for prescience marks; I'm just less convinced that people really make the more boring and obvious predictions.

replies(4): >>46224102 >>46224264 >>46227088 >>46227926
yunwal No.46224102
What is an intellectual task? Once again, there's tons of stuff LLMs won't be trained on in the next 3 years. So it would be trivial to just find one of those things and say voila! LLMs aren't better than me at that.

I'll make one prediction that I think will hold up. No LLM-based system will be able to take a generic ask like "hack the nytimes website and retrieve emails and password hashes of all user accounts" and do better than the best hackers and penetration testers in the world, despite having plenty of training data to go off of. It requires out-of-band thinking that they just don't possess.

replies(1): >>46224425
hathawsh No.46224425
I'll take a stab at this: LLMs currently seem rather good at details, but they struggle greatly with the overall picture, in every subject.

- If I want Claude Code to write some specific code, it often handles the task admirably, but if I'm not sure what should be written, consulting Claude takes a lot of time and doesn't yield much insight, whereas 2 minutes with a human is 100x more valuable.

- I asked ChatGPT about some political event. It mirrored the mainstream press. After I reminded it of some obvious facts that revealed a mainstream bias, it agreed with me that its initial answer was wrong.

These experiences and others serve to remind me that current LLMs are mostly just advanced search engines. They work especially well on code because there is a lot of reasonably good code (and tutorials) out there to train on. LLMs are a lot less effective on intellectual tasks that humans haven't already written and published about.

replies(1): >>46226549
medler No.46226549
> it agreed with me that its initial answer was wrong.

Most likely that was just its sycophancy programming taking over and telling you what you wanted to hear.