2025 AI Index Report

(hai.stanford.edu)
166 points by INGELRII | 31 comments
mrdependable ◴[] No.43645990[source]
I always see these reports about how much better AI is than humans now, but I can't even get it to help me with pretty mundane problem solving. Yesterday I gave Claude a file with a few hundred lines of code, what the input should be, and told it where the problem was. I tried until I ran out of credits and it still could not work backwards to tell me where things were going wrong. In the end I just did it myself and it turned out to be a pretty obvious problem.

The strange part with these LLMs is that they get weirdly hung up on things. I try to direct them away from a certain type of output and somehow they keep going back to it. It's like the same problem I have with Google where if I try to modify my search to be more specific, it just ignores what it doesn't like about my query and gives me the same output.

replies(4): >>43646008 #>>43646119 #>>43646496 #>>43647128 #
1. simonw ◴[] No.43646008[source]
LLMs are difficult to use. Anyone who tells you otherwise is being misleading.
replies(2): >>43646190 #>>43666132 #
2. __loam ◴[] No.43646190[source]
"Hey these tools are kind of disappointing"

"You just need to learn to use them right"

Ad infinitum as we continue to get middling results from the most overhyped piece of technology of all time.

replies(6): >>43646640 #>>43646655 #>>43646908 #>>43647257 #>>43652095 #>>43663510 #
3. simonw ◴[] No.43646640[source]
That's why I try not to hype it.
replies(2): >>43649582 #>>43652701 #
4. tzumaoli ◴[] No.43646655[source]
also "They will get better in no time"
replies(1): >>43646686 #
5. simonw ◴[] No.43646686{3}[source]
That one's provably correct. Try comparing 2023-era GPT-3.5 with 2025's best models.
replies(1): >>43650254 #
6. torginus ◴[] No.43646908[source]
LLMs are a casino. They're probabilistic models that might come up with incredible solutions at the drop of a hat, then turn around and fumble even the most trivial stuff - I've had this same experience from GPT-3.5 through the latest and greatest models.

They come up with something amazing once, and then never again, leading me to believe it's operator error - not pure dumb luck or slight prompt wording - that left me humbled once and then tearing my hair out in frustration the next time.

Granted, newer models hit more than they miss, but it's still far from certain that they'll spit out something good.

7. pants2 ◴[] No.43647257[source]
In my experience, most people who say "Hey these tools are kind of disappointing" either refuse to provide a reproducible example of how it falls short, or if they do, it's clear that they're not using the tool correctly.
replies(4): >>43647369 #>>43654440 #>>43654510 #>>43655733 #
8. __loam ◴[] No.43647369{3}[source]
Ad infinitum
9. mvdtnz ◴[] No.43649582{3}[source]
You're the biggest hype merchant for this technology on this entire website. Please.
replies(2): >>43649742 #>>43655396 #
10. simonw ◴[] No.43649742{4}[source]
I've been banging the drum about how unintuitive and difficult this stuff is for over a year now: https://simonwillison.net/2025/Mar/11/using-llms-for-code/

I'm one of the loudest voices about the so-far unsolved security problems inherent in this space: https://simonwillison.net/tags/prompt-injection/ (94 posts)

I also have 149 posts about the ethics of it: https://simonwillison.net/tags/ai-ethics/ - including one of the first high profile projects to explore the issue around copyrighted data used in training sets: https://simonwillison.net/2022/Sep/5/laion-aesthetics-weekno...

One of the reasons I do the "pelican riding a bicycle" thing is that it's a great way to deflate the hype around these tools - the supposedly best LLM in the world still draws a pelican that looks like it was done by a five-year-old! https://simonwillison.net/tags/pelican-riding-a-bicycle/

If you want AI hype there are a thousand places on the internet you can go to get it. I try not to be one of them.

replies(3): >>43651102 #>>43653084 #>>43660423 #
11. xboxnolifes ◴[] No.43650254{4}[source]
It's not provably correct if the comment is made toward 2025 models.
replies(1): >>43650548 #
12. simonw ◴[] No.43650548{5}[source]
Gemini 2.5 came out just over two weeks ago (25th March) and is a very significant improvement on Gemini 2.0 (5th February), according to a bunch of benchmarks but also the all-important vibes.
13. __loam ◴[] No.43651102{5}[source]
The prompt injection articles you wrote really early in the tech cycle were really good and I appreciated them at the time.
14. TeMPOraL ◴[] No.43652095[source]
No, it's just you and yours.

IDK, maybe there's a secret conspiracy of major LLM providers to split users into two groups, one that gets the good models, and the other that gets the bad models, and ensure each user is assigned to the same bucket at every provider.

Surely it's more likely that you and I got put into different buckets by the Deep LLM Cartel I just described than it is for you to be holding the tool wrong.

15. JohnKemeny ◴[] No.43652701{3}[source]
Uh... You don't do anything but hype them.

I literally don't know who anyone on HN is except you and dang, and you're the one who constantly writes these ads for your LLM database product.

replies(1): >>43652811 #
16. simonw ◴[] No.43652811{4}[source]
I think you and I must have different definitions of the word "hype".

To me, it means LinkedIn influencers screaming "AGI is coming!", "It's so over", "Programming as a career is dead" etc.

Or implying that LLMs are flawless technology that can and should be used to solve every problem.

To hype something is to provide a dishonest impression of how great it is without ever admitting its weaknesses. That's what I try to avoid doing with LLMs.

replies(1): >>43659344 #
17. andai ◴[] No.43653084{5}[source]
Could a five-year-old do it in XML (SVG)? Could an artist? In one shot?
18. sksxihve ◴[] No.43654440{3}[source]
I'd love to see a reproducible example of these tools producing something that is exceptional. Or a clear reproducible example of using them the right way.

I've used them some (sorry, I didn't take detailed notes about my usage - probably used them wrong), but there are pretty much always subtle bugs that I would have overlooked if I didn't know better.

I don't doubt people find them useful; personally I'd rather spend my time learning about things that interest me instead of spending money learning how to prompt a machine to do something I can do myself and also enjoy doing.

I think a lot of the disagreement on HN about this tech comes from both sides sitting at the extremes - either "it doesn't work at all and is pointless" or "it's amazing and makes me 100x more productive" - with not much discussion of the middle ground: it works for some stuff, knowing what stuff it works well on makes it useful, and it won't solve all your problems.

replies(3): >>43656928 #>>43663543 #>>43664027 #
19. mickael-kerjean ◴[] No.43654510{3}[source]
The latest example for me was trying to generate a thumbnail from a PSD in plain C and figure out the layers in there, since I was too lazy to read the spec, with the goal of bundling it as wasm and running it in a browser. It never managed to extract a thumbnail from a given PSD. It's very confident while making stuff up, but it never got anywhere despite a couple of hours spent on it, which would have been better spent reading the spec and existing code on the topic.
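
For what it's worth, the thumbnail part of that task is fairly contained: Photoshop 5.0+ files store a JPEG-compressed preview as image resource 1036 in the image resources section, so you can pull it out without touching the layer data at all. A minimal sketch of that resource walk in plain C - assuming a well-formed file, with error handling mostly omitted:

    /* Sketch: pull the embedded JPEG thumbnail out of a PSD.
       Photoshop 5.0+ stores it as image resource 1036; the resource
       data is a 28-byte header followed by a JPEG stream. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    static uint32_t be32(const uint8_t *p) {
        return (uint32_t)p[0] << 24 | p[1] << 16 | p[2] << 8 | p[3];
    }

    int main(int argc, char **argv) {
        if (argc != 3) { fprintf(stderr, "usage: %s in.psd out.jpg\n", argv[0]); return 1; }
        FILE *f = fopen(argv[1], "rb");
        uint8_t hdr[26], len4[4];
        if (!f || fread(hdr, 1, 26, f) != 26 || memcmp(hdr, "8BPS", 4) != 0) {
            fprintf(stderr, "not a PSD\n"); return 1;
        }
        fread(len4, 1, 4, f);                    /* color mode data: length-prefixed, skip it */
        fseek(f, be32(len4), SEEK_CUR);
        fread(len4, 1, 4, f);                    /* image resources section length */
        long end = ftell(f) + (long)be32(len4);
        while (ftell(f) < end) {
            uint8_t sig[4], id2[2], namelen;
            if (fread(sig, 1, 4, f) != 4 || memcmp(sig, "8BIM", 4) != 0) break;
            fread(id2, 1, 2, f);                 /* resource ID, big-endian */
            fread(&namelen, 1, 1, f);            /* Pascal name, padded so 1+len is even */
            fseek(f, namelen + (namelen % 2 ? 0 : 1), SEEK_CUR);
            fread(len4, 1, 4, f);
            uint32_t size = be32(len4);
            if ((id2[0] << 8 | id2[1]) == 1036) {
                fseek(f, 28, SEEK_CUR);          /* skip the thumbnail resource header */
                uint32_t n = size - 28;
                uint8_t *jpeg = malloc(n);
                fread(jpeg, 1, n, f);
                FILE *out = fopen(argv[2], "wb");
                fwrite(jpeg, 1, n, out);
                fclose(out);
                free(jpeg);
                printf("wrote %u bytes\n", n);
                return 0;
            }
            fseek(f, size + size % 2, SEEK_CUR); /* resource data is padded to even length */
        }
        fprintf(stderr, "no thumbnail resource\n");
        return 1;
    }
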
20. maleldil ◴[] No.43655396{4}[source]
It's true that simonw writes a lot about LLMs, but I find his content to be mostly factual. Much of it is positive, but that doesn't mean it's hype.
21. input_sh ◴[] No.43655733{3}[source]
How are we supposed to give a reproducible example with a non-deterministic tool?
22. doug_durham ◴[] No.43656928{4}[source]
Why are you setting the bar at "exceptional"? If it means that you can write your git commit messages more quickly and with fewer errors, then that's all the payoff most orgs need to make them worthwhile.
replies(1): >>43661377 #
23. bluefirebrand ◴[] No.43659344{5}[source]
> without ever admitting its weaknesses

I don't think this part is necessary

"To hype something is to provide a dishonest impression of how great it is" is accurate.

Marketing hype is all about "providing a dishonest impression of how great it is". Putting the weaknesses in fine print doesn't change the hype.

Anyway, I don't mean to pile on, but I agree with some of the other posters here. An awful lot of extremely pro-AI posts that I've noticed have your name on them.

I don't think you are as critical of the tech as you think you are.

Take that for what you will

24. annjose ◴[] No.43660423{5}[source]
I agree - the content you write about LLMs is informative and realistic, not hyped. I get a lot of value from it, especially because you write mostly as a stream of consciousness and explain your approach and/or reasoning. Thank you for doing that.
25. bluefirebrand ◴[] No.43661377{5}[source]
> Why are you setting the bar at "exceptional"

Because that is how they are being sold to us and hyped

> If it means that you can write your git commit messages more quickly and with fewer errors then that's all the payoff most orgs need to make them worthwhile.

This is so trivial that it wouldn't even be worth looking into; it's basically zero value.

26. KronisLV ◴[] No.43663510[source]
> "Hey these tools are kind of disappointing"

> "You just need to learn to use them right"

Admittedly, the first line is also my reaction to the likes of ASM or system level programming languages (C, C++, Rust…) because they can be unpleasant and difficult to use when compared to something that’d let me iterate more quickly (Go, Python, Node, …) for certain use cases.

For example, building a CLI tool in Go vs C++. Or maybe something to shuffle some data around and handle certain formatting in Python vs Rust. Or a GUI tool with Node/Electron vs anything else.

People telling me to RTFM and spend a decade practicing to use them well wouldn’t be wrong though, because you can do a lot with those tools, if you know how to use them well.

I reckon that it applies to any tool, even LLMs.

27. KronisLV ◴[] No.43663543{4}[source]
> I'd love to see a reproducible example of these tools producing something that is exceptional.

I’m happy that my standards are somewhat low, because the other day I used Claude 3.7 Sonnet to help me refactor around 70 source files, and it worked out really nicely - with a bit of guidance along the way it got me a bunch of correctly architected interfaces and base/abstract classes, and made an otherwise tedious task take much less time and effort, with a bit of cleanup and improvements along the way. It all works okay too, after the needed amount of testing.

I don’t need exceptional, I need meaningful productivity improvements that make the career less stressful and frustrating.

Historically, that meant using a good IDE. Along the way, that also started to mean IaC and containers. Now that means LLMs.

replies(1): >>43664482 #
28. xrraptr ◴[] No.43664027{4}[source]
I honestly think the problem is you are just a lot smarter than I am.

I find these tools wonderful, but I am a lazy college dropout of the most average intelligence, a very shitty programmer who would never get paid to write code.

I am intellectually curious though and these tools help me level up closer to someone like you.

Of course, if I had 30 more IQ points I wouldn't need these tools but I don't have 30 more IQ points.

30. zamadatix ◴[] No.43666132[source]
I myself also think LLMs are more difficult to use for most tasks than is often touted, but I don't really jibe with statements like "Anyone who tells you otherwise is being misleading". Most of the time I find that such people are just using them in a very different capacity.
replies(1): >>43666396 #
31. simonw ◴[] No.43666396[source]
I intended those words to imply "being misleading even if they don't know they are being misleading" - I made a better version of that point here: https://simonwillison.net/2025/Mar/11/using-llms-for-code/

> If someone tells you that coding with LLMs is easy they are (probably unintentionally) misleading you. They may well have stumbled on to patterns that work, but those patterns do not come naturally to everyone.