2025 AI Index Report

(hai.stanford.edu)
166 points | INGELRII | 3 comments
mrdependable ◴[] No.43645990[source]
I always see these reports about how much better AI is than humans now, but I can't even get it to help me with pretty mundane problem solving. Yesterday I gave Claude a file with a few hundred lines of code, what the input should be, and told it where the problem was. I tried until I ran out of credits and it still could not work backwards to tell me where things were going wrong. In the end I just did it myself and it turned out to be a pretty obvious problem.

The strange part with these LLMs is that they get weirdly hung up on things. I try to direct them away from a certain type of output and somehow they keep going back to it. It's like the same problem I have with Google where if I try to modify my search to be more specific, it just ignores what it doesn't like about my query and gives me the same output.

replies(4): >>43646008 #>>43646119 #>>43646496 #>>43647128 #
slig ◴[] No.43646119[source]
Was that on 3.7 Sonnet? I feel it's a lot worse than 3.5. If you can, try again but on Gemini 2.5.
replies(2): >>43646163 #>>43646188 #
avandekleut ◴[] No.43646163[source]
I'm glad I'm not the only one that has found 3.5 to be better than 3.7.
replies(1): >>43646756 #
1. johnisgood ◴[] No.43646756[source]
When did 3.7 come out? I might have had the same experience. I think I have been using 3.5 with success, but I cannot remember exactly. I may not have used 3.7 for coding (as I had a couple of months' break).
replies(1): >>43647656 #
2. simonw ◴[] No.43647656[source]
3.7 came out on 24th February. My notes from that release: https://simonwillison.net/2025/Feb/24/claude-37-sonnet-and-c... and https://simonwillison.net/2025/Feb/25/llm-anthropic-014/
replies(1): >>43647759 #
3. johnisgood ◴[] No.43647759[source]
I will have to check, but apparently I have been using 3.5 with success, then. I will give 3.7 a try later. I hope it is not really that much worse, or is it? :(