2025 AI Index Report

(hai.stanford.edu)

170 points INGELRII | 3 comments | 10 Apr 25 15:13 UTC | HN request time: 0s | source

mrdependable ◴[10 Apr 25 17:09 UTC] No.43645990[source]▶

I always see these reports about how much better AI is than humans now, but I can't even get it to help me with pretty mundane problem solving. Yesterday I gave Claude a file with a few hundred lines of code, told it what the input should be, and told it where the problem was. I tried until I ran out of credits and it still could not work backwards to tell me where things were going wrong. In the end I just did it myself, and it turned out to be a pretty obvious problem.

The strange part with these LLMs is that they get weirdly hung up on things. I try to direct them away from a certain type of output and somehow they keep going back to it. It's the same problem I have with Google: if I try to make my search more specific, it just ignores the parts of my query it doesn't like and gives me the same results.

replies(4): >>43646008 #>>43646119 #>>43646496 #>>43647128 #

slig ◴[10 Apr 25 17:23 UTC] No.43646119[source]▶

>>43645990 #

Was that on 3.7 Sonnet? I feel it's a lot worse than 3.5. If you can, try again but on Gemini 2.5.

replies(2): >>43646163 #>>43646188 #

avandekleut ◴[10 Apr 25 17:27 UTC] No.43646163[source]▶

>>43646119 #

I'm glad I'm not the only one that has found 3.5 to be better than 3.7.

replies(1): >>43646756 #

1. johnisgood ◴[10 Apr 25 18:26 UTC] No.43646756[source]▶

>>43646163 #

When did 3.7 come out? I might have had the same experience. I think I have been using 3.5 with success, but I cannot remember exactly. I may not have used 3.7 for coding (I took a couple of months' break).

replies(1): >>43647656 #

2. simonw ◴[10 Apr 25 20:16 UTC] No.43647656[source]▶

>>43646756 (TP) #

3.7 came out on 24th February. My notes from that release: https://simonwillison.net/2025/Feb/24/claude-37-sonnet-and-c... and https://simonwillison.net/2025/Feb/25/llm-anthropic-014/

replies(1): >>43647759 #

3. johnisgood ◴[10 Apr 25 20:29 UTC] No.43647759[source]▶

>>43647656 #

I will have to check, but apparently I have been using 3.5 with success, then. I will give 3.7 a try later. I hope it is not really that much worse. Or is it? :(
