
2025 AI Index Report

(hai.stanford.edu)
166 points by INGELRII | 2 comments
mrdependable ◴[] No.43645990[source]
I always see these reports about how much better AI is than humans now, but I can't even get it to help me with pretty mundane problem solving. Yesterday I gave Claude a file with a few hundred lines of code, what the input should be, and told it where the problem was. I tried until I ran out of credits and it still could not work backwards to tell me where things were going wrong. In the end I just did it myself and it turned out to be a pretty obvious problem.

The strange part with these LLMs is that they get weirdly hung up on things. I try to direct them away from a certain type of output and somehow they keep going back to it. It's like the same problem I have with Google where if I try to modify my search to be more specific, it just ignores what it doesn't like about my query and gives me the same output.

replies(4): >>43646008 #>>43646119 #>>43646496 #>>43647128 #
simonw ◴[] No.43646008[source]
LLMs are difficult to use. Anyone who tells you otherwise is being misleading.
replies(2): >>43646190 #>>43666132 #
__loam ◴[] No.43646190[source]
"Hey these tools are kind of disappointing"

"You just need to learn to use them right"

Ad infinitum as we continue to get middling results from the most overhyped piece of technology of all time.

replies(6): >>43646640 #>>43646655 #>>43646908 #>>43647257 #>>43652095 #>>43663510 #
pants2 ◴[] No.43647257{3}[source]
In my experience, most people who say "Hey these tools are kind of disappointing" either refuse to provide a reproducible example of how it falls short, or if they do, it's clear that they're not using the tool correctly.
replies(4): >>43647369 #>>43654440 #>>43654510 #>>43655733 #
sksxihve ◴[] No.43654440{4}[source]
I'd love to see a reproducible example of these tools producing something that is exceptional. Or a clear reproducible example of using them the right way.

I've used them some (sorry, I didn't make detailed notes about my usage, so I probably used them wrong), but there are pretty much always subtle bugs that I would have overlooked if I didn't know better.

I don't doubt people find them useful; personally I'd rather spend my time learning about things that interest me instead of spending money learning how to prompt a machine to do something I can do myself and also enjoy doing.

I think a lot of the disagreement on HN about this tech is that both sides are mostly at the extremes: either "it doesn't work at all and is pointless" or "it's amazing and makes me 100x more productive". There's not much discussion of the middle ground, where it works for some stuff, and knowing what it works well on makes it useful, but it won't solve all your problems.

replies(3): >>43656928 #>>43663543 #>>43664027 #
KronisLV ◴[] No.43663543{5}[source]
> I'd love to see a reproducible example of these tools producing something that is exceptional.

I’m happy that my standards are somewhat low, because the other day I used Claude Sonnet 3.7 to help me refactor around 70 source files, and it worked out really nicely. With a bit of guidance along the way it gave me a bunch of correctly architected interfaces and base/abstract classes, and made an otherwise tedious task take much less time and effort, with a bit of cleanup and improvements along the way. It all works okay too, after the needed amount of testing.
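To give a rough idea of the shape of that refactor, here's a TypeScript-flavoured sketch (the language and the names Exporter, AbstractExporter and CsvExporter are made up for illustration, not my actual code): the shared contract becomes an interface, the shared plumbing moves into an abstract base class, and each concrete class keeps only its specifics.

    // The shared contract becomes an interface...
    interface Exporter {
      exportReport(rows: string[]): string;
    }

    // ...the shared plumbing moves into one abstract base class...
    abstract class AbstractExporter implements Exporter {
      exportReport(rows: string[]): string {
        if (rows.length === 0) {
          throw new Error("Nothing to export"); // common pre-check lives in one place
        }
        return this.writeOutput(rows); // the only part each subclass implements
      }

      protected abstract writeOutput(rows: string[]): string;
    }

    // ...and each concrete class shrinks down to its actual specifics.
    class CsvExporter extends AbstractExporter {
      protected writeOutput(rows: string[]): string {
        return rows.join("\n"); // format-specific bit only, no duplicated boilerplate
      }
    }

    console.log(new CsvExporter().exportReport(["alpha", "beta"]));

The pattern itself is nothing special; the win was having the LLM apply it consistently across ~70 files instead of doing the mechanical part by hand.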

I don’t need exceptional; I need meaningful productivity improvements that make the career less stressful and frustrating.

Historically, that meant using a good IDE. Along the way, that also started to mean IaC and containers. Now that means LLMs.

replies(1): >>43664482 #