rossdavidh ◴[] No.44400209[source]
Anyone who has long experience with neural networks, LLMs or otherwise, knows that they are best suited to applications where 90% is good enough; in other words, applications where some other system (human or otherwise) will catch the mistakes. The phrase "It is not entirely clear why this episode occurred..." applies to nearly every LLM (or other neural network) error, which is why it is usually not possible to correct the root cause (although you can train on that specific input paired with a corrected output).

For some things, like, say, a grammar correction tool, this is probably fine. But for cases where one mistake can erase the benefit of many previous correct responses, and then some, no amount of hardware is going to make LLMs the right solution.

Which is fine! No algorithm needs to be the solution to everything, or even most things. But much of people's intuition about "AI" is warped by the (unmerited) claims implicit in that name. Even as LLMs "get better", they won't get much better at this kind of problem, where 90% is not good enough (because one mistake can be very costly) and failures need discoverable root causes.
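
To make the "some other system catches the mistakes" point concrete, here is a minimal sketch in Python. Every detail in it is hypothetical: llm_complete() stands in for whatever model API you actually call, and the invoice task and plausibility bounds are made up. The only point is that the model's answer gets checked by deterministic code before anything downstream trusts it, and anything that fails the check goes to a human.

    import json

    def llm_complete(prompt: str) -> str:
        # Placeholder for a call to whatever LLM API you actually use.
        raise NotImplementedError

    def extract_invoice_total(invoice_text: str) -> float | None:
        # Ask the model for a structured answer...
        raw = llm_complete(
            'Return JSON like {"total": 123.45} for this invoice:\n' + invoice_text
        )
        # ...but never trust it directly: parse and sanity-check first.
        try:
            total = float(json.loads(raw)["total"])
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            return None  # malformed output: route to a human instead of guessing
        if not (0 <= total <= 1_000_000):
            return None  # implausible value: same human fallback
        return total

The caller treats None as "needs human review", so the 10% of bad answers cost a review, not a wrong invoice.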

replies(4): >>44401352 #>>44401613 #>>44402343 #>>44406687 #
1. petetnt ◴[] No.44401613[source]
The only job in the world where a 90% success rate is acceptable is telemarketing, and that has been run by bots since the 90s.
replies(1): >>44408735 #
2. rossdavidh ◴[] No.44408735[source]
LLMs will definitely find big uses in spam. However, that's not the _only_ use.

1) The code LLMs give you in response to a prompt may not actually work anywhere close to 90% of the time, but when they get 90% of the work done, that is still a clear win (if a human debugs it); see the sketch after this list.

2) In cases where the benefit of each success is at least as large as the potential downside of a failure (e.g. something that suggests possible improvements to your writing), a 90% success rate is great.

3) In cases where the end recipient understands that the output is not reliable, for example a tool that scans and summarizes a bunch of product reviews, that is fine; people know reviews aren't gospel.
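
A sketch of point 1, again with made-up details: accept_generated_module() and the pytest invocation are just one way to wire it up, not a recommendation of a particular tool. The idea is that generated code is a draft that has to pass an existing test suite, and whatever fails gets handed to a human rather than shipped.

    import subprocess
    from pathlib import Path

    def accept_generated_module(source: str, test_dir: str = "tests") -> bool:
        # Write the LLM-suggested module somewhere the tests can import it.
        Path("generated_module.py").write_text(source)
        # Run the existing suite; the suite, not the model, decides.
        result = subprocess.run(
            ["pytest", test_dir, "-q"], capture_output=True, text=True
        )
        if result.returncode != 0:
            # Failed: hand the code and the test output to a human.
            print(result.stdout[-2000:])
            return False
        return True

Even if the model only clears the bar some of the time, the failures are cheap (a red test run), and the successes are real work saved.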

But advocates of LLMs want to use them for whatever they most want done, not for what LLMs are actually good at, and therein lies the problem, one which has been the root cause of every "AI winter" in the past.