Project Vend: Can Claude run a small shop? (And why does that matter?)

(www.anthropic.com)

277 points gk1 | 2 comments | 27 Jun 25 16:09 UTC

rossdavidh ◴[27 Jun 25 21:02 UTC] No.44400209[source]▶

Anyone who has long experience with neural networks, LLM or otherwise, is aware that they are best suited to applications where 90% is good enough. In other words, applications where some other system (human or otherwise) will catch the mistakes. This phrase: "It is not entirely clear why this episode occurred..." applies to nearly every LLM (or other neural network) error, which is why it is usually not possible to correct the root cause (although you can train on that specific input and a corrected output).

For some things, like, say, a grammar correction tool, this is probably fine. For cases where one mistake can erase the benefit of many previous correct responses, and more, no amount of hardware is going to make LLMs the right solution.

Which is fine! No algorithm needs to be the solution to everything, or even most things. But much of people's intuition about "AI" is warped by the (unmerited) claims in that name. Even as LLMs "get better", they won't get much better at this kind of problem, where 90% is not good enough (because one mistake can be very costly), and problems need discoverable root causes.

replies(4): >>44401352 #>>44401613 #>>44402343 #>>44406687 #

bigstrat2003 ◴[27 Jun 25 23:54 UTC] No.44401352[source]▶

>>44400209 #

This is an insightful post, and I think maybe highlights the gap between AI proponents and me (very skeptical about AI claims). I don't have any applications where I'm willing to accept 90% as good enough. I want my tools to work 100% of the time or damn close to it, and even 90% simply is not acceptable in my book. It seems like maybe the people who are optimistic about AI simply are willing to accept a higher rate of imperfections than I am.

replies(3): >>44401977 #>>44401995 #>>44402048 #

1. beering ◴[28 Jun 25 02:36 UTC] No.44401977[source]▶

>>44401352 #

It’s not hard to find applications where 90% success or even 50% success rate is incredibly useful. For example, hooking up ChatGPT Codex to your repo and asking it to find and fix a bug. If it succeeds in 50% of the attempts, you would hit that button over and over until its success rate drops to a much lower figure. Especially as costs trend towards zero.

replies(1): >>44402969 #

2. 3vidence ◴[28 Jun 25 07:52 UTC] No.44402969[source]▶

>>44401977 (TP) #

I agree there are good examples of 90% being good enough, but what you proposed doesn't sound like a good one.

This assumes that AI can't also introduce new bugs into the code, making the result a net negative.

A case of 90% being good enough sounds more like storyboarding or giving note summaries.
