
277 points by gk1 | 8 comments
1. due-rr No.44399955
Would you ever trust an AI agent running your business? As hilarious as this small experiment is, is there ever a point where you can trust it to run something long term? It might make good decisions for a day, a month, or a year, and then one day decide to trash your whole business.
replies(3): >>44400017 #>>44400031 #>>44400053 #
2. marinmania No.44400017
It does seem far more straightforward to say "Write code that deterministically orders food items that people want and sends invoices etc."

I feel like that's more the future. Having an agent sorta make random choices feels like LLMs attempting to do math, instead of LLMs attempting to call a calculator.

replies(2): >>44400108 #>>44400297 #
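A minimal sketch of the deterministic pipeline marinmania describes, with an LLM nowhere in the loop: plain code turns an order into an invoice, so the same input always produces the same total. The item names, prices, and the `build_invoice` helper are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class LineItem:
    name: str
    unit_price_cents: int  # store money as integer cents to avoid float drift
    quantity: int

def build_invoice(items: list[LineItem]) -> dict:
    """Total up an order deterministically -- no model in the loop."""
    total = sum(i.unit_price_cents * i.quantity for i in items)
    return {
        "lines": [(i.name, i.quantity, i.unit_price_cents * i.quantity)
                  for i in items],
        "total_cents": total,
    }

order = [LineItem("espresso", 300, 2), LineItem("bagel", 250, 1)]
invoice = build_invoice(order)
print(invoice["total_cents"])  # 850
```

An agent could still sit in front of this to interpret free-form requests, but the counting and billing stay in code that cannot "lose count."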
3. keymon-o No.44400031
I have a recent anecdote with GPT-3.5, where it lost count of a trivial item-quantity increment in just a few prompts. Models might get orders of magnitude better from now on, but who's gonna pay for 'that one eventual mistake'?
replies(1): >>44400089 #
4. throwacct No.44400053
I don't think any decision maker will let LLMs run their business. If the LLM fails, you could potentially lose your livelihood.
5. croemer No.44400089
GPT-3.5? Did you mean to send this 2 years ago?
replies(1): >>44400146 #
6. keymon-o No.44400108
Every output that is going to be manually verified by a professional is a safe bet.

People forget that we use computers for accuracy, not smarts. Smarts make mistakes.

7. keymon-o No.44400146
Maybe. Did LLMs stop with hallucinations and errors 2 years ago?
8. standardUser No.44400297
Right, but if we limit the scope too much, we quickly arrive at the point where 'dumb' automation is sufficient, instead of using the world's most expensive algorithms.