
419 points serjester | 12 comments
1. ramesh31 ◴[] No.43536049[source]
More capability, less reliability, please. I want something that can achieve superhuman results 1 out of 10 times, not something that gives mediocre human results 9 out of 10 times.

All of reality is probabilistic. Expecting that to map deterministically onto solving open-ended, complex problems is absurd. It's vectors all the way down.

replies(5): >>43536239 #>>43536358 #>>43536391 #>>43537262 #>>43537360 #
2. soulofmischief ◴[] No.43536239[source]
Stability is the bedrock of the evolution of stable systems. LLMs will not democratize software until an average person can get consistently decent and useful results without needing to be a senior engineer capable of a thorough audit.
replies(1): >>43536325 #
3. ramesh31 ◴[] No.43536325[source]
>Stability is the bedrock of the evolution of stable systems.

So we thought with AI in general too, and we spent decades toiling on rules-based systems. Then interpretability was thrown out the window, we let deep learning algorithms run wild with endless compute, and we looked at the actual results. This will be very similar.

replies(3): >>43537298 #>>43537320 #>>43537703 #
4. deprave ◴[] No.43536358[source]
What would be a superhuman result for booking a flight?
replies(1): >>43536585 #
5. Jianghong94 ◴[] No.43536391[source]
Superhuman results 1/10 of the time are, in fact, a very strong reliability guarantee (maybe not up to the many-nines standard we're accustomed to today, but probably much higher than any agent in a real-world workflow).
6. mjmsmith ◴[] No.43536585[source]
10% of the time the seat on either side of you is empty, 90% of the time you land in the wrong country.
7. klabb3 ◴[] No.43537262[source]
Reality is probabilistic, yes, but it's not a black box. We can improve our systems by understanding and addressing the flaws in our engineering. Do you want probabilistic black-box banking? Flight controls? Insurance?

"It works when it works" is fine when the stakes are low and a human is in the loop, like artwork for a blog post. And so, in a way, I agree with you. AI doesn't belong in intermediate computer-to-computer interactions unless the stakes are low. What scares me is that the AI optimists are desperately looking to apply LLMs to domains and tasks where the cost of mistakes is high.

8. skydhash ◴[] No.43537298{3}[source]
Rules-based systems are quite useful, not for interacting with an untrained human, but for getting things done. Deep learning can be good at exploring the edges of a problem space, but once a solution is found, we can actually get to the doing part.
9. klabb3 ◴[] No.43537320{3}[source]
This can be explained easily: there are simply some domains that were hard to model, and those are the ones where AI is outperforming humans. Natural language is the canonical example. Just because we focus on those domains now due to the recent advancements doesn't mean that AI will be better at every domain, especially the ones we understand exceptionally well. In fact, all evidence suggests that AI excels at some tasks and struggles with others. The null hypothesis should be that this continues to be the case, even as capability improves. Not all computation is the same.
10. recursive ◴[] No.43537360[source]
> Expecting that to map deterministically to solving open ended complex problems is absurd.

TCP creates an abstraction layer with more reliability than what it's built on. If you can detect failure, you can create a retry loop, assuming you can understand the rules of the environment you're operating in.
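The TCP analogy can be sketched as a detect-and-retry wrapper. This is an illustrative toy, not TCP itself: `make_flaky_channel` is a hypothetical stand-in for an unreliable transport, and `reliable_send` is the reliability layer built on top of it.

```python
def make_flaky_channel(failures_before_success: int):
    """Hypothetical channel that drops the first N sends, then succeeds."""
    state = {"calls": 0}

    def send(payload: str) -> bool:
        state["calls"] += 1
        return state["calls"] > failures_before_success

    return send

def reliable_send(send, payload: str, max_attempts: int = 5) -> bool:
    """The TCP idea in miniature: detect failure, retry, and only
    report success once the underlying send actually succeeded."""
    for _ in range(max_attempts):
        if send(payload):
            return True   # delivery confirmed
    return False          # give up after max_attempts
```

The retry loop only works because failure is detectable (the send reports success or not) and the environment's rules are known (retrying the same payload is safe), which is exactly the caveat in the comment above.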

replies(1): >>43541202 #
11. soulofmischief ◴[] No.43537703{3}[source]
Stability and probability are orthogonal concepts. You can have stable probabilistic systems. Look no further than our own universe, where everything is ultimately probabilistic and not "rules-based".
12. ramesh31 ◴[] No.43541202[source]
>If you can detect failure, you can create a retry loop, assuming you can understand the rules of the environment you're operating in

Indeed, this is what makes autonomous, tool-using agentic systems robust as well. Those retry loops become ad hoc where needed, and the agent can self-correct based on error responses, whereas a defined workflow would get stuck in the loop if it couldn't figure things out, or just error out the whole process.
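The ad-hoc version of the retry loop might look like the sketch below. Everything here is hypothetical scaffolding: `attempt_task` stands in for a tool call that either succeeds or returns an error message, and `revise` stands in for the agent adjusting its plan from that feedback.

```python
def run_with_self_correction(attempt_task, revise, plan, max_retries=3):
    """Agent-style loop (illustrative): on failure, feed the error
    response back so the next attempt can be adjusted, rather than
    blindly repeating the same fixed step."""
    for _ in range(max_retries):
        ok, error = attempt_task(plan)
        if ok:
            return plan            # this plan worked
        plan = revise(plan, error)  # self-correct from the error response
    raise RuntimeError("could not recover within retry budget")

# Toy task: only the plan "use-flag-b" succeeds; errors hint at the fix.
def attempt_task(plan):
    if plan == "use-flag-b":
        return True, None
    return False, "unknown flag, try use-flag-b"

def revise(plan, error):
    # A real agent would interpret the error; here we just follow the hint.
    return error.split("try ")[-1]
```

The contrast with a fixed workflow is that `revise` changes the next attempt using the error itself; a static retry of the identical step would fail the same way every time.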