Small language models are the future of agentic AI

(arxiv.org)


bryant ◴[01 Jul 25 06:27 UTC] No.44431158[source]▶

A few weeks ago, I processed a product refund with Amazon via agent. It was simple, straightforward, and surprisingly obvious that it was backed by a language model based on how it responded to my frustration about it asking tons of questions. But in the end, it processed my refund without ever connecting me with a human being.

I don't know whether Amazon relies on LLMs or SLMs for this and for similar interactions, but it makes tons of financial sense to use SLMs for narrowly scoped agents. In use cases like customer service, the intelligence behind LLMs is all wasted on the task the agents are trained for.

Wouldn't surprise me if down the road we start suggesting role-specific SLMs rather than general LLMs as both an ethics- and security-risk mitigation too.

replies(5): >>44431884 #>>44431916 #>>44432173 #>>44433836 #>>44441923 #

1. automatic6131 ◴[01 Jul 25 08:42 UTC] No.44431884[source]▶

>>44431158 #

You can (used to?) get a refund on Amazon with a normal CRUD app flow. Putting an SLM and a conversational interface over it is a backwards step.

replies(3): >>44432408 #>>44432734 #>>44433817 #

2. oblio ◴[01 Jul 25 10:24 UTC] No.44432408[source]▶

>>44431884 (TP) #

From our perspective as users. From the company's perspective? Net positive, they don't need to hire people.

We're going to be so messed up in a decade or so when only 10-20-30% of the population is employable in decent jobs.

People keep harping on about people moving on with their lives, but people don't. Many industrial heartlands in the developed world are wastelands compared to what they were: Wallonia in Belgium, Scotland in the UK, the Rust Belt in the US.

People don't really move on, they suffer, sometimes for generations.

replies(1): >>44432502 #

3. thatjoeoverthr ◴[01 Jul 25 10:42 UTC] No.44432502[source]▶

>>44432408 #

A CRUD flow is the actual automation, which was already digested into the economy by 2005 or so. PHP is not a guy in the back who types HTML really fast when you click a button :)

The LLM, here, is the opposite: additional human labor to build the integrations, additional capital for chips, heavy cost of inference, an additional skeuomorphic UI (it self-identifies as a chat/texting situation) and your wasted time. I would almost call it "make work".

4. sillytwice ◴[01 Jul 25 11:21 UTC] No.44432734[source]▶

>>44431884 (TP) #

Recently I sent a product under warranty to Amazon for repair, using a shipping label and paying 42 €. The next day they cancelled the label (they are investigating why) and the product was rejected at the Amazon warehouse. Now, following their instructions, I have to wait for the product to come back and resend it, paying another 42 € that they promise to refund later. The product is an air conditioner for an elderly woman. Here in southern Spain the temperature is very high (a heat wave), and I think the repair will take a long time (to match the repair service described below).

The manufacturer, Cecotec, from whom I hope never to buy again, uses the following repair process: you have to create an account to submit your data, and then when you try to log in to the customer care page the system announces that the account you created does not exist (I have tried several times, on several days, with both my own and my wife's email and personal data). Furthermore, nobody at Cecotec ever answers the phone.

Another step that failed was asking the Spanish postal service for some way to send the product on from its current location, where it is held awaiting the sender's instructions, to the Amazon warehouse, skipping the return trip. They informed me that they cannot respond to my email because privacy law forbids it, perhaps because the product was sent under my wife's name and address. I doubt the air conditioner has any personal information attached to it.

Buddhism says that you can learn from any experience, and maybe the karma of this product is never to be repaired, and fate has decided to condemn the lonely old woman to suffer the heat wave, perhaps to expiate some bad karma from a past life.

Fortunately, all that comes also goes, and in this case I am happy to be able to afford a new machine that produces cold air, so, kind reader, rest easy: there is no real problem. Furthermore, this experience can expand my empathy: it could be that for some people life doesn't work as it should, and for them this anecdote is the normal course of events, where one problem calls another. For them, a stream of problems is the only REPL. To them I most sincerely wish peace of mind, and I hope their fortunes reverse.

In this anecdote, human intelligence is not producing correct results, and that could create a hope: that those small language models, enhanced with intelligent agents, could provide better support.

Today, in my current mental state, I envision that the contrary could occur: those systems could turn the bad things that happen once into bad things that happen every day. So be careful what you wish for.

My hope is that the greatest agents, ourselves, get together to solve whatever problems we have to cope with. But don't fool yourself: that requires real, deep human intelligence and hard human work.

replies(1): >>44434552 #

5. DebtDeflation ◴[01 Jul 25 13:38 UTC] No.44433817[source]▶

>>44431884 (TP) #

Likewise, we've been building conversational interfaces for stuff like this for over a decade using traditional NLP techniques and orchestration rather than LLMs. Intent classification, named entity extraction, slot filling, and API calling.

Processing returns is pretty standardized - identify the order and the item within it being returned, capture the reason for the return, check eligibility, capture whether they want a refund or a new item, and execute either. When you have a fully deterministic workflow with 5-6 steps, maybe 1 or 2 if-then branches, and then a single action at the end, I don't see the value of running an LLM in a loop, burning a crazy amount of tokens, and hoping it works at least 80% of the time when there are far simpler and cheaper ways of doing it that will work almost 100% of the time.
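A minimal Python sketch of the kind of fixed return workflow described above. Everything in it is a hypothetical illustration: the `ReturnRequest` fields, the 30-day window, and the `process_return` function are assumptions made for the example, not Amazon's or any vendor's actual logic.

```python
from dataclasses import dataclass
from datetime import date, timedelta

RETURN_WINDOW = timedelta(days=30)  # assumed policy, for illustration only


@dataclass
class ReturnRequest:
    order_id: str
    item_id: str
    reason: str
    delivered_on: date
    wants_refund: bool  # False means the customer wants a replacement


def process_return(req: ReturnRequest, today: date) -> str:
    """Walk the fixed steps: validate, check eligibility, pick one action."""
    # Steps 1-3: order, item, and reason must all be captured.
    if not (req.order_id and req.item_id and req.reason):
        return "need_more_info"
    # Step 4: eligibility check (the first if-then branch).
    if today - req.delivered_on > RETURN_WINDOW:
        return "rejected_outside_window"
    # Steps 5-6: refund or replacement (the second branch), then one action.
    return "refund_issued" if req.wants_refund else "replacement_shipped"


req = ReturnRequest("A1", "item-7", "defective", date(2025, 6, 20), True)
print(process_return(req, today=date(2025, 7, 1)))  # prints "refund_issued"
```

The whole flow is a handful of comparisons with no model call, which is the point being made: once the slots are filled, the outcome is fully determined.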

replies(1): >>44434002 #

6. zsyllepsis ◴[01 Jul 25 14:00 UTC] No.44434002[source]▶

>>44433817 #

True, we have been building conversational interfaces with traditional NLP. In my experience, they’ve been fairly fragile.

Extending the example you gave, nicely packaged, fully deterministic workflows work great in demos. Then customers start going off the paved path. They ask about returning 3 items all at once, or a whole order. They get confused and provide a shipping number instead of an order number. They switch language part of the way through the conversation because they get frustrated by all these follow-up questions.

All of these absolutely can be handled through traditional NLP, but they require system designers to account for them, model the conversation, and design their system to react accordingly. And suddenly the 5-6 step deterministic workflows with a couple of if-branches… isn’t.
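To make that concrete, here is a rough Python sketch of a rule-based slot filler once it has to cope with two of the off-path cases mentioned above: several orders in one message, and a shipping number given instead of an order number. The identifier formats and the `extract_slots` function are assumptions invented for this example, not any real system's rules.

```python
import re

# Assumed identifier formats, for illustration only.
ORDER_ID = re.compile(r"\b\d{3}-\d{7}-\d{7}\b")
TRACKING_ID = re.compile(r"\b1Z[0-9A-Z]{16}\b")


def extract_slots(utterance: str) -> dict:
    """Fill the order slot, flagging the off-path cases explicitly."""
    orders = ORDER_ID.findall(utterance)
    tracking = TRACKING_ID.findall(utterance.upper())
    if orders:
        # One or more order numbers: the caller must handle multi-order returns.
        return {"status": "ok", "order_ids": orders}
    if tracking:
        # Customer gave a shipping number instead of an order number.
        return {"status": "wrong_id_type", "tracking_ids": tracking}
    return {"status": "missing_order_id"}
```

Each off-path behavior becomes one more branch that someone had to anticipate and write by hand, and a mid-conversation language switch would need yet another layer on top of this.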

7. sillytwice ◴[01 Jul 25 14:59 UTC] No.44434552[source]▶

>>44432734 #

Solving the stream of problems with a REPL (read-eval-print loop) looks like this in Python:

   while life:
       problem = read(bureaucracy, bad_luck, bad_ai)
       response = eval(problem, resources=few)
       print(response)  # Spoiler: '404 Solution not found'

replies(1): >>44436688 #

8. kridsdale3 ◴[01 Jul 25 18:19 UTC] No.44436688{3}[source]▶

>>44434552 #

Try not to crash on line 3.
