204 points tdchaitanya | 37 comments
1. andrewflnr ◴[] No.45094933[source]
Is this really the frontier of LLM research? I guess we really aren't getting AGI any time soon, then. It makes me a little less worried about the future, honestly.

Edit: I never actually expected AGI from LLMs. That was snark. I just think it's notable that the fundamental gains in LLM performance seem to have dried up.

replies(7): >>45094979 #>>45094995 #>>45095059 #>>45095198 #>>45095374 #>>45095383 #>>45095463 #
2. srekhi ◴[] No.45094979[source]
I'm not following this either. You'd think this would have been frontier work back in 2023.
3. kenjackson ◴[] No.45094995[source]
First, I don't think we will ever get to AGI. Not because we won't see huge advances still, but because AGI is an ambiguous, moving target that we won't get consensus on.

But why does this paper impact your thinking on it? It is about budget and recognizing that different LLMs have different cost structures. It's not really an attempt to improve LLM performance measured absolutely.
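
For intuition, budget-aware routing can be as simple as picking the best model whose expected cost fits a budget. A minimal sketch, with made-up model names, per-token costs, and quality scores (not the paper's actual method):

    from dataclasses import dataclass

    @dataclass
    class Model:
        name: str
        cost_per_1k_tokens: float  # USD, hypothetical
        quality: float             # higher is better, hypothetical score

    MODELS = [
        Model("small-fast", 0.0005, 0.62),
        Model("mid-tier",   0.003,  0.78),
        Model("frontier",   0.015,  0.91),
    ]

    def route(expected_tokens: int, budget_usd: float) -> Model:
        # Keep only models whose expected cost fits the budget,
        # then pick the highest-quality one among them.
        affordable = [m for m in MODELS
                      if m.cost_per_1k_tokens * expected_tokens / 1000 <= budget_usd]
        if not affordable:
            raise ValueError("no model fits the budget")
        return max(affordable, key=lambda m: m.quality)

    print(route(expected_tokens=2000, budget_usd=0.01).name)  # -> mid-tier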

replies(3): >>45095489 #>>45096115 #>>45099679 #
4. guluarte ◴[] No.45095059[source]
I'm starting to think that there will not be an 'AGI' moment; we will simply, slowly build smarter machines over time until we realize there is 'AGI'. It would be like video calls: in the '90s everybody wanted them, now everybody hates them, lmao.
replies(1): >>45095803 #
5. jibal ◴[] No.45095198[source]
LLMs are not on the road to AGI, but there are plenty of dangers associated with them nonetheless.
replies(2): >>45095419 #>>45095531 #
6. yahoozoo ◴[] No.45095374[source]
That and LLMs are seemingly plateauing. Earlier this year, it seemed like the big companies were releasing noticeable improvements every other week. People would joke that a few weeks is “an eternity” in AI... so what time span are we looking at now?
replies(3): >>45095505 #>>45095842 #>>45096023 #
7. yieldcrv ◴[] No.45095383[source]
just because it’s on arxiv doesn’t mean anything

arxiv is essentially a blog under an academic format, popular amongst asian and south asian academic communities

currently you can launder reputation with it, just like “white papers” in the crypto world let people raise capital for a while

this ability will diminish as more people catch on

replies(1): >>45101494 #
8. nicce ◴[] No.45095419[source]
Just 2 days ago, Gemini 2.5 Pro recommended tax evasion to me based on non-existent laws and court decisions. The model was so charming and convincing that even after I pointed out all the logical flaws and said this was plain wrong, I started to doubt myself, because it is so good at pleasing, arguing, and using words.

And most people would have accepted the recommendation, because the model sold it as a less common tactic while sounding very logical.

replies(2): >>45095524 #>>45095785 #
9. ctoth ◴[] No.45095463[source]
Is a random paper from Fujitsu Research claiming to be the frontier of anything?
replies(1): >>45095541 #
10. _heimdall ◴[] No.45095489[source]
So you don't expect AGI to be possible, ever? Or is your concern mainly the wildly different definitions people use for it, and that we'll keep moving the goalposts rather than agree we got there?
replies(1): >>45095729 #
11. andrewflnr ◴[] No.45095505[source]
That's just the thing. There don't seem to have been any breakthroughs in model performance or architecture, so it seems like we're back to picking up marginal reductions in cost to make any progress.
12. roywiggins ◴[] No.45095524{3}[source]
> even after I brought all the logic flaws and said that this is plain wrong

Once you've started to argue with an LLM you're already barking up the wrong tree. Maybe you're right, maybe not, but there's no point in arguing it out with an LLM.

replies(1): >>45096510 #
13. andrewflnr ◴[] No.45095531[source]
Agreed, broadly. I never really thought they were, but seeing people work on stuff like this instead of even trying to improve the architecture really makes it obvious.
14. andrewflnr ◴[] No.45095541[source]
Not just this paper; model-routing shenanigans also seem to have been a big part of GPT-5, which certainly claims to be frontier work.
15. nutjob2 ◴[] No.45095729{3}[source]
There's no concrete evidence AGI is possible mostly because it has no concrete definition.

It's mostly hand-waving, hype, credulity, and unproven claims of scalability right now.

You can't move the goal posts because they don't exist.

replies(3): >>45096070 #>>45097847 #>>45101489 #
16. nutjob2 ◴[] No.45095785{3}[source]
Or you could understand the tool you are using and be skeptical of any of its output.

So many people just want to believe, instead of accepting the reality that LLMs are quite unreliable.

Personally, it's usually fairly obvious to me when LLMs are bullshitting, probably because I have lots of experience detecting it in humans.

replies(1): >>45096499 #
17. nutjob2 ◴[] No.45095803[source]
Or we'll realize that human intelligence and machine intelligence are apples and oranges.
18. muldvarp ◴[] No.45095842[source]
There have been very large improvements in code generation in the last 6 months. A few weeks without improvement are not necessarily a plateau.
replies(1): >>45096061 #
19. ◴[] No.45096023[source]
20. ACCount37 ◴[] No.45096061{3}[source]
Wait until it ramps up so much that people will say "it's a plateau, for real this time" when they go 3 days without a +10% capability jump.
replies(1): >>45096195 #
21. ashirviskas ◴[] No.45096070{4}[source]
Well, if a human is GI, we just need to make it Artificial. Easy.
replies(1): >>45098866 #
22. ACCount37 ◴[] No.45096115[source]
I can totally see "it's not really AGI because it doesn't consistently outperform those three top 0.000001% outlier human experts yet if they work together".

It'll be a while until the ability to move the goalposts of "actual intelligence" is exhausted entirely.

replies(1): >>45096696 #
23. muldvarp ◴[] No.45096195{4}[source]
I mean, I wish there were a plateau; without one, we're well on our way to techno-feudalism. I just don't see it.
replies(1): >>45096748 #
24. nicce ◴[] No.45096499{4}[source]
An LLM is only useful if it gives a shortcut to information with reasonable accuracy. If I need to double-check everything, it is just an extra step.

In this case I just happened to be a domain expert and knew it was wrong. It would have required significant effort for a less experienced person to verify everything.

25. nicce ◴[] No.45096510{4}[source]
There are cases where they are actually correct and the human is wrong.
replies(1): >>45096538 #
26. roywiggins ◴[] No.45096538{5}[source]
Yes, and there's a substantial chance they'll apologize to you anyway, even when they were right. There's no reason to expect them to be more likely to apologize when they're actually right vs. actually wrong; their agreeableness is really orthogonal to their correctness.
replies(1): >>45096612 #
27. nicce ◴[] No.45096612{6}[source]
Yes, they over-apologize. But my main reason for using LLMs is to seek out things I missed myself, or where my own argumentation was not good. Sometimes they are really good at bringing new perspectives. Whether they are correct or incorrect is not the point; the point is whether they give an argument or perspective that is worth inspecting further with my own brain.
28. 9dev ◴[] No.45096696{3}[source]
Well, right now my seven-year-old niece outperforms all LLM contenders at drawing a pelican on a bicycle
replies(2): >>45097149 #>>45103451 #
29. ACCount37 ◴[] No.45096748{5}[source]
That's what it is: wishful thinking. A lot of people really, really want AI tech to fail - because they don't like the alternative.
replies(1): >>45099262 #
30. kenjackson ◴[] No.45097149{4}[source]
I know this was a joke, but LLMs are quite good at this now. If your niece draws better, then she's a good artist.
31. _heimdall ◴[] No.45097847{4}[source]
Got it, and yeah, I agree with you there. I've been frustrated by a different aspect of it, though: many people do seem to have a definition, and the definitions are often wildly different.
32. abalashov ◴[] No.45098866{5}[source]
I like to say that it's not AI -- it's just A.
33. muldvarp ◴[] No.45099262{6}[source]
Yeah, obviously nobody who has actually thought about the consequences wants a large part of the population to become unemployed. Even if your job is not threatened by automation, it will be threatened by a lot of people looking for new jobs.

And the kind of automation brought by LLMs is decidedly different from automation in the past, which almost always created new (usually better) jobs. LLMs won't do this (at least not to an extent where it would matter), I think. Most people in ten years will have worse jobs (more physically straining, longer hours, less pay) unless there is political intervention.

34. baq ◴[] No.45099679[source]
Given OpenAI's definition, I'd expect AGI to be around in a decade or two. I don't expect Skynet, though; maybe the more realistic outcome is just droids mixing with humans.
35. dahcryn ◴[] No.45101489{4}[source]
Even AI does not have a concrete definition.

That doesn't mean there aren't practical definitions depending on the context.

In essence, teaching an AI using resources meant for humans, and nothing more, would be considered AGI. That could be a practical definition, without needing much more rigour.

There is indeed no evidence we'll get there. But there is also no evidence that LLMs should work as well as they do.

36. dahcryn ◴[] No.45101494[source]
arxiv should really have a big red banner "NOT REVIEWED - DON'T USE AS A SOURCE" or something
37. neuronexmachina ◴[] No.45103451{4}[source]
I tried it in Gemini just now, it seems to have done a decent job: https://g.co/gemini/share/b6fef8398c01