265 points ctoth | 42 comments
mellosouls ◴[] No.43745240[source]
The capabilities of AI post-GPT-3 have become extraordinary and, in many cases, clearly superhuman.

However (as the article admits) there is still no general agreement on what AGI is, or how we get there from here (or even whether we can).

What there is instead is a growing and often naïve excitement that anticipates it coming into view, and unfortunately that will be accompanied by hype-merchants desperate to be the first to "call it".

This article seems reasonable in some ways but unfortunately falls into the latter category with its title and sloganeering.

"AGI" in the title of any article should be seen as a cautionary flag. On HN - if anywhere - we need to be on the alert for this.

replies(13): >>43745398 #>>43745959 #>>43746159 #>>43746204 #>>43746319 #>>43746355 #>>43746427 #>>43746447 #>>43746522 #>>43746657 #>>43746801 #>>43749837 #>>43795216 #
jjeaff ◴[] No.43745959[source]
I suspect AGI will be one of those things that you can't describe exactly, but you'll know it when you see it.
replies(7): >>43746043 #>>43746058 #>>43746080 #>>43746093 #>>43746651 #>>43746728 #>>43746951 #
1. NitpickLawyer ◴[] No.43746058[source]
> but you'll know it when you see it.

I agree, but with the caveat that it's getting harder and harder with all the hype / doom cycles and all the goalpost moving that's happening in this space.

IMO if you took gemini2.5 / claude / o3 and showed it to people from ten / twenty years ago, they'd say that it is unmistakably AGI.

replies(4): >>43746116 #>>43746460 #>>43746560 #>>43746705 #
2. Jensson ◴[] No.43746116[source]
> IMO if you took gemini2.5 / claude / o3 and showed it to people from ten / twenty years ago, they'd say that it is unmistakably AGI.

No they wouldn't, since those still can't replace human white collar workers even at many very basic tasks.

Once AGI is here most white collar jobs are gone, you'd only need to hire geniuses at most.

replies(1): >>43746249 #
3. zaptrem ◴[] No.43746249[source]
Which part of "General Intelligence" requires replacing white collar workers? A middle schooler has general intelligence (they know about and can do a lot of things across a lot of different areas) but they likely can't replace white collar workers either. IMO GPT-3 was AGI, just a pretty crappy one.
replies(2): >>43746254 #>>43746322 #
4. Jensson ◴[] No.43746254{3}[source]
> A middle schooler has general intelligence (they know about and can do a lot of things across a lot of different areas) but they likely can't replace white collar workers either.

Middle schoolers replace white collar workers all the time; it takes 10 years for them to do it, but they can do it.

No current model can do the same since they aren't able to learn over time like a middle schooler.

replies(1): >>43746692 #
5. ◴[] No.43746322{3}[source]
6. bayarearefugee ◴[] No.43746460[source]
There's no way to be sure in either case, but I suspect their impressions of the technology ten or twenty years ago would be not so different from my experience of first using LLMs a few years ago...

Which is to say: complete amazement, followed quickly by seeing all the many ways in which it absolutely falls flat on its face, revealing the lack of actual thinking. That situation hasn't fundamentally changed since then.

replies(1): >>43748573 #
7. mac-mc ◴[] No.43746560[source]
When it can replace a polite, diligent, experienced 120 IQ human in all tasks. So it has a consistent long-term narrative memory, doesn't "lose the plot" as you interact longer and longer with it, can pilot robots to do physical labor without much instruction (the current state of the art is not that; a trained human will still do much better), can drive cars, can generate images without goofy non-human style errors, etc.
replies(1): >>43746787 #
8. sebastiennight ◴[] No.43746692{4}[source]
Compared to someone who graduated middle school on November 30th, 2022 (2.5 years ago), would you say that today's gemini 2.5 pro has NOT gained intelligence faster?

I mean, if you're a CEO or middle manager and you have the choice of hiring this middle schooler for general office work, or today's gemini-2.5-pro, are you 100% saying the ex-middle-schooler is definitely going to give you the best bang for your buck?

Assuming you can either pay them $100k a year, or spend the $100k on gemini inference.

replies(1): >>43746742 #
9. sebastiennight ◴[] No.43746705[source]
I don't think so, and here's my simple proof:

You and I could sit behind a keyboard, role-playing as the AI in a reverse Turing test, typing away furiously at the top of our game, and if you told someone that their job is to assess our performance (thinking they're interacting with a computer), they would still conclude that we are definitely not AGI.

This is a battle that can't be won at any point because it's a matter of faith for the forever-skeptic, not facts.

replies(1): >>43746759 #
10. Jensson ◴[] No.43746742{5}[source]
> would you say that today's gemini 2.5 pro has NOT gained intelligence faster?

Gemini 2.5 pro the model has not gained any intelligence since it is a static model.

New models are not the models learning; it is humans creating new models. The models, once trained, have access to all the same material and knowledge a middle schooler has as they go on to learn how to do a job, yet they fail to learn the job while the kid succeeds.

replies(3): >>43747033 #>>43747197 #>>43749355 #
11. Jensson ◴[] No.43746759[source]
> I don't think so, and here's my simple proof:

That isn't a proof since you haven't run that test; it is just a thought experiment.

replies(1): >>43747137 #
12. NitpickLawyer ◴[] No.43746787[source]
> experienced 120 IQ human in all tasks.

Well, that's the 91st percentile already. I know the terms are hazy, but that seems closer to ASI than AGI from that perspective, no?

I think I do agree with you on the other points.

replies(2): >>43747163 #>>43748221 #
13. ben_w ◴[] No.43747033{6}[source]
> Gemini 2.5 pro the model has not gained any intelligence since it is a static model.

Surely that's an irrelevant distinction, from the point of view of a hiring manager?

If a kid takes ten years from middle school to being worth hiring, then the question is "what new AI do you expect will exist in 10 years?"

How the model comes to be, doesn't matter. Is it a fine tune on more training data from your company docs and/or an extra decade of the internet? A different architecture? A different lab in a different country?

Doesn't matter.

Doesn't matter for the same reason you didn't hire the kid immediately out of middle school, and hired someone else who had already had another decade to learn more in the meantime.

Doesn't matter for the same reason that different flesh humans aren't perfectly substitutable.

You pay to solve a problem, not to specifically have a human solve it. Today, not in ten years when today's middle schooler graduates from university.

And that's even though I agree that AI today doesn't learn effectively from as few examples as humans need.

replies(1): >>43749385 #
14. ben_w ◴[] No.43747137{3}[source]
I've been accused a few times of being an AI, even here.

(Have you not experienced being on the receiving end of such accusations? Or do I just write weird?)

I think this demonstrates the same point.

replies(1): >>43749414 #
15. ben_w ◴[] No.43747163{3}[source]
Indeed, on both. Even IQ 85 would make a painful dent in the economy via unemployment statistics. But the AI we have now is spiky, in ways that make it trip up over mistakes that even slightly below-average humans would not make, even though it can also do Maths Olympiad puzzles, the bar exam, leetcode, etc.
16. ac29 ◴[] No.43747197{6}[source]
> Gemini 2.5 pro the model has not gained any intelligence since it is a static model.

Aren't all the people interacting with it on aistudio helping the next Gemini model learn though?

Sure, the results of that won't be available until the next model is released, but it seems to me that human interaction/feedback is actually a vital part of LLM training.
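
Roughly, the pipeline looks like this (a sketch with made-up field names, not any actual vendor API): the live model stays frozen, and interactions are logged as preference data that only the next training run consumes.

    import json
    import time

    def log_feedback(prompt: str, response: str, rating: int,
                     path: str = "prefs.jsonl") -> None:
        # Append one interaction as a preference record. The deployed
        # model's weights are untouched; this file only matters when a
        # future training run reads it.
        record = {
            "ts": time.time(),
            "prompt": prompt,
            "response": response,
            "rating": rating,  # e.g. +1 thumbs-up, -1 thumbs-down
        }
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")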

replies(1): >>43749395 #
17. mac-mc ◴[] No.43748221{3}[source]
The emotional way that humans think when buying products is similarly unfair. Only the 90th percentile is truly 'satisfactory'. The implied question is when Joe Average and everyone else would stop moving the goalposts on the question, "Do we have AI yet?"

ASI is, by definition, Superintelligence, which means it is beyond practical human IQ capacity. So something like 200 IQ.

Again, you might call it 'unfair', but that's when the goalposts will also stop being moved; otherwise, Joe Midwit will call it 'only as smart as some smart dudes I know'.

18. HdS84 ◴[] No.43748573[source]
Yes, that is the same feeling I have. Giving it some JSON and describing how a website should look? Super fast results and amazing capabilities. Trying to get it to translate my unit tests from xUnit to TUnit, where the latter is new and does not have a ton of blog posts? Forget about it. The process is purely mechanical and easy after RTFM, but it falls flat on its face.
replies(1): >>43750395 #
19. sebastiennight ◴[] No.43749355{6}[source]
This argument needlessly anthropomorphizes the models. They are not humans or living entities; they are systems.

So, fine, the gemini-2.5-pro model hasn't gotten more intelligent. What about the "Google AI Studio API" as a system? Or the "OpenAI chat completions API" as a system?

This system has definitely gotten vastly smarter based on the input it's gotten. Would you now concede that, if we look at the API level (which, by the way, is how you as the employer interact with it), this entity has gotten smarter way faster than the middle-schooler over the last 2.5 years?

replies(1): >>43749376 #
20. Jensson ◴[] No.43749376{7}[source]
But it's the AI researchers that made it smarter; it isn't a self-contained system like a child. If you fired the people maintaining it and it just interacted with people, it would stop improving.
replies(2): >>43750455 #>>43752312 #
21. Jensson ◴[] No.43749385{7}[source]
> Surely that's an irrelevant distinction, from the point of view of a hiring manager?

Stop moving the goalposts closer: that you think humans might make an AGI in the future doesn't mean the current AI is an AGI just because it uses the same interface.

replies(1): >>43749669 #
22. Jensson ◴[] No.43749395{7}[source]
It won't get smart enough without the researchers making architectural updates, though; the current architecture won't learn to become a white collar worker just from user feedback.
23. Jensson ◴[] No.43749414{4}[source]
> Have you not experienced being on the receiving end of such accusations?

No, I have not been accused of being an AI. I have seen people who format their texts get accused because of the formatting, and thought people could accuse me for the same reason, but that doesn't count.

> I think this demonstrates the same point.

You can't detect general intelligence from a single message, so it doesn't really. People accuse you of being an AI based on the structure and word usage of your message, not the content of it.

replies(1): >>43749635 #
24. ben_w ◴[] No.43749635{5}[source]
> People accuse you of being an AI based on the structure and word usage of your message, not the content of it.

If that's the real cause, it is not the reason they give when making the accusation. Sometimes they object to the citations, sometimes the absence of them.

But it's fairly irrelevant, as they are, in fact, saying that real flesh-and-blood me doesn't pass their purity test for thinking.

Is that because they're not thinking? Doesn't matter — as @sebastiennight said: "This is a battle that can't be won at any point because it's a matter of faith for the forever-skeptic, not facts."

replies(1): >>43749898 #
25. ben_w ◴[] No.43749669{8}[source]
Your own comment was a movement of the goalposts.

Preceding quotation to which you objected:

> A middle schooler has general intelligence (they know about and can do a lot of things across a lot of different areas) but they likely can't replace white collar workers either.

Your response:

> Middle schoolers replace white collar workers all the time; it takes 10 years for them to do it, but they can do it.

So I could rephrase your own words here as "Stop moving the goalposts closer, that you think a middle schooler might become a General Intelligence in the future doesn't mean the current middle schooler is a General Intelligence just because they use the same name".

replies(1): >>43749800 #
26. Jensson ◴[] No.43749800{9}[source]
It's the same middle schooler; nobody gave a time limit for how long it takes the middle schooler to solve the problem. These AI models won't solve it no matter how much time is spent; you have to make new models, like making new kids.

Put one of these models in a classroom with middle schoolers and make it go through all the same experiences, and they still won't replace a white collar worker.

> a middle schooler might become a General Intelligence in the future

Being able to learn anything a human can means you are a general intelligence now. Having a skill is narrow intelligence; being able to learn is what we mean by general intelligence. No current model has demonstrated the ability to learn arbitrary white collar jobs, so no current model has done what it takes to be considered a general intelligence. The biological model homo sapiens has demonstrated that ability, thus we call homo sapiens generally intelligent.

replies(1): >>43756845 #
27. Jensson ◴[] No.43749898{6}[source]
So your argument is that all skeptics are unreasonable people who can't ever be convinced, based on your being called an AI once? Don't you see who is the unreasonable one here?

There are always people who won't admit they are wrong, but most people do come around when presented with overwhelming evidence; it has happened many times in history, and most people switch to new technology very quickly when it's good enough.

28. Closi ◴[] No.43750395{3}[source]
Although I think if you asked people 20 years ago to describe a test for something AGI would do, they would be more likely to say “writing a poem” or “making art” than “turning xUnit code into TUnit”.

IMO I think if you said to someone in the 90s “well, we invented something that can tell jokes, make unique art, write stories and hold engaging conversations, although we haven’t yet reached AGI because it can’t transpile code accurately; I mean, it can write full applications if you give it some vague requirements, but they have to be reasonably basic (the sort of thing a junior dev could write in a day, it writes in 20 seconds), so not AGI” they would say “of course you have invented AGI, are you insane!!!”.

LLMs to me are still a technology of pure science fiction come to life before our eyes!

replies(1): >>43750445 #
29. Jensson ◴[] No.43750445{4}[source]
Tell them humans need to babysit it and double-check its answers to do anything, since it isn't as reliable as a human, and then no, they wouldn't have called it an AGI back then either.

The whole point of AGI is that it is general like a human; if it has such glaring weaknesses as the current AI has, it isn't AGI, and the same was true back then. That an AGI can write a poem doesn't mean being able to write a poem makes something an AGI; it's just an example of something the AI couldn't do 20 years ago.

replies(1): >>43750479 #
30. sebastiennight ◴[] No.43750455{8}[source]
1. The child didn't learn algebra on its own either. Aside from Blaise Pascal, most children learned those skills by having experienced humans teach them.

2. How likely is it that we're going to fire everyone maintaining those models in the next 7.5 years?

replies(1): >>43750596 #
31. Closi ◴[] No.43750479{5}[source]
Why do human programmers need code review then if they are intelligent?

And why can’t expert programmers deploy code without testing it? Surely they should be able to write it perfectly the first time, without errors, if they were actually intelligent.

replies(1): >>43750524 #
32. Jensson ◴[] No.43750524{6}[source]
> Why do human programmers need code review then if they are intelligent?

Human programmers don't need code reviews; they can test things themselves. Code review is just an optimization for scaling up; it isn't a requirement for making programs.

Also, the AGI is allowed to let another AGI review its code; the point is there shouldn't be a human in the loop.

> And why can’t expert programmers deploy code without testing it?

They can do the testing themselves; the AGI model is allowed to test its own work as well.

replies(1): >>43750556 #
33. Closi ◴[] No.43750556{7}[source]
Well, AGI can write unit tests, write application code, then run the tests and iterate; agents in Cursor are doing this already.

Just not for more complex applications.
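
Something like this loop, roughly (a Python sketch; llm_propose_fix is a stand-in for the actual model call, not Cursor's real internals):

    import subprocess

    def run_tests() -> tuple[bool, str]:
        # Run the project's test suite and capture its output.
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        return result.returncode == 0, result.stdout + result.stderr

    def llm_propose_fix(failure_log: str) -> None:
        # Placeholder: a real agent sends the failing output to the model
        # and applies the suggested patch to the working tree.
        ...

    def iterate(max_rounds: int = 5) -> bool:
        # Write/run/fix until the tests pass or we give up.
        for _ in range(max_rounds):
            passed, log = run_tests()
            if passed:
                return True
            llm_propose_fix(log)
        return False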

Code review does often find bugs in code…

Put another way, I’m not a strong dev but good LLMs can write lots of code with less bugs than me!

I also think it’s quite a “programmer mentality” that most of the tests in this forum about whether something is/isn’t AGI ultimately boil down to whether it can write bug-free code, rather than whether it can negotiate or sympathise or be humorous or write an engaging screenplay… I’m not saying AGI is good at those things yet, but it’s interesting that we talk about the test of AGI being transpiling code rather than understanding philosophy.

replies(1): >>43750831 #
34. Jensson ◴[] No.43750596{9}[source]
> The child didn't learn algebra on its own either. Aside from Blaise Pascal, most children learned those skills by having experienced humans teach them.

That is them interacting with an environment. We don't go and rewire their brain to make them learn math.

If you made an AI that we could put in a classroom, and it learned everything needed to do any white collar job that way, then it would be an AGI. Of course, just like a human, different jobs would mean it needs different classes, but, just like a human, you can still make it learn anything.

> How likely is it that we're going to fire everyone maintaining those models in the next 7.5 years?

If you stop making new models? Zero chance the model will replace such high-skill jobs. If not? Then that has nothing to do with whether current models are general intelligences.

replies(1): >>43754392 #
35. Jensson ◴[] No.43750831{8}[source]
> Put another way, I’m not a strong dev but good LLMs can write lots of code with less bugs than me!

But the AI still can't replace you; it doesn't learn as it goes, and it therefore fails to navigate long-term tasks the way humans do. When a human writes a big program, he learns how to write it as he writes it; these current AIs cannot do that.

replies(1): >>43754367 #
36. ben_w ◴[] No.43752312{8}[source]
The brain of a child is not self-contained either. Neither is the child as a whole — "It takes a village to raise a child", to quote the saying.

The entire reason we have a mandatory education system that doesn't stop with middle school (for me, middle school ended at age 11) is that it's a way to improve kids.

37. int_19h ◴[] No.43754367{9}[source]
Strictly speaking, it can, but its ability to do so is limited by its context size.

Which keeps growing - Gemini is at 2 million tokens now, which is several books' worth of text.

Note also that context is roughly the equivalent of short-term memory in humans, while long-term memory is more like RAG.
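
A toy sketch of that split (the bag-of-words "embedding" here is a deliberately dumb stand-in for a real embedding model): anything outside the context window has to be retrieved by similarity and re-inserted into the prompt, RAG-style.

    import math
    from collections import Counter

    def embed(text: str) -> Counter:
        # Toy stand-in for a learned embedding model.
        return Counter(text.lower().split())

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[w] * b[w] for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    class LongTermMemory:
        # "Long-term memory": everything that no longer fits in the
        # context window, recalled by similarity when relevant.
        def __init__(self):
            self.entries = []  # (embedding, text) pairs

        def store(self, text: str) -> None:
            self.entries.append((embed(text), text))

        def recall(self, query: str, k: int = 2) -> list[str]:
            q = embed(query)
            ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
            return [text for _, text in ranked[:k]]

    memory = LongTermMemory()
    memory.store("User prefers TUnit over xUnit for new test projects.")
    memory.store("User's deploy target is .NET 8 on Linux.")
    print(memory.recall("translate my unit tests to TUnit"))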

38. int_19h ◴[] No.43754392{10}[source]
Your brain does rewire itself as you learn.

Here's a question for you. If we take a model with open weights - say, LLaMA or Qwen - and give it access to learning materials as well as tools to perform training runs on its weights and dynamically reload those updated weights - would that constitute learning, to you? If not, then why not?
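
Concretely, I mean something like this loop (the function bodies are placeholders for the real machinery, not an actual LLaMA/Qwen training API):

    def gather_materials() -> list[str]:
        # The model selects, or is handed, new documents to study.
        return ["chapter 1", "chapter 2"]

    def fine_tune(weights: dict, materials: list[str]) -> dict:
        # Placeholder for a real training run over the new material
        # (e.g. a LoRA pass); here we just record what was studied.
        return {**weights, "seen": weights.get("seen", []) + materials}

    def reload(weights: dict) -> dict:
        # Placeholder for hot-swapping the updated checkpoint back in.
        return weights

    weights: dict = {}
    for _ in range(3):
        weights = reload(fine_tune(weights, gather_materials()))
    # The question: is this loop "learning", or just repeated offline retraining?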

replies(1): >>43756933 #
39. ben_w ◴[] No.43756845{10}[source]
> It's the same middle schooler; nobody gave a time limit for how long it takes the middle schooler to solve the problem.

Yeah, they do. If a middle schooler takes 40 hours to solve a maths exam, they fail the exam.

> These AI models won't solve it no matter how much time is spent; you have to make new models, like making new kids.

First: doesn't matter; companies aren't paying white collar workers to be seat warmers, they're paying for problems solved, and not the kinds of problems 11-year-olds can do.

Second: So far as I can tell, every written exam that not only 11-year-olds but even 16-year-olds take, and in many cases 21-year-olds take, LLMs ace — the problem is coming up with new tests that describe the stuff we want which models can't do but humans can. This means that while I agree these models have gaps, I can't actually describe those gaps in a systematic way; they just "vibe", like my own experience of continuing to misunderstand German as a Brit living in Berlin.

Third: going from 11 years old to adulthood, most or all of the atoms in your body will be replaced, and your brain architecture changes significantly. IIRC something like half of all synapses are pruned by puberty.

Fourth: Taking a snapshot of a model and saying that snapshot can't learn, is like taking a sufficiently detailed MRI scan of a human brain and saying the same thing about the human you've imaged — training cut-offs are kinda arbitrary.

> No current model has demonstrated the ability to learn arbitrary white collar jobs, so no current model has done what it takes to be considered a general intelligence.

Both "intelligence" and "generality" are continuums, not booleans. It's famously hard for humans to learn new languages as they get older, for example.

All AI (not just LLMs) needs a lot more experience than me, which means my intelligence is higher. When sufficient training data exists, that doesn't matter, because the AI can just make up for being stupid by being stupid really fast — which is how they can read and write in more languages than I know the names of.

On the other hand, LLMs so far have demonstrated — at the junior level of a fresh graduate of 21, let alone an 11-year-old — algebra, physics, chemistry, literature, coding, a hundred or so languages, medicine, law, politics, marketing, economics, and customer support. That's pretty general. Even if "fresh graduate" isn't a high standard for employment.

It took reading a significant fraction of the internet to get to that level because of their inefficiency, but they're superhumanly general, "Jack of all trades, master of none".

Well, superhuman compared to any individual. LLM generality only seems mediocre when compared to the entire human species at once; these models vastly exceed any single human, because no single human speaks as many languages as these things do, let alone all the other stuff.

replies(1): >>43756967 #
40. Jensson ◴[] No.43756933{11}[source]
> Here's a question for you. If we take a model with open weights - say, LLaMA or Qwen - and give it access to learning materials as well as tools to perform training runs on its weights and dynamically reload those updated weights - would that constitute learning, to you? If not, then why not?

It does constitute learning, but it won't make it smart, since it isn't intelligent about its learning the way human brains are.

41. Jensson ◴[] No.43756967{11}[source]
I think you are off topic here. You agree these models can't replace those humans; hence you agree they aren't AGI. The rest of your post somehow got into whether companies would hire 11-year-olds or not.

Point is, if we had models as smart as a 10-year-old, we could put that model through school and then it would be able to do white collar jobs like a 25-year-old. But no model can do that; hence the models aren't as smart as 10-year-olds, since the biggest part of being smart is being able to learn.

So until we have a model that can do those white collar jobs, we know they aren't as generally smart as 10-year-olds, since they can't replicate the same learning process. If they could replicate the learning process, then we would, and we would have that white collar worker.

replies(1): >>43757118 #
42. ben_w ◴[] No.43757118{12}[source]
Reread it; I edit stuff while composing, and hadn't finished until at least 13 minutes after your comment.

Employability is the core issue, as you brought up the white collar worker comparison:

"""No they wouldn't, since those still can't replace human white collar workers even at many very basic tasks.

Once AGI is here most white collar jobs are gone, you'd only need to hire geniuses at most.""" - https://news.ycombinator.com/item?id=43746116

Key thing you likely didn't have in the comment you replied to: G and I are not bool.