
265 points ctoth | 5 comments
mellosouls ◴[] No.43745240[source]
The capabilities of AI post-GPT-3 have become extraordinary and, in many cases, clearly superhuman.

However (as the article admits) there is still no general agreement on what AGI is, how we get there from here, or even whether we can.

What there is, is a growing and often naïve excitement that anticipates its imminent arrival, and unfortunately that will be accompanied by hype-merchants desperate to be the first to "call it".

This article seems reasonable in some ways but unfortunately falls into the latter category with its title and sloganeering.

"AGI" in the title of any article should be seen as a cautionary flag. On HN - if anywhere - we need to be on the alert for this.

replies(13): >>43745398 #>>43745959 #>>43746159 #>>43746204 #>>43746319 #>>43746355 #>>43746427 #>>43746447 #>>43746522 #>>43746657 #>>43746801 #>>43749837 #>>43795216 #
jjeaff ◴[] No.43745959[source]
I suspect AGI will be one of those things that you can't describe exactly, but you'll know it when you see it.
replies(7): >>43746043 #>>43746058 #>>43746080 #>>43746093 #>>43746651 #>>43746728 #>>43746951 #
NitpickLawyer ◴[] No.43746058[source]
> but you'll know it when you see it.

I agree, but with the caveat that it's getting harder and harder with all the hype / doom cycles and all the goalpost moving that's happening in this space.

IMO if you took Gemini 2.5 / Claude / o3 and showed it to people from ten or twenty years ago, they'd say it was unmistakably AGI.

replies(4): >>43746116 #>>43746460 #>>43746560 #>>43746705 #
bayarearefugee ◴[] No.43746460[source]
There's no way to be sure in either case, but I suspect their impressions of the technology ten or twenty years ago would be not so different from my experience of first using LLMs a few years ago...

Which is to say complete amazement, followed quickly by seeing all the many ways in which it absolutely falls flat on its face, revealing the lack of actual thinking; that situation hasn't fundamentally changed since then.

replies(1): >>43748573 #
HdS84 ◴[] No.43748573[source]
Yes, that is the same feeling I have. Give it some JSON and describe how a website should look? Super-fast results and amazing capabilities. Try to get it to translate my unit tests from xUnit to TUnit, where the latter is new and doesn't have a ton of blog posts? Forget about it. The process is purely mechanical and easy after RTFM, but the model falls flat on its face.
replies(1): >>43750395 #
Closi ◴[] No.43750395{3}[source]
Although I think if you asked people 20 years ago to describe a test for something AGI would do, they would be more likely to say "writing a poem" or "making art" than "converting xUnit code to TUnit".

IMO I think if you said to someone in the 90s “well we invented something that can tell jokes, make unique art, write stories and hold engaging conversations, although we haven’t yet reached AGI because it can’t transpile code accurately - I mean it can write full applications if you give it some vague requirements, but they have to be reasonably basic, like only the sort of thing a junior dev could write in a day it can write in 20 seconds, so not AGI” they would say “of course you have invented AGI, are you insane!!!”.

LLMs to me are still a technology of pure science fiction come to life before our eyes!

replies(1): >>43750445 #
Jensson ◴[] No.43750445{4}[source]
Tell them humans need to babysit it and double-check its answers to do anything, since it isn't as reliable as a human, and no, they wouldn't have called it AGI back then either.

The whole point about AGI is that it is general like a human; if it has glaring weaknesses like current AI does, it isn't AGI, and the same was true back then. That an AGI can write a poem doesn't mean that being able to write a poem makes something an AGI; it's just an example of what AI couldn't do 20 years ago.

replies(1): >>43750479 #
Closi ◴[] No.43750479{5}[source]
Why do human programmers need code review then if they are intelligent?

And why can’t expert programmers deploy code without testing it? Surely they should just be able to write it perfectly first time without errors if they were actually intelligent.

replies(1): >>43750524 #
Jensson ◴[] No.43750524[source]
> Why do human programmers need code review then if they are intelligent?

Human programmers don't need code reviews; they can test things themselves. Code review is just an optimization for scaling up, not a requirement for writing programs.

Also, an AGI is allowed to let another AGI review its code; the point is that there shouldn't be a human in the loop.

> And why can’t expert programmers deploy code without testing it?

Testing can be done by the programmers themselves, and the AGI model is likewise allowed to test its own work.

replies(1): >>43750556 #
Closi ◴[] No.43750556[source]
Well, an AGI can write unit tests, write application code, then run the tests and iterate - agents in Cursor are doing this already.

Just not for more complex applications.
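The write-test-iterate loop described above can be sketched in a few lines. The `generate_candidate` stub here is hypothetical, standing in for a real model call that revises its draft after a failing test run:

```python
# Illustrative sketch of an agent-style generate-and-test loop.
# generate_candidate is a stub for an LLM call; real agents regenerate
# code from the failing test output instead of picking from fixed drafts.

def run_tests(func):
    """A fixed unit-test suite the candidate implementation must pass."""
    try:
        assert func(2, 3) == 5
        assert func(-1, 1) == 0
        return True
    except Exception:
        return False

def generate_candidate(attempt):
    """Stub for a model: the first draft is buggy, the revision is correct."""
    if attempt == 0:
        return lambda a, b: a - b   # buggy first draft
    return lambda a, b: a + b       # revised draft after seeing test failures

def iterate(max_attempts=3):
    """Loop: generate a candidate, run the tests, retry until they pass."""
    for attempt in range(max_attempts):
        candidate = generate_candidate(attempt)
        if run_tests(candidate):
            return attempt + 1  # number of iterations needed
    return None

print(iterate())  # -> 2: the buggy draft fails, the revision passes
```

The loop terminates when the suite passes, which is exactly the feedback cycle that current coding agents automate for simpler applications.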

Code review does often find bugs in code…

Put another way, I'm not a strong dev, but good LLMs can write lots of code with fewer bugs than I can!

I also think it's quite a "programmer mentality" that most of the tests proposed in this forum for whether something is or isn't AGI ultimately boil down to whether it can write bug-free code, rather than whether it can negotiate, sympathise, be humorous, or write an engaging screenplay. I'm not saying AGI is good at those things yet, but it's interesting that we frame the test of AGI as transpiling code rather than understanding philosophy.

replies(1): >>43750831 #
Jensson ◴[] No.43750831{3}[source]
> Put another way, I’m not a strong dev but good LLMs can write lots of code with less bugs than me!

But the AI still can't replace you; it doesn't learn as it goes, and therefore fails to navigate long-term tasks the way humans do. When a human writes a big program, he learns how to write it as he writes it; current AI cannot do that.

replies(1): >>43754367 #
int_19h ◴[] No.43754367{4}[source]
Strictly speaking, it can, but its ability to do so is limited by its context size.

Which keeps growing - Gemini is at 2 million tokens now, which is several books' worth of text.

Note also that context is roughly the equivalent of short-term memory in humans, while long-term memory is more like RAG.
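That short-term-memory vs. RAG analogy can be made concrete: the context window is what fits in the prompt, while RAG is a relevance lookup over an external store that pulls facts back in on demand. A toy sketch, using bag-of-words cosine similarity in place of a real embedding model (the stored sentences are illustrative):

```python
# Toy version of "long-term memory as RAG": facts live outside the context
# window; a similarity search retrieves the most relevant one per query.
from collections import Counter
import math

memory = [
    "Gemini supports a context window of up to 2 million tokens.",
    "Short-term memory in humans holds only a handful of items.",
    "RAG retrieves stored documents and adds them to the prompt.",
]

def embed(text):
    """Crude stand-in for an embedding: a bag-of-words term count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query):
    """Return the stored fact most similar to the query."""
    q = embed(query)
    return max(memory, key=lambda doc: cosine(q, embed(doc)))

print(retrieve("how large is the gemini context window"))
```

A real system would swap in learned embeddings and a vector index, but the shape is the same: retrieval decides what gets promoted into the limited "short-term" context.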