
265 points ctoth | 2 comments
simonw No.43745125
Coining "Jagged AGI" to work around the fact that nobody agrees on a definition for AGI is a clever piece of writing:

> In some tasks, AI is unreliable. In others, it is superhuman. You could, of course, say the same thing about calculators, but it is also clear that AI is different. It is already demonstrating general capabilities and performing a wide range of intellectual tasks, including those that it is not specifically trained on. Does that mean that o3 and Gemini 2.5 are AGI? Given the definitional problems, I really don’t know, but I do think they can be credibly seen as a form of “Jagged AGI” - superhuman in enough areas to result in real changes to how we work and live, but also unreliable enough that human expertise is often needed to figure out where AI works and where it doesn’t.

replies(4): >>43745268 #>>43745321 #>>43745426 #>>43746223 #
qsort No.43745426
I don't think that's a particularly honest line of thinking, though. It preempts the obvious counterargument, but only very weakly. Calculators are different, but why? Can an ensemble of a calculator, a Prolog interpreter, Alexnet and Stockfish be considered "jagged superintelligence"? They are all clearly superhuman, and yet they require human expertise to be wielded effectively.
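
To make the comparison concrete, here's a toy sketch of what such an ensemble amounts to (the wrappers are hypothetical stand-ins, not real APIs): all of the apparent generality lives in a hand-written dispatch table, i.e. in the human who decided in advance which tool gets which task.

    # Toy sketch, not anyone's real system: an "ensemble" of narrow tools.
    def calculator(expr):
        # superhuman at arithmetic, useless at everything else
        return eval(expr, {"__builtins__": {}}, {})

    def run_stockfish(fen):
        raise NotImplementedError("stand-in for a Stockfish wrapper")

    def run_prolog(query):
        raise NotImplementedError("stand-in for a Prolog interpreter")

    def classify_image(pixels):
        raise NotImplementedError("stand-in for an AlexNet classifier")

    TOOLS = {"arithmetic": calculator, "chess": run_stockfish,
             "logic": run_prolog, "vision": classify_image}

    def route(task_type, payload):
        # the "intelligence" of the ensemble is this hand-written table
        if task_type not in TOOLS:
            raise ValueError("no tool covers this task; the ensemble just fails")
        return TOOLS[task_type](payload)

    print(route("arithmetic", "2**32 - 1"))   # 4294967295
    # route("poetry", "write me a sonnet")    # ValueError: nothing in the ensemble can do it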

I'm guilty as charged of having looked at GPT-3.5 and thought "it's meh", but more than anything this shows that debating words rather than the underlying capabilities is an empty discussion.

replies(1): >>43745570 #
og_kalu No.43745570
>Calculators are different, but why? Can an ensemble of a calculator, a Prolog interpreter, Alexnet and Stockfish be considered "jagged superintelligence"?

Those are all different things that have little to nothing to do with each other. It's like saying, what if I ensemble a snake and a cat? What would that even mean? GPT-N or whatever is a single model that can do many things, no ensembling required. That's the difference between it and a calculator or Stockfish.
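
A rough sketch of the contrast (complete() is just a placeholder for a single general-purpose model behind one endpoint, not any particular vendor's API):

    # One model, one entry point, no dispatch table written by a human.
    def complete(prompt):
        ...  # placeholder for a single general-purpose model behind a single call

    for prompt in ["What is 37 * 43?",
                   "Translate 'the cat sat on the mat' into German.",
                   "Black just answered 1. e4 with 1...e5; suggest White's next move.",
                   "Explain why mergesort runs in O(n log n)."]:
        print(complete(prompt))  # same model, same call, very different tasks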

replies(1): >>43746302 #
1. AstralStorm No.43746302
That is not true; the model is modular, and thus an ensemble. It uses DALL-E for graphics and specialized tokenizer models for sound.

If you remove those tools, or cut its access to search databases, it becomes considerably less capable.
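
Roughly the modular picture being claimed, as a toy sketch (model_step and the tool names are hypothetical placeholders, not ChatGPT's actual internals): empty the tool registry and the loop can only answer from whatever is baked into the weights.

    # Hypothetical tool-calling loop; not how any specific product is wired up.
    TOOLS = {
        "image_gen":  lambda prompt: "<generated image for: " + prompt + ">",
        "web_search": lambda query:  "<search results for: " + query + ">",
    }

    def model_step(context):
        ...  # placeholder: returns a final answer, or ("tool_name", arguments)

    def answer(question, tools=TOOLS):
        context = question
        while True:
            step = model_step(context)
            if isinstance(step, tuple) and step[0] in tools:
                context += "\n" + tools[step[0]](step[1])  # feed the tool result back in
            else:
                return step  # no tool available or needed: answer from the weights alone

    # answer("what happened in the news today?", tools={})  # cut off from search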

A human would often still manage without some of that data, perhaps with less certainty, while GPT has far more trouble when nothing else is filling in the holes.

replies(1): >>43746557 #
2. og_kalu No.43746557
>It uses DALL-E for graphics and specialized tokenizer models for sound.

ChatGPT no longer uses DALL-E for image generation. I don't understand your point about tokenization; it doesn't make the model an ensemble.

It's also just beside the point. Even if you restrict the modalities to text alone, these models are still general in ways a calculator is not.