Jagged AGI: o3, Gemini 2.5, and everything after

(www.oneusefulthing.org)

265 points ctoth | 4 comments | 20 Apr 25 14:55 UTC

mellosouls ◴[20 Apr 25 17:44 UTC] No.43745240[source]▶

The capabilities of AI since GPT-3 have become extraordinary and, in many cases, clearly superhuman.

However (as the article admits) there is still no general agreement on what AGI is, or on how (or even whether) we can get there from here.

What there is is a growing and often naïve excitement that anticipates it as coming into view, and unfortunately that will be accompanied by the hype-merchants desperate to be first to "call it".

This article seems reasonable in some ways but unfortunately falls into the latter category with its title and sloganeering.

"AGI" in the title of any article should be seen as a cautionary flag. On HN - if anywhere - we need to be on the alert for this.

replies(13): >>43745398 #>>43745959 #>>43746159 #>>43746204 #>>43746319 #>>43746355 #>>43746427 #>>43746447 #>>43746522 #>>43746657 #>>43746801 #>>43749837 #>>43795216 #

jjeaff ◴[20 Apr 25 19:33 UTC] No.43745959[source]▶

>>43745240 #

I suspect AGI will be one of those things that you can't describe exactly, but you'll know it when you see it.

replies(7): >>43746043 #>>43746058 #>>43746080 #>>43746093 #>>43746651 #>>43746728 #>>43746951 #

ninetyninenine ◴[20 Apr 25 19:45 UTC] No.43746043[source]▶

>>43745959 #

I suspect everyone will call it a stochastic parrot because it got one thing wrong. And this will continue into the far, far future; even when it becomes sentient, we will completely miss it.

replies(2): >>43746150 #>>43746235 #

AstralStorm ◴[20 Apr 25 20:13 UTC] No.43746235[source]▶

>>43746043 #

It's more than that but less than intelligence.

Its generalization capabilities are a bit on the low side, and its memory is relatively bad. But it is much more than just a parrot now: it can handle some basic logic, but cannot follow given patterns correctly for novel problems.

I'd liken it to something like a bird, extremely good at specialized tasks but failing a lot of common ones unless repeatedly shown the solution. It's not a corvid or a parrot yet. Fails rather badly at detour tests.

It might be sentient already though. Someone needs to run a test if it can discern itself and another instance of itself in its own work.

replies(2): >>43746293 #>>43747706 #

1. Jensson ◴[20 Apr 25 20:25 UTC] No.43746293[source]▶

>>43746235 #

> It might be sentient already though. Someone needs to run a test if it can discern itself and another instance of itself in its own work.

It doesn't have any memory, how could it tell itself from a clone of itself?

replies(3): >>43746314 #>>43752404 #>>43752923 #

2. AstralStorm ◴[20 Apr 25 20:28 UTC] No.43746314[source]▶

>>43746293 (TP) #

Similarity match. For that you need to understand reflexively how you think and write.

It's a fun test to give a person something they have written but do not remember. Most people can still spot it.

It's easier with images though. Especially a mirror. For DALL-E, the test would be whether it can discern its own work from a human-generated image. Especially if you give it an imaginative task like drawing a representation of itself.
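A crude, purely illustrative sketch of a "similarity match" on text: it compares character-trigram profiles with cosine similarity, which is a standard stylometric baseline and not the reflexive self-understanding described above. The function names are hypothetical, and nothing here is claimed to be how any model actually recognises its own writing.

```python
from collections import Counter
from math import sqrt


def trigram_profile(text: str) -> Counter:
    """Count overlapping character trigrams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(reference: str, candidates: list[str]) -> int:
    """Return the index of the candidate whose trigram profile is closest to the reference."""
    ref = trigram_profile(reference)
    scores = [cosine_similarity(ref, trigram_profile(c)) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)
```

Given a known writing sample as `reference` and a mix of passages as `candidates`, `most_similar` picks the closest match on surface style only. Passing such a check would say little about sentience; it only shows that the matching step itself is mechanically easy to set up.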

3. butlike ◴[21 Apr 25 14:27 UTC] No.43752404[source]▶

>>43746293 (TP) #

It doesn't have any memory _you're aware of_. A semiconductor can hold state, so it has memory.

4. ben_w ◴[21 Apr 25 15:17 UTC] No.43752923[source]▶

>>43746293 (TP) #

People already share viral clips of AI recognising other AI, but I've not seen any real scientific study of whether this is due to a literary form of passing a mirror test, or whether it's related to the way most models openly tell everyone they talk to that they're an AI.

As for "how", note that memory isn't one single thing even in humans: https://en.wikipedia.org/wiki/Memory

I don't want to say any of these are exactly equivalent to any given aspect of human memory, but I would suggest that LLMs behave kinda like they have:

(1) Sensory memory in the form of a context window — and in this sense are wildly superhuman because for a human that's about one second, whereas an AI's context window is about as much text as a human goes through in a week (actually less because we don't only read, other sensory modalities do matter; but for scale: equivalent to what you read in a week)

(2) Short term memory in the form of attention heads — and in this sense are wildly superhuman, because humans pay attention to only 4–5 items whereas DeepSeek v3 defaults to 128.

(3) The training and fine-tuning process itself that allows these models to learn how to communicate with us. Not sure what that would count as. Learned skill? Operant conditioning? Long term memory? It can clearly pick up different writing styles, because it can be made to controllably output in different styles, but that's an "in principle" answer. None of Claude 3.7, o4-mini, or DeepSeek r1 could actually identify the authorship of a (n=1) test passage I asked 4o to generate for me.
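Since n=1 cannot separate skill from luck, here is a minimal sketch of how a repeated authorship test could be scored against guessing. It is an exact one-sided binomial tail; the function name and the idea of a fixed per-trial chance rate (for example 0.25 with four candidate authors) are assumptions for illustration, not part of the test described above.

```python
from math import comb


def p_value_at_least(correct: int, trials: int, chance: float) -> float:
    """Probability of at least `correct` hits in `trials` attempts by guessing alone."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    if not 0.0 <= chance <= 1.0:
        raise ValueError("chance must be a probability between 0 and 1")
    # Sum the binomial probabilities for k = correct, ..., trials.
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )
```

For example, `p_value_at_least(1, 1, 0.25)` is 0.25, so a single correct guess among four candidate authors would be weak evidence either way, whereas `p_value_at_least(10, 10, 0.5)` is 1/1024.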
