265 points by ctoth | 13 comments
mellosouls ◴[] No.43745240[source]
The capabilities of AI post gpt3 have become extraordinary and clearly in many cases superhuman.

However (as the article admits) there is still no general agreement on what AGI is, how we get there from here, or even whether we can.

What there is is a growing and often naïve excitement that anticipates it as coming into view, and unfortunately that will be accompanied by the hype-merchants desperate to be first to "call it".

This article seems reasonable in some ways but unfortunately falls into the latter category with its title and sloganeering.

"AGI" in the title of any article should be seen as a cautionary flag. On HN - if anywhere - we need to be on the alert for this.

replies(13): >>43745398 #>>43745959 #>>43746159 #>>43746204 #>>43746319 #>>43746355 #>>43746427 #>>43746447 #>>43746522 #>>43746657 #>>43746801 #>>43749837 #>>43795216 #
Zambyte ◴[] No.43746204[source]
I think a reasonable definition of intelligence is the application of reason on knowledge. An example of a system that is highly knowledgeable but has little to no reason would be an encyclopedia. An example of a system that is highly reasonable, but has little knowledge would be a calculator. Intelligent systems demonstrate both.

Systems that have general intelligence are ones that are capable of applying reason to an unbounded domain of knowledge. Examples of such systems include: libraries, wikis, and forums like HN. These systems are not AGI, because the reasoning agents in each of these systems are organic (humans); they are more like a cyborg general intelligence.

Artificial general intelligences are just systems that are fully artificial (i.e. computer programs) and can apply reason to an unbounded domain of knowledge. We're here, and we have been for years. AGI sets no minimum as to how great the reasoning must be, but it's obvious to anyone who has used modern generative intelligence systems like LLMs that the technology can be used to reason about an unbounded domain of knowledge.

If you don't want to take my word for it, maybe Peter Norvig can be more convincing: https://www.noemamag.com/artificial-general-intelligence-is-...

replies(3): >>43746635 #>>43746665 #>>43749304 #
jimbokun ◴[] No.43746635[source]
Excellent article and analysis. Surprised I missed it.

It is very hard to argue with Norvig’s arguments that AGI has been around since at least 2023.

replies(1): >>43749356 #
littlestymaar ◴[] No.43749356[source]
It's not: however you define AGI, you cannot just ignore the key letter of the three-letter acronym: G stands for “General”.

You can argue that for the first time in history we have an AI that deserves its name (unlike Deep Blue or AlphaGo, which aren't really about intelligence at all), but you cannot call that Artificial GENERAL Intelligence before it overcomes the “jagged intelligence” syndrome.

replies(1): >>43749719 #
1. Zambyte ◴[] No.43749719[source]
It sounds like you have a different definition of "general" in the context of intelligence from the one I shared. What is it?
replies(1): >>43750472 #
2. Jensson ◴[] No.43750472[source]
General intelligence means it can do the same intellectual tasks as humans can, including learning to do different kinds of intellectual jobs. Current AI can't learn to do most jobs the way a human kid can, so it's not AGI.

This is the original definition of AGI. Some data scientists try to move the goalposts to something else and call something that can't replace humans "AGI".

This is a very simple definition, and it's easy to see when it's fulfilled, because at that point companies can operate without humans.

replies(1): >>43750593 #
3. Zambyte ◴[] No.43750593[source]
What intellectual tasks can humans do that language models can't? Particularly agentic language model frameworks.
replies(3): >>43750658 #>>43752538 #>>43755623 #
4. Jensson ◴[] No.43750658{3}[source]
A normal software engineering job? You have access to email and can send code, etc. No current model manages anything close to that. They can't automate even much simpler jobs like that.

So basically any form of longer-term task cannot currently be done by them. Short-term tasks with constant supervision are about the only thing they can do, and that is very limited; most tasks are long-term tasks.

replies(1): >>43751875 #
5. Zambyte ◴[] No.43751875{4}[source]
> You have access to email and can send code etc. No current model manages anything close to that.

This is an issue of tooling, not intelligence. Language models absolutely have the power to process email and send (push?) code, should you give them the tooling to do so (also true of human intelligence).

> So basically any form of longer term tasks cannot be done by them currently. Short term tasks with constant supervision is about the only things they can do, and that is very limited, most tasks are long term tasks.

Are humans that have limited memory due to a condition not capable of general intelligence, xor does intelligence exist on a spectrum? Also, long term tasks can be decomposed into short term tasks. Perhaps automatically, by a language model.

Have you actually tried agentic LLM based frameworks that use tool calling for long term memory storage and retrieval, or have you decided that because these tools do not behave perfectly in a fluid environment where humans do not behave perfectly either, that it's "impossible"?
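
For concreteness, here is a minimal sketch of the kind of harness I mean, with the model call stubbed out and the tool names (save_memory, search_memory, send_email) invented for illustration; any chat API with tool calling could sit behind call_model:

    import json, sqlite3

    db = sqlite3.connect("agent_memory.db")
    db.execute("CREATE TABLE IF NOT EXISTS memory (key TEXT PRIMARY KEY, value TEXT)")

    # Tools the model can invoke; the harness only executes them and feeds
    # the result back into the conversation.
    def save_memory(key: str, value: str) -> str:
        db.execute("INSERT OR REPLACE INTO memory VALUES (?, ?)", (key, value))
        db.commit()
        return "saved"

    def search_memory(key: str) -> str:
        row = db.execute("SELECT value FROM memory WHERE key = ?", (key,)).fetchone()
        return row[0] if row else "no match"

    def send_email(to: str, body: str) -> str:
        return f"(pretend) email sent to {to}"  # stand-in for a real SMTP call

    TOOLS = {"save_memory": save_memory, "search_memory": search_memory,
             "send_email": send_email}

    def call_model(messages: list[dict]) -> dict:
        # Placeholder: any chat-completion API with tool calling goes here.
        # Expected to return {"tool": name, "args": {...}} or {"text": final_answer}.
        raise NotImplementedError

    def run(task: str, max_steps: int = 20) -> str:
        messages = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            reply = call_model(messages)
            if "text" in reply:  # model decided it is done
                return reply["text"]
            result = TOOLS[reply["tool"]](**reply["args"])
            messages.append({"role": "tool",
                             "content": json.dumps({"tool": reply["tool"],
                                                    "result": result})})
        return "step budget exhausted"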

replies(4): >>43752419 #>>43752534 #>>43755668 #>>43755820 #
6. elevatortrim ◴[] No.43752419{5}[source]
Why would it be a tooling issue? AI has access to email, IDEs, and all kinds of systems. It still cannot go and build software on its own by speaking to stakeholders, taking instructions from a PM, understanding that it needs to speak to DevOps to release its code, suggesting to the product team that a feature is better developed as part of the core product, objecting to the SA about the architecture, and on and on…

(If it was a tooling issue, AGI could build the missing tools)

7. raducu ◴[] No.43752534{5}[source]
> Have you actually tried agentic LLM based frameworks that use tool calling for long term memory storage and retrieval, or have you decided that because these tools do not behave perfectly in a fluid environment where humans do not behave perfectly either, that it's "impossible"?

i.e. "Have you tried this vague, unnamed thing that I alude to that seems to be the answer that contradicts your point, but actually doesn't?"

AGI = 90% of software devs, psychotherapists, lawyers, and teachers lose their jobs; we are not there.

Once LLMs can fork themselves, reflect, accumulate domain-specific knowledge, and transfer the whole context back to the model weights; once that knowledge can become more important than the pre-trained information; once they can form new neurons related to a project topic, then yes, we will have AGI (probably not that far away). Once LLMs can keep trying to find a bug for days and weeks and months, go through the debugger, ask people relevant questions, deploy code with new debugging traces, deploy mitigations, and so on, we will have AGI.

Otherwise, AI is stuck in this Groundhog Day type of scenario, where it's forever the brightest intern any company has ever seen, but forever stuck at day 0 on the job, forever not that useful, yet full of potential.

8. ben_w ◴[] No.43752538{3}[source]
Weird, spiky things that are hard to characterise even within one specific model, and where the ability to reliably identify such failures itself causes subsequent models not to fail so much.

A few months ago, I'd have said "create an image with coherent text"*, but that's now changed. At least in English — trying to get ChatGPT's new image mode to draw the 狐 symbol sometimes works, sometimes goes weird in the way Latin characters used to.

* If the ability to generate images doesn't count as "language model", then one intellectual task they can't do is "draw images"; see Simon Willison's pelican challenge: https://simonwillison.net/tags/pelican-riding-a-bicycle/

9. littlestymaar ◴[] No.43755623{3}[source]
Read a bunch of books on a specific topic that aren't present in the training data, and learn something from them.

You can cheat with tooling like RAG or agentic frameworks, but the result isn't going to be good and it's not the AI that learns.
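
For readers unfamiliar with the "cheat": a bare-bones retrieval sketch, with a hypothetical embed() standing in for any embedding model; note that nothing here ever updates the model's weights, which is why it's not the AI that learns:

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Placeholder for any sentence-embedding model.
        raise NotImplementedError

    def build_index(chunks: list[str]) -> list[tuple[str, np.ndarray]]:
        return [(chunk, embed(chunk)) for chunk in chunks]

    def retrieve(index, question: str, k: int = 3) -> list[str]:
        q = embed(question)
        def cosine(v):
            return float(np.dot(v, q) / (np.linalg.norm(v) * np.linalg.norm(q)))
        ranked = sorted(index, key=lambda pair: cosine(pair[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]

    def build_prompt(index, question: str) -> str:
        # Retrieved text is pasted into the prompt; the weights never change,
        # so nothing is "learned" between sessions.
        context = "\n\n".join(retrieve(index, question))
        return f"Answer using only this context:\n{context}\n\nQuestion: {question}"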

But besides this fundamental limitation, had you tried implementing production-ready stuff with LLMs, you'd have discovered that language models are still painfully unreliable even for the tasks they are supposed to be good at: they will still hallucinate when summarizing, fail to adhere to the prompt, add paragraphs in English at random when working in French, edit unrelated parts of the code you ask them to edit, etc., etc.

You can work around many of those once you've identified them, but each one still counts as a fail in response to your question.

10. littlestymaar ◴[] No.43755668{5}[source]
> Have you actually tried agentic LLM based frameworks that use tool calling for long term memory storage and retrieval,

You can work around the limitations of LLMs' intelligence with your own intelligence and an external workflow you design, but I don't see how that counts as part of the LLM's intelligence.

replies(1): >>43757253 #
11. kweingar ◴[] No.43755820{5}[source]
> This is an issue of tooling, not intelligence. Language models absolutely have the power to process email and send (push?) code, should you give them the tooling to do so (also true of human intelligence).

At a certain point, a tooling issue becomes an intelligence issue. An AGI would be able to build the tools it needs to succeed.

If we have millions of these things deployed, they can work 24/7, and they supposedly have human-level intelligence, then why haven't they been able to bootstrap their own tooling yet?

12. Zambyte ◴[] No.43757253{6}[source]
Humans have general intelligence. A network of humans have better general intelligence.

LLMs have general intelligence. A network of LLMs have better general intelligence.

If a single language model isn't intelligent enough for a task, but a human is, there is a good chance there exists a sufficient network of language models that is intelligent enough.
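
As a concrete (if minimal) sketch of what I mean by a network, assuming call_model is any chat-completion endpoint; the proposer/reviewer split is just one illustrative wiring:

    def call_model(system: str, prompt: str) -> str:
        # Placeholder for any chat-completion endpoint.
        raise NotImplementedError

    def solve_with_network(task: str, rounds: int = 3) -> str:
        draft = call_model("You are a careful problem solver.", task)
        for _ in range(rounds):
            review = call_model(
                "You are a strict reviewer. List concrete errors, or reply OK.",
                f"Task:\n{task}\n\nProposed answer:\n{draft}")
            if review.strip() == "OK":  # reviewer found nothing to fix
                break
            draft = call_model(
                "Revise the answer to fix every problem the reviewer found.",
                f"Task:\n{task}\n\nAnswer:\n{draft}\n\nReview:\n{review}")
        return draft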

replies(1): >>43759465 #
13. littlestymaar ◴[] No.43759465{7}[source]
> LLMs have general intelligence.

No they don't. That's the key part you keep assuming without justification. Interestingly enough you haven't responded to my other comment [1].

You asked “What intellectual tasks can humans do that language models can't?” and now that I'm thinking about it again, I think the more apt question would be the reverse:

“What intellectual tasks can a LLM do autonomously without any human supervision (direct or indirect[2]) if there's money at stake?”

You'll see that the list is going to be very short, if not empty.

> A network of LLMs have better general intelligence.

Your argument was about tool calling for long-term memory; this isn't “a network of LLMs” but an LLM plus another tool chosen by a human to deal with the LLM's limitations on one particular problem (and if you need long-term memory for another problem, you're very likely to need to rework both your prompt and your choice of tools to address it: it's not the LLM that solves it but your own intelligence).

[1]: https://news.ycombinator.com/item?id=43755623

[2]: Indirect supervision would be the human designing an automatic verification system to check the LLM's output before using it. Any kind of verification that is planned in advance by the human and not improvised by the LLM when facing the problem counts as indirect supervision, even if it relies on another LLM.
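
To make [2] concrete, a minimal sketch of indirect supervision, where the acceptance check (here, simply running a project's pytest suite, purely illustrative) is designed by the human in advance and the LLM's output is only used if it passes:

    import subprocess

    def tests_pass(repo_dir: str) -> bool:
        # The human-designed acceptance check, written *before* the LLM runs:
        # here, "does the project's pytest suite succeed?".
        return subprocess.run(["pytest", "-q"], cwd=repo_dir).returncode == 0

    def supervised_apply(generate_patch, repo_dir: str, attempts: int = 5) -> bool:
        for _ in range(attempts):
            generate_patch(repo_dir)   # the LLM writes its changes into repo_dir
            if tests_pass(repo_dir):   # output is only used if the gate passes
                return True
        return False                   # otherwise it goes back to a human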