265 points ctoth | 17 comments
mellosouls ◴[] No.43745240[source]
The capabilities of AI post-GPT-3 have become extraordinary, and in many cases clearly superhuman.

However (as the article admits), there is still no general agreement on what AGI is, or how we get there from here, or even whether we can.

What there is instead is a growing and often naïve excitement that anticipates AGI coming into view, and unfortunately that will be accompanied by hype-merchants desperate to be first to "call it".

This article seems reasonable in some ways but unfortunately falls into the latter category with its title and sloganeering.

"AGI" in the title of any article should be seen as a cautionary flag. On HN - if anywhere - we need to be on the alert for this.

replies(13): >>43745398 #>>43745959 #>>43746159 #>>43746204 #>>43746319 #>>43746355 #>>43746427 #>>43746447 #>>43746522 #>>43746657 #>>43746801 #>>43749837 #>>43795216 #
daxfohl ◴[] No.43746657[source]
Until you can boot one up, give it access to a VM's video and audio feeds and keyboard and mouse interfaces, give it an email and chat account, tell it where the company onboarding docs are, and expect it to become a productive team member, it's not AGI. So long as we need special protocols like MCP and A2A, rather than expecting agents to figure out how to collaborate the way a human does, they're not AGI.
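
(To be concrete about what I mean by "special protocols": an MCP tool is a hand-written schema advertising one narrow capability. A rough sketch; the tool name and fields here are made up, though the name/description/inputSchema shape follows MCP's tool-descriptor convention.)

    # Hypothetical MCP-style tool descriptor. The agent doesn't discover this
    # capability on its own; a human writes the schema and wires it up.
    import json

    create_ticket_tool = {
        "name": "create_ticket",  # hypothetical tool name
        "description": "Open a ticket in the company tracker.",
        "inputSchema": {          # JSON Schema, per the MCP convention
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "severity": {"type": "string", "enum": ["low", "med", "high"]},
            },
            "required": ["title"],
        },
    }

    # The host application advertises this descriptor to the model. Contrast
    # with a human hire, who would just read the onboarding docs and find the
    # ticket tracker themselves.
    print(json.dumps(create_ticket_tool, indent=2))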

The first step, my guess, will be the ability to work through GitHub issues like a human: identifying which issues have high value, asking clarifying questions, proposing reasonable alternatives, knowing when to open a PR, responding to code review, merging or abandoning when appropriate. But we're not even very close to that yet. There's some of it, but from what I've seen, most instances where this has been successful are low-level things like removing old feature flags. A sketch of what I mean follows.
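
(The mechanical half of that loop is trivial to script, which is exactly why it's a good test: everything hard lands in the stubs. A rough Python sketch; the GET /repos/{owner}/{repo}/issues endpoint is the real GitHub REST API, but score_value and needs_clarification are hypothetical placeholders standing in for the judgment calls agents don't reliably make yet.)

    # Sketch: fetch open issues and route them through (stubbed) judgment calls.
    import requests

    def score_value(issue) -> float:
        # Placeholder heuristic; a capable agent would actually weigh impact.
        return issue.get("comments", 0) / 10.0

    def needs_clarification(issue) -> bool:
        # Placeholder; a capable agent would actually read and understand it.
        return not (issue.get("body") or "").strip()

    def triage(owner: str, repo: str) -> None:
        issues = requests.get(
            f"https://api.github.com/repos/{owner}/{repo}/issues",
            params={"state": "open"},
            timeout=10,
        ).json()
        for issue in issues:
            if "pull_request" in issue:  # this endpoint also returns PRs
                continue
            if score_value(issue) < 0.5:
                continue  # the hard part: was that judgment right?
            if needs_clarification(issue):
                print(f"would ask a clarifying question on #{issue['number']}")
            else:
                print(f"would attempt a PR for #{issue['number']}")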

replies(3): >>43746758 #>>43747095 #>>43747467 #
1. rafaelmn ◴[] No.43746758[source]
Just because we rely on vision to interface with computer software doesn't mean it's optimal for AI models. Having a specialized interface protocol is orthogonal to capability. You could theoretically write code in a proportional font in Notepad and run your tools through the Windows CMD prompt, but an editor with syntax highlighting and a monospaced font helps you read, navigate, and edit, and tools, navigation, autocomplete, and so on optimized for your flow make you more productive and expand your capability.

If I forced you to use unnatural interfaces it would severely limit your capabilities as well, because you'd have to dedicate more effort to basic editing tasks. As someone who recently swapped to a split 36-key keyboard with a new layout, I can say this becomes immediately obvious when you try something like it. You take your typing and editing skills for granted; switch your setup and watch how your productivity and problem-solving ability tank in practice.

replies(3): >>43747058 #>>43747819 #>>43752611 #
2. daxfohl ◴[] No.43747058[source]
Agreed, but I also think that to be called AGI, they should be capable of working through human interfaces rather than needing special interfaces created for them to work around their lack of generality.

The catch, though, isn't the ability to use these interfaces; I expect that will be easy. The hard part is that once these interfaces are learned, the scope and search space of what they can do becomes vastly larger. Moreover, our expectations of how an AGI should handle itself will change once our way of working with it becomes more human.

Right now we're claiming nascent AGI, but really, much of what we're asking these systems to do has been laid out for them: a limited set of protocols and interfaces, and a targeted set of tasks to which we normally apply them. Our expectations are calibrated accordingly. We don't converse with them as we do with a human; their search space is much smaller. So while they appear AGI-like on specific tasks, I think it's because we're subconsciously grading them on a curve. The only way we have to interact with them prejudices us toward a very low bar.

That said, I agree that a video feed and a mouse is a terrible protocol for an AI. Even so, I wouldn't be surprised if that's what we end up settling on. Long term, it's just going to be easier for these bots to learn and adapt to human interfaces than for us to maintain two sets of interfaces for everything, except in specific bot-to-bot cases. It's horribly inefficient, but in my experience efficiency never comes out ahead with each new generation of UIs.

3. esperent ◴[] No.43747819[source]
> Just because we rely on vision to interface with computer software doesn't mean it's optimal for AI models

This is true, but AGI means "Artificial General Intelligence". Perhaps it would be even more efficient with certain specialized interfaces, but to be general it would have to at least work with the same ones humans use.

Here's some things that I think a true AGI would need to be able to do:

* Control a general purpose robot and use vision to do housework, gardening etc.

* Be able to drive a car - equivalent interfaces to a human's might be servo-motor-controlled inputs.

* Use standard computer inputs to do standard computer tasks

And this list could easily be extended.

If we have to be very specific in the choice of interfaces and tasks that we give it, it's not a general AI.

At the same time, we have to be careful about moving the goalposts too far. Current AIs are limited to a small number of interfaces (prompt with text/image/video, get back text/image/video data). This is amazing, and they can sound very intelligent while doing it. But it's important not to lose sight of what they still can't do well, which is basically everything else.

Outside of this area, when you do hear of an AI doing something well (self-driving, for example), it's usually a separate specialized model rather than a contribution toward AGI.

replies(2): >>43747924 #>>43753643 #
4. mNovak ◴[] No.43747924[source]
By this logic, disabled people would not count as "generally intelligent" because they might have physical "interface" limitations.

Similarly I wouldn't be "Generally Intelligent" by this definition if you sat me at a Cyrillic or Chinese keyboard. For this reason, I see human-centric interface arguments as a red herring.

I think a better candidate definition might be about learning and adapting to new environments (learning from mistakes and predicting outcomes), assuming reasonable interface aids.

replies(2): >>43748508 #>>43748532 #
5. esperent ◴[] No.43748508{3}[source]
> Similarly I wouldn't be "Generally Intelligent" by this definition if you sat me at a Cyrillic or Chinese keyboard

Could you be taught to use those keyboards? Then you're generally intelligent. If you couldn't learn, then maybe you're not.

Regarding disabled people, this is an interesting point. Assuming we're talking about physical disabilities only: disabled people are capable of learning to use any standard human input; it's just the physical controls that are problematic.

For an AI, the physical input is not the problem. We can put servo motors on the car controls (steering wheel, brakes, gas) and give it a camera feed from the car. Given those inputs, can the AI learn to control the car as a generally intelligent person could with the same controls?

6. vczf ◴[] No.43748532{3}[source]
If all we needed was general intelligence, we would be hiring octopuses. Human skills, like fluency in specific languages, are implicit in our concept of AGI.
replies(1): >>43750281 #
7. ◴[] No.43750281{4}[source]
8. raducu ◴[] No.43752611[source]
> Just because we rely on vision to interface with computer software doesn't mean it's optimal for AI models.

It's optimal for beings that have general-purpose intelligence.

> would severely limit your capabilities as well because you'd have to dedicate more effort towards handling basic editing tasks

Yes, but a human will eventually get used to it: they internalize the keyboard, the domain language, the idioms, their context gets consolidated into long-term knowledge overnight, their short-term context gets cleaned up, and they get better and better at the job, day by day. AI starts very strong but stays at that level forever.

When faced with a really hard problem, a human will remember, day after day, what they tried yesterday, and parts of the problem will become easier and easier. Not so for the AI: if it can't solve a problem today, running it for days and days produces diminishing returns.

That's the "general" part of human intelligence -- over time it can acquire new skills it did not have yesterday. LLMs can't do that: there is no byproduct of practicing a problem that leaves them better at it or with new skills.

replies(2): >>43753302 #>>43753617 #
9. daxfohl ◴[] No.43753302[source]
Right, and also the ability to know when it's stuck. It should be able to take a problem, work on it for a few hours, and if it decides it's not making progress it should be able to ping back asynchronously, "Hey I've broken the problem down into A, B, C, and D, and I finished A and B, but C seems like it's going to take a while and I wanted to make sure this is the right approach. Do you have time to chat?" Or similarly, I should be able to ask for a status update and get this answer back.
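
(Something like this, structurally. A sketch of the check-in described above; the field names are made up, not any real agent framework's API.)

    # Hypothetical shape for an agent's asynchronous status ping.
    from dataclasses import dataclass

    @dataclass
    class StatusPing:
        plan: list[str]              # how the agent decomposed the task
        done: list[str]              # sub-tasks finished so far
        stuck_on: str | None = None  # set when progress has stalled
        question: str | None = None  # what it wants a human to weigh in on

    ping = StatusPing(
        plan=["A", "B", "C", "D"],
        done=["A", "B"],
        stuck_on="C",
        question="C looks like it'll take a while; is this the right approach?",
    )
    print(ping)
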
10. ctoth ◴[] No.43753617[source]
> It's optimal for beings that have general-purpose intelligence.

Hi. I'm blind. I would like to think I have general-purpose intelligence, thanks.

And I can state that interfacing through vision would, in fact, be suboptimal for me. My visual cortex is literally unformed. Yet somehow I can perform symbolic manipulations, converse with people, write code, and get frustrated with strangers on the Internet. Perhaps there are other "optimal" ways for "intelligent" systems to interface with computers? I don't know: maybe the accessibility APIs we have built? Maybe MCP? Maybe any number of things, like data structures specifically optimized for the purpose and exchanged directly between intelligences vastly more complex than ourselves? Do you really think that clicking buttons through a GUI is the one true optimal way to use a computer?

replies(2): >>43753763 #>>43754257 #
11. ctoth ◴[] No.43753643[source]
So I am a blind human. I cannot drive a car or use a camera/robot to do housework (I need my hands to see!). Am I not a general intelligence?
replies(1): >>43757739 #
12. Jensson ◴[] No.43753763{3}[source]
> Do you really think that clicking buttons through a GUI is the one true optimal way to use a computer?

There are some tasks you can't do without vision, but I agree it is dumb to say general intelligence requires vision. Vision is just an information source; it isn't about intelligence. Blind people can be excellent software engineers and can do most white-collar work just as well as anyone else, since most tasks don't require visual processing; text processing works well enough.

replies(1): >>43754062 #
13. jermaustin1 ◴[] No.43754062{4}[source]
> There are some tasks you can't do without vision...

I can't think of anything requiring vision where a tool you can protocol with (a sighted person you speak to) wouldn't suffice. So why aren't we giving AI the same "benefit": using whatever tool or protocol it needs to complete something?

replies(1): >>43754227 #
14. ctoth ◴[] No.43754227{5}[source]
> I can't think of anything where you require vision that having a tool (a sighted person) you protocol with (speak) wouldn't suffice.

Okay, are you volunteering to be the guide passenger while I drive?

replies(1): >>43755352 #
15. daxfohl ◴[] No.43754257{3}[source]
Of course not. The visual part is window dressing on the argument. The real point is that before declaring AGI, the way we interact with these agents needs to look more like human-to-human interaction. Right now an agent generally accepts a command, picks from a small number of MCPs that have been pre-coded for it, and does the thing you wanted, right or wrong, the end. If it does the right thing: huge confirmation bias that it's AGI (maybe the MCP did most of the real work). If it doesn't: well, blame the prompt, or blame the MCPs for lacking good descriptions, or something.

To get a solid read on AGI, we need to grade them against a remote coworker. Seeing a GUI is not strictly required, but they must have access to everything a human would, without special tools that limit their search space below what a human coworker has. If a human coworker could do the whole job via console access, sure, that's fine too. I only say GUI because I think it'd actually be the easiest option, and fairly straightforward for these agents: image processing is largely solved, whereas figuring out how to do everything the job requires via console is likely a mess.

And like I said, "using the computer", whether via GUI or screen reader or whatever else, isn't going to be the hard part. The hard part is that once they have this very abstract capability and an astronomically larger search space, it changes the way we interact with them. We send them email. We ping them on Slack. We don't build special baby-mitten MCPs for them; they have to enter the human world and prove they can handle it as a human would. Then I'd say we're getting closer to AGI. But as long as we're building special tools and limiting their search space to that narrow scope, to me it feels like we're still a long way off.

16. jermaustin1 ◴[] No.43755352{6}[source]
Thank you for making my point:

We have already created a tool called "full self-driving" cars. That is a tool humans can use, just as MCP is a tool for AI to use.

All I'm trying to say is that AGIs should be allowed to use tools that fit their intelligence, the same way we do. I'm not saying AIs are AGIs; I'm saying that requiring them to use a mouse and keyboard is a very weird requirement, like saying people who can't use a mouse and keyboard (amputees, etc.), or people who can't see the screen, aren't "generally" intelligent.

17. esperent ◴[] No.43757739{3}[source]
I replied with this to another comment, but I'll put it here too: your limitation is physical. You have standard human intelligence but lack a certain physical input (vision), and as a generally intelligent being you compensate for that lack by using other senses.

That's different from AIs, which we can hook up to all kinds of inputs: cameras, radar, lidar, car controls, etc. For the AI, the lack of an input is not the limitation. The question is whether it can do anything with an arbitrary input or control, like a servo motor turning a steering wheel.

To look at it another way: if an AI can operate a robot body by vision, and we suddenly removed the vision input and replaced it with touch and hearing, would the AI be able to compensate? If it's an AGI, it should. A human can.

On the other hand, I wonder if we humans are really as "generally intelligent" as we like to think. Humans struggle to learn new languages as adults, for example (something I can personally attest to, having moved to Asia as an adult). So, really, are human beings a good standard by which to judge an AI as AGI?