
265 points ctoth | 2 comments
simonw ◴[] No.43745125[source]
Coining "Jagged AGI" to work around the fact that nobody agrees on a definition for AGI is a clever piece of writing:

> In some tasks, AI is unreliable. In others, it is superhuman. You could, of course, say the same thing about calculators, but it is also clear that AI is different. It is already demonstrating general capabilities and performing a wide range of intellectual tasks, including those that it is not specifically trained on. Does that mean that o3 and Gemini 2.5 are AGI? Given the definitional problems, I really don’t know, but I do think they can be credibly seen as a form of “Jagged AGI” - superhuman in enough areas to result in real changes to how we work and live, but also unreliable enough that human expertise is often needed to figure out where AI works and where it doesn’t.

replies(4): >>43745268 #>>43745321 #>>43745426 #>>43746223 #
shrx ◴[] No.43745268[source]
>> It is already demonstrating general capabilities and performing a wide range of intellectual tasks, including those that it is not specifically trained on.

Huh? Isn't an LLM's capability fully constrained by the training data? Everything else is hallucinated.

replies(2): >>43745341 #>>43745489 #
simonw ◴[] No.43745489[source]
You can argue that everything output by an LLM is hallucinated, since there's no difference under the hood between outputting useful information and outputting hallucinations.

The quality of the LLM then becomes how often it produces useful information. That score has gone up a lot in the past 18 months.

(Sometimes hallucinations are what you want: "Tell me a fun story about a dog learning calculus" is a valid prompt which mostly isn't meant to produce real facts about the world.)

replies(1): >>43745752 #
1. codr7 ◴[] No.43745752[source]
Isn't it the case that the latest models actually hallucinate more than the ones that came before? Despite best efforts to prevent it.
replies(1): >>43746158 #
2. simonw ◴[] No.43746158[source]
The o3 model card reports an as-yet-unexplained uptick in hallucination rate compared to o1 - on page 4 of https://cdn.openai.com/pdf/2221c875-02dc-4789-800b-e7758f372...

That is according to one specific internal OpenAI benchmark; I don't know if it's been replicated externally yet.