
AI 2027

(ai-2027.com)
949 points by Tenoke | 45 comments
1. Vegenoid ◴[] No.43585338[source]
I think we've actually had capable AIs for long enough now to see that this kind of exponential advance to AGI in 2 years is extremely unlikely. The AI we have today isn't radically different from the AI we had in 2023. They are much better at the things they are good at, and there are some big new capabilities, but they are still fundamentally next-token predictors. They still fail at larger-scope, longer-term tasks in mostly the same way, and they are still much worse than humans at learning from small amounts of data. Despite their ability to write decent code, we haven't seen the signs of a runaway singularity that some thought was likely.

I see people saying that these kinds of things are happening behind closed doors, but I haven't seen any convincing evidence of it, and there is enormous propensity for AI speculation to run rampant.

replies(8): >>43585429 #>>43585830 #>>43586381 #>>43586613 #>>43586998 #>>43587074 #>>43594397 #>>43619183 #
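The "next-token predictor" loop the comment refers to can be sketched with a toy bigram model. The table and probabilities below are invented purely for illustration; a real LLM scores candidates against the entire preceding context with a neural network, but the generation loop has the same shape: score candidates, sample one, append, repeat.

```python
import random

# Toy bigram "model": maps the previous token to weighted candidate next tokens.
# Entirely made-up data for illustration.
BIGRAMS = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}

def generate(max_tokens=10, seed=0):
    """Autoregressive generation: each step conditions only on what was emitted so far."""
    random.seed(seed)
    tokens = ["<s>"]
    for _ in range(max_tokens):
        candidates = BIGRAMS[tokens[-1]]
        # Sample the next token in proportion to its score.
        next_tok = random.choices(list(candidates), weights=list(candidates.values()))[0]
        if next_tok == "</s>":
            break
        tokens.append(next_tok)
    return tokens[1:]

print(" ".join(generate()))  # e.g. "the dog ran"
```

Whether scaling this basic recipe (with vastly better scoring functions) reaches AGI is exactly what the thread disputes.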
2. byearthithatius ◴[] No.43585429[source]
Disagree. We know it _can_ learn out-of-distribution capabilities based on similarities to other distributions. Like the TikZ unicorn[1] (which was not in the training data anywhere) or my code (which has variable names and methods/ideas probably not seen 1:1 in training).

IMO this out-of-distribution learning is all we need to scale to AGI. Sure, there are still issues; it doesn't always know which distribution to pick from. Neither do we, hence car crashes.

[1]: https://arxiv.org/pdf/2303.12712 or on YT https://www.youtube.com/watch?v=qbIk7-JPB2c

3. benlivengood ◴[] No.43585830[source]
METR [0] explicitly measures the progress on long term tasks; it's as steep a sigmoid as the other progress at the moment with no inflection yet.

As others have pointed out in other threads RLHF has progressed beyond next-token prediction and modern models are modeling concepts [1].

[0] https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...

[1] https://www.anthropic.com/news/tracing-thoughts-language-mod...

replies(2): >>43585918 #>>43586196 #
4. Fraterkes ◴[] No.43585918[source]
The METR graph proposes a 6-year trend, based largely on 4 datapoints before 2024. I get that it is hard to do analyses since we're in uncharted territory, and I personally find a lot of the AI stuff impressive, but this just doesn't strike me as great statistics.
replies(1): >>43586603 #
5. Vegenoid ◴[] No.43586196[source]
At the risk of coming off like a dolt and being super incorrect: I don't put much stock in these metrics when it comes to predicting AGI. Even if the trend of "length of task an AI can reliably do doubles every 7 months" continues, as they say, that means we're years away from AI that can complete tasks that take humans weeks or months. I'm skeptical that the doubling trend will continue into that timescale; I think there is a qualitative difference between tasks that take weeks or months and tasks that take minutes or hours, a difference that is not reflected by simple quantity. I think many people responsible for hiring engineers are keenly aware of this distinction, because of their experience attempting to choose good engineers based on how they perform in task-driven technical interviews that last only hours.

Intelligence as humans have it seems like a "know it when you see it" thing to me, and metrics that attempt to define and compare it will always be looking at only a narrow slice of the whole picture. To put it simply, the gut feeling I get based on my interactions with current AI, and how it has developed over the past couple of years, is that AI is missing key elements of general intelligence at its core. While there's lots more room for its current approaches to get better, I think there will be something different needed for AGI.

I'm not an expert, just a human.

replies(2): >>43586698 #>>43586818 #
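The "years away" arithmetic in the comment above can be made concrete. The 7-month doubling time is the figure the thread cites from the METR post; the ~1-hour current task horizon is an illustrative assumption for the sketch, not a quote:

```python
import math

DOUBLING_MONTHS = 7          # doubling time cited from the METR trend
current_horizon_hours = 1.0  # illustrative assumption for today's reliable-task horizon

def months_until(target_hours, start_hours=current_horizon_hours):
    """Months for the task horizon to grow from start to target at the stated doubling rate."""
    doublings = math.log2(target_hours / start_hours)
    return doublings * DOUBLING_MONTHS

for label, hours in [("one week (40h)", 40), ("one month (160h)", 160)]:
    print(f"{label}: ~{months_until(hours) / 12:.1f} years")
```

Under these assumptions a one-week task horizon is roughly 3 years out and a one-month horizon over 4 years out, which is the extrapolation the commenter is skeptical will hold.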
6. jug ◴[] No.43586381[source]
> there are some new capabilities that are big, but they are still fundamentally next-token predictors

Anthropic recently released research in which they saw that when Claude attempted to compose poetry, it didn't simply predict token by token, "reacting" when it thought it might need a rhyme and then looking at its context to think of something appropriate; instead it looked several tokens ahead and adjusted, ahead of time, for where it would likely end up.

Anthropic also says this adds to evidence seen elsewhere that language models seem to sometimes "plan ahead".

Please check out the section "Planning in poems" here; it's pretty interesting!

https://transformer-circuits.pub/2025/attribution-graphs/bio...

replies(2): >>43586541 #>>43592440 #
7. percentcer ◴[] No.43586541[source]
Isn't this just a form of next token prediction? i.e. you'll keep your options open for a potential rhyme if you select words that have many associated rhyming pairs, and you'll further keep your options open if you focus on broad topics over niche
replies(6): >>43586729 #>>43587041 #>>43588233 #>>43591952 #>>43592212 #>>43620308 #
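The "keep your options open" argument above can be sketched concretely: a scorer that simply favors line-ending words with many rhyme partners behaves as if it had "planned" the next line, with no explicit lookahead at all. The rhyme table here is invented for illustration:

```python
# Toy rhyme table (made-up data). A real system would derive this from
# pronunciation data; here it just illustrates the argument.
RHYMES = {
    "day":    ["way", "say", "play", "stay", "gray"],
    "cat":    ["hat", "mat", "bat"],
    "orange": [],  # famously rhyme-poor
}

def rhyme_options(word):
    """How many continuations stay available if the line ends on this word."""
    return len(RHYMES.get(word, []))

# Preferring high-option endings looks like planning, but it is just scoring.
best = max(RHYMES, key=rhyme_options)
print(best, rhyme_options(best))  # "day" keeps the most options here
```

The Anthropic result cited upthread suggests something stronger than this (representations of the planned rhyme appearing before the line is written), which is what makes the distinction debatable.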
8. benlivengood ◴[] No.43586603{3}[source]
I agree that we don't have any good statistical models for this. If AI development were that predictable we'd likely already be past a singularity of some sort or in a very long winter just by reverse-engineering what makes the statistical model tick.
9. ComplexSystems ◴[] No.43586613[source]
> They are much better at the thing they are good at, and there are some new capabilities that are big, but they are still fundamentally next-token predictors.

I don't really get this. Are you saying autoregressive LLMs won't qualify as AGI, by definition? What about diffusion models, like Mercury? Does it really matter how inference is done if the result is the same?

replies(1): >>43586803 #
10. benlivengood ◴[] No.43586698{3}[source]
> I think there is a qualitative difference between tasks that take weeks or months and tasks that take minutes or hours, a difference that is not reflected by simple quantity.

I'd label that difference as long-term planning plus executive function, and wherever that overlaps with or includes delegation.

Most long-term projects are not done by a single human and so delegation almost always plays a big part. To delegate, tasks must be broken down in useful ways. To break down tasks a holistic model of the goal is needed where compartmentalization of components can be identified.

I think a lot of those individual elements are within reach of current model architectures, but they are likely out of distribution. How many Gantt charts and project plans and project-manager meetings are in the pretraining datasets? My guess is few; such internal artifacts are rarely published. Books and articles touch on the concepts, but I think the models learn best from the raw data; they can probably tell you very well all of the steps of good project management, because the descriptions are all over the place. The actual doing of it is farther toward the tail of the distribution.

11. throwuxiytayq ◴[] No.43586729{3}[source]
In the same way that human brains are just predicting the next muscle contraction.
replies(2): >>43586923 #>>43620367 #
12. Vegenoid ◴[] No.43586803[source]
> Are you saying autoregressive LLMs won't qualify as AGI, by definition?

No, I am speculating that they will not reach capabilities that qualify them as AGI.

replies(1): >>43609690 #
13. Enginerrrd ◴[] No.43586818{3}[source]
There is definitely something qualitatively different about weeks/months long tasks.

It reminds me of the difference between a fresh college graduate and an engineer with 10 years of experience. There are many really smart and talented college graduates.

But, while I am struggling to articulate exactly why, I know that when I was a fresh graduate, despite my talent and ambition, I would have failed miserably at delivering some of the projects that I now routinely deliver over time periods of ~1.5 years.

I think LLMs are really good at emulating the kinds of things I might say would make someone successful at this, if I were to write them down in a couple of paragraphs, or an article, or maybe even a book.

But... knowing those things as written by others just would not quite cut it. Learning at those time scales is just very different from what we're good at training LLMs to do.

A college graduate is in many ways infinitely more capable than an LLM. Yet there are a great many tasks that you just can't give an intern if you want them to be successful.

There are at least half a dozen different 1000-page manuals that one must reference to do a bare-bones approach at my job. And there are dozens of different constituents, and many thousands of design parameters I must adhere to. Fundamentally, all of these things often are in conflict, and it is my job to sort out the conflicts and come up with the best compromise. It's... really hard to do. Knowing what to bend so that other requirements may be kept rock solid, who to negotiate with for the different compromises needed, which fights to fight, and what a "good" design looks like between alternatives that all seem to mostly meet the requirements. It's a very complicated chess game that is hopeless to brute-force, but you must see the patterns along the way that will point you, like signposts, into a good position in the endgame.

The way we currently train LLMs will not get us there.

Until an LLM can take things in its context window, assess them for importance, dismiss what doesn't work or turns out to be wrong, completely dismiss everything it knows when the right new paradigm comes up, and then permanently alter its decision-making by incorporating all of that information in an intelligent way, it just won't be a replacement for a human being.

14. alfalfasprout ◴[] No.43586923{4}[source]
Except that's not how it works...
replies(2): >>43587523 #>>43595541 #
15. uejfiweun ◴[] No.43586998[source]
Isn't the brain kind of just a predictor as well, just a more complicated one? Instead of predicting and emitting tokens, we're predicting future outcomes and emitting muscle movements. Which is obviously different in a sense but I don't think you can write off the entire paradigm as a dead end just because the medium is different.
16. DennisP ◴[] No.43587041{3}[source]
Assuming the task remains just generating tokens, what sort of reasoning or planning would you say is the threshold, before it's no longer "just a form of next token prediction"?
replies(1): >>43590445 #
17. boznz ◴[] No.43587074[source]
> we haven't seen the signs of a runaway singularity as some thought was likely.

The signs are not there, but while we may not be on an exponential curve (which would be difficult to see), we are definitely on a steep upward one, which may get steeper or may fizzle out if LLMs can only reach human-level 'intelligence' but not surpass it. The original article was a fun read though, and 360,000 words shorter than my very similar fiction novel :-)

replies(2): >>43587434 #>>43597304 #
18. grey-area ◴[] No.43587434[source]
LLMs don’t have any sort of intelligence at present, they have a large corpus of data and can produce modified copies of it.
replies(2): >>43588092 #>>43588156 #
19. Workaccount2 ◴[] No.43587523{5}[source]
To be fair, we don't actually know how the human mind works.

The surest things we know are that it is a physical system, and that it does feel like something to be one of these systems.

20. boznz ◴[] No.43588092{3}[source]
Agree, the "intelligence" part is definitely the missing link in all this, however humans are smart cookies, and can see there's a gap, so I expect someone, (not necessarily a major player,) will eventually figure "it" out.
21. EMIRELADERO ◴[] No.43588156{3}[source]
While certainly not human-level intelligence, I don't see how you could say they don't have any sort of it. There's clearly generalization there. What would you say is the threshold?
replies(1): >>43588871 #
22. pertymcpert ◴[] No.43588233{3}[source]
It doesn't really explain it, because then you'd expect lots of nonsensical lines trying to make a sentence that fits with the theme and rhymes at the same time.
23. dangus ◴[] No.43588871{4}[source]
Seems like you’d have to prove the inverse.

The threshold would be “produce anything that isn’t identical or a minor transfiguration of input training data.”

In my experience my AI assistant in my code editor can't do a damn thing that isn't widely documented, and it sometimes botches tasks that are thoroughly documented (such as hallucinating parameter names that don't exist). I can witness this when I reach the edge of common use cases, where extending beyond the documentation requires following an implication.

For example, AI can't seem to understand how to help me in any way with Terraform dynamic credentials, because the documentation is very sparse, and it is not part of almost any blog posts or examples online. By definition the variable is populated dynamically, and real values aren't shown anywhere. I get a lot of irrelevant nonsense suggestions on how to fix it.

AI is a great “amazing search engine” and it can string together combinations of logic that already exist in documentation and examples while changing some names here and there, but what looks like true understanding really is just token prediction.

IMO the massive amount of training data is making the man behind the curtain look way better than he is.

replies(1): >>43588926 #
24. EMIRELADERO ◴[] No.43588926{5}[source]
That's creativity, not intelligence. LLMs can be intelligent while having very little (or even none at all) creativity. I don't believe one necessarily requires the other.
replies(1): >>43588987 #
25. dangus ◴[] No.43588987{6}[source]
That’s a garbage cop-out. Intelligence without creativity is not what AI companies are promising to deliver.

Intelligence without creativity is like selling dictionaries.

replies(1): >>43589066 #
26. EMIRELADERO ◴[] No.43589066{7}[source]
That was an extreme example to illustrate the concept. My point is that reduced/little creativity (which is what the current models have) is not indicative of a total lack of intelligence.
replies(1): >>43592607 #
27. Vegenoid ◴[] No.43590445{4}[source]
This is an interesting question, but it seems at least possible that as long as the fundamental operation is simply "generate tokens", that it can't go beyond being just a form of next-token prediction. I don't think people were thinking of human thought as a stream of tokens until LLMs came along. This isn't a very well-formed idea, but we may require an AI for which "generating tokens" is just one subsystem of a larger system, rather than the only form of output and interaction.
replies(1): >>43593351 #
28. hnaccount_rng ◴[] No.43591952{3}[source]
I'm not sure if this is a meaningful distinction: fundamentally you can describe the world as a "next token predictor". Just treat the world as a simulator with a time step of some quantum of time.

That _probably_ won't capture everything, but for all practical purposes it's indistinguishable from reality (yes, yes, time is not some constant everywhere)

29. rcrsvpreordnmnt ◴[] No.43592212{3}[source]
recursive predestination. LLM's algorithms imply 'self-sabotage' in order to 'learn the strings' of 'the' origin.
30. childintime ◴[] No.43592440[source]
LLMs do exactly the same thing humans do: we read the text, raise flags, and flags on flags, on the various topics the text reminds us of, positive and negative, and then start writing out a response that corresponds to those flags and likely attends to all of them. The planning ahead is just some flag that needs addressing, but it's learnt predictive behavior. Nothing much to see here. Experience gives you the flags. It's like applying massive pressure, and diamonds will form.
31. dangus ◴[] No.43592607{8}[source]
Boy have I got a dictionary to sell you!
32. DennisP ◴[] No.43593351{5}[source]
But that means any AI that just talks to you can't be AI by definition. No matter how decisively the AI passes the Turing test, it doesn't matter. It could converse with the top expert in any field as an equal, solve any problem you ask it to solve in math or physics, write stunningly original philosophy papers, or gather evidence from a variety of sources, evaluate them, and reach defensible conclusions. It's all just generating tokens.

Historically, a computer with these sorts of capabilities has always been considered true AI, going back to Alan Turing. Also of course including all sorts of science fiction, from recent movies like Her to older examples like The Moon Is a Harsh Mistress.

replies(3): >>43594087 #>>43632599 #>>43636122 #
33. z7 ◴[] No.43594087{6}[source]
It's just predicting tokens:

https://old.reddit.com/r/singularity/comments/1jl5qfs/its_ju...

34. killerstorm ◴[] No.43594397[source]
False. We went from ~0% on SWE-bench to 63%. That's a huge increase in capability in 2 years.

It's like saying that both a baby who can take a few steps and an adult have the capability of "walking". It's just wrong.

35. ToValueFunfetti ◴[] No.43595541{5}[source]
It may well be: https://en.m.wikipedia.org/wiki/Predictive_coding
36. boznz ◴[] No.43597304[source]
.. I would however add that the ending of my novel was far more exciting.
37. otabdeveloper4 ◴[] No.43609690{3}[source]
They will, we just need meatspace people to become dumber and more predictable. Making huge strides on that front, actually. (In no small part due to LLMs themselves, yeah.)
38. OrangeMusic ◴[] No.43619183[source]
They still can't tell how many Rs are in "strawberry"
replies(1): >>43634157 #
39. fennecfoxy ◴[] No.43620308{3}[source]
Yeah, I'd agree that for that model (certainly not AGI) it's just an extension/refinement of next token prediction.

But when we get a big aggregate of all of these little rules and quirks and improvements and subsystems for triggering different behaviours and processes - isn't that all humans are?

I don't think it'll happen for a long ass time, but I'm not one of those individuals who, for some reason, desperately want to believe that humans are special, that we're some magical thing that's unexplainable or can't be recreated.

40. fennecfoxy ◴[] No.43620367{4}[source]
Potentially, but I'd say we're more reacting.

I will feel an itch and subconsciously scratch it, especially if I'm concentrating on something. That's a subsystem independent of conscious thought.

I suppose it does make sense - that our early evolution consisted of a bunch of small, specific background processes that enabled an individual's life to continue; a single-celled organism doesn't have neurons, but it has exactly these processes - chemical reactions that keep it "alive".

Then I imagine that some of these processes became complex enough that they needed to be represented by some form of logic, hence evolving neurons.

Subsequently, organisms comprised of many thousands or more of such neuronal subsystems developed higher order subsystems to be able to control/trigger those subsystems based on more advanced stimuli or combinations thereof.

And finally us. I imagine that in the next step, evolution found that consciousness/intelligence - an overall direction of the efforts of all of these subsystems (still not all consciously controlled), and therefore of the individual - was much more effective: anticipation, planning, and other behaviours of the highest order.

I wouldn't be surprised if, given enough time and the right conditions, that sustained evolution would result in any or most creatures on this planet evolving a conscious brain - I suppose we were just lucky.

replies(1): >>43641486 #
41. LoveMortuus ◴[] No.43632599{6}[source]
I think one of the massive hurdles to overcome when trying to achieve AGI, maybe, is how you solve the issue of doing things without being prompted - you know, curiosity and such.

Let's say we have a humanoid robot standing in a room that has a window open, at what point would the AI powering the robot decide that it's time to close the window?

That's probably one of the reasons why I don't really see LLMs as much more than just algorithms that give us different responses just because we keep changing the seed...

42. senordevnyc ◴[] No.43634157[source]
This is obviously false. Even 4o and o3-mini can do this.
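For context on why this example keeps coming up: the character-level computation itself is trivial, and the usual explanation for LLM failures on it is tokenization - models see subword tokens like "straw" + "berry" rather than individual letters.

```python
# Trivial at the string level; hard for an LLM only because it never
# "sees" individual characters, just subword tokens.
word = "strawberry"
print(word.count("r"))  # 3
```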
43. Vegenoid ◴[] No.43636122{6}[source]
I don't mean that the primary (or only) way that it interacts with a human can't be just text. Right now, the only way it interacts with anything is by generating a stream of tokens. To make any API calls, to use any tool, to make any query for knowledge, it is predicting tokens in the same way as it does when a human asks it a question. There may need to be other subsystems that the LLM subsystem interfaces with to make a more complete intelligence that can internally represent reality and fully utilize abstraction and relations.
44. throwuxiytayq ◴[] No.43641486{5}[source]
I feel like the barrier between conscious and unconscious thinking is pretty fuzzy, but that could be down to the individual.

I also think the difference between primitive brains and conscious, reasoning, high level brains could be more quantitative than qualitative. I certainly believe that all mammals (and more) have some sort of an internal conscious experience. And experiments have shown that all sorts of animals are capable of solving simple logical problems.

Also, related article from a couple of days ago: Intelligence Evolved at Least Twice in Vertebrate Animals

replies(1): >>43642723 #
45. fennecfoxy ◴[] No.43642723{6}[source]
Great points, but my apologies I meant to say "sentience". Certainly many, many animals are already conscious.

I'm not sure about the quantitative thing, seeing as there are creatures with brains physically much larger than ours, or with more neurons than we have. We currently have the most known synapses, though that also seems to be because synapse counts haven't been estimated for so many species.