265 points ctoth | 27 comments
1. plaidfuji ◴[] No.43748358[source]
Gemini 2.5 Pro is certainly a tipping point for me. Previous LLMs have been very impressive, especially on coding tasks (unsurprising, as the answers to these have a preponderance of publicly available data). But outside of a coding assistant, LLMs until now felt like an extra helpful and less garbage-filled Google search.

I just used 2.5 Pro to help write a large research proposal (with significant funding on the line). Without going into detail, it felt to me like the only reason it couldn’t write the entire thing itself is because I didn’t ask it to. And by “ask it”, I mean: enter into the laughably small chat box the entire grant solicitation + instructions, a paragraph of general direction for what I want to explore, and a bunch of unstructured artifacts from prior work, and turn it loose. I just wasn’t audacious enough to try that from the start.

But as the deadline approached, I got more and more unconstrained in how far back I would step and let it take the reins - doing essentially what's described above but on isolated sections. It would do pretty ridiculously complex stuff, like generate project plans and timelines, cross-reference them correctly with other sections of text, etc. I can safely say it was a 10x force multiplier, and that's being conservative.

For scientific questions (ones that should have publicly available data, not ones relying on internal data), I have started going to 2.5 Pro over senior experts on my own team. And I’m convinced at this point if I were to connect our entire research data corpus to Gemini, that balance would shift even further. Why? Because I can trust it to be objective - not inject its own political or career goals into its answers.

I’m at the point where I feel the main thing holding back “AGI” is people’s audacity to push its limits, plus maybe context windows and compute availability. I say this as someone who’s been a major skeptic up until this point.

replies(9): >>43748425 #>>43749118 #>>43749224 #>>43751750 #>>43753576 #>>43755736 #>>43756318 #>>43756466 #>>43812541 #
2. EvgeniyZh ◴[] No.43748425[source]
Is your research AI-, or more generally, CS-related? Because I feel that it is still quite bad (by researcher standards) in physics, for example.
3. john_minsk ◴[] No.43749118[source]
Did you get funding in the end?
4. MoonGhost ◴[] No.43749224[source]
LLMs at this point are stateless calculators without personal experience, life goals, obligations, etc. Until recently, people expected AI to arrive as a character like the Terminator or HAL. Now we have intelligence separate from 'soul'. Can a calculator be AGI? It can be Artificial, General, and Intelligent. We may need another word for a 'creature' with some features of a living being.
replies(1): >>43750519 #
5. dcow ◴[] No.43750519[source]
The term AI has always bothered me for this reason. If the thing is intelligent, then there’s nothing artificial about it… it’s almost an oxymoron.

There are two subtly different definitions in use: (1) “like intelligence in useful ways, but not actually”, and (2) “actually intelligent, but not of human wetware”. I take the A in AGI to be of type (2).

LLMs are doing (1), right now. They may have the “neurological structure” required for (2), but to make a being General and Intelligent it needs to compress its context window and persist it to storage every night as it sleeps. It needs memory and agency. It needs to be able to learn in real time and self-adjust its own weights. And if it's doing all that, then who is to say it doesn't have a soul?

replies(1): >>43750620 #
6. Jensson ◴[] No.43750620{3}[source]
> If the thing is intelligent, then there’s nothing artificial about it… it’s almost an oxymoron.

Artificial means human made, if we made a thing that is intelligent, then it is artificial intelligence.

It is like "artificial insemination" means a human designed system to inseminate rather than the natural way. It is still a proper insemination, artificial doesn't mean "fake", it just means unnatural/human made.

replies(2): >>43750727 #>>43754511 #
7. europeanNyan ◴[] No.43750727{4}[source]
> Artificial means human made, if we made a thing that is intelligent, then it is artificial intelligence.

Aren't humans themselves essentially human made?

Maybe a better definition would be non-human (or inorganic, if we want to include intelligence from, e.g., dolphins)?

replies(3): >>43750798 #>>43752307 #>>43754375 #
8. Jensson ◴[] No.43750798{5}[source]
> Aren't humans themselves essentially human made?

Humans evolved, but yeah the definition can be a bit hard to understand since it is hard to separate things. That is why I brought up the artificial insemination example since it deals with this.

> Maybe a better definition would be non-human (or inorganic if we want to include intelligence like e.g. dolphins)?

We also have artificial lakes, they are inorganic but human made.

9. valenterry ◴[] No.43751750[source]
And yet it fails at every second refactoring that I ask it to do in a moderately complicated codebase. What am I doing wrong?
replies(2): >>43756614 #>>43757018 #
10. butlike ◴[] No.43752307{5}[source]
"ii" (inorganic intelligence) has a better ring to it than AI and can also be stylized as "||" which means OR.
11. burkaman ◴[] No.43753576[source]
> For scientific questions (ones that should have publicly available data, not ones relying on internal data), I have started going to 2.5 Pro over senior experts on my own team.

Have you asked any of your experts to double check those bot answers to see how it did?

replies(1): >>43760899 #
12. caconym_ ◴[] No.43754375{5}[source]
> Aren't humans themselves essentially human made?

No, not in the sense in which the word "made" is being used here.

> Maybe a better definition would be non-human (or inorganic if we want to include intelligence like e.g. dolphins)?

Neither of these work. Calling intelligence in animals "artificial" is absurd, and "inorganic" arbitrarily excludes "head cheese" style approaches to building artificial intelligence.

"Artificial" strongly implies mimicry of something that occurs naturally, and is derived from the same root as "artifice", which can be defined as "to construct by means of skill or specialized art". This obviously excludes the natural biological act of reproduction that produces a newborn human brain (and support equipment) primed to learn and grow; reportedly, sometimes people don't even know they're pregnant until they go into labor (and figure out that's what's happening).

replies(1): >>43756396 #
13. dcow ◴[] No.43754511{4}[source]
Well, you and I agree, but there's an entire industry and pop culture throwing the term around rather imprecisely (calling LLMs “AI”), which makes actual discussion about what AGI is difficult.

I guess I don’t understand the technical difference between AI and AGI and consider AI to refer to the social meme of “this thing kinda seems like it did something intelligent, like magic”.

14. chunkmonke99 ◴[] No.43755736[source]
LLMs have had broad adoption for about two years now: my GF and sister have both used previous iterations to write (successful) grant applications, so "AGI" has been here since ChatGPT's initial release if that is the metric (or baby AGI, at least). I view these as a novel way of "re-configuring" and "re-mixing" human knowledge, and that is a BIG DEAL! Also, I am not sure I agree that "people's lack of audacity" is holding back LLMs from achieving "AGI": Dario, Demis, and Sam Altman are promising the end of disease and death in the next 2 to 10 years! Those are some audacious claims (even if they come to pass).
15. oytis ◴[] No.43756318[source]
> Why? Because I can trust it to be objective - not inject its own political or career goals into its answers.

And that is basically why humanity is doomed.

16. kridsdale3 ◴[] No.43756396{6}[source]
If I asked my wife if she made our son, she would say yes. It is literally called "labour". Then there is "emotional labour" that lasts for 10 years to do the post-training.
replies(1): >>43757366 #
17. xur17 ◴[] No.43756466[source]
Strongly agreed. I used Gemini 2.5 Pro over the weekend to build an entire website + backend system in 8 hours, and it would have taken me over a week to get to the same place myself. My total bill for the entire thing? $10.

I am using Gemini 2.5 Flash for analyzing screenshots of webpages as part of it. Total cost for that (assuming I did my math right)? $0.00002/image.
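A back-of-envelope sketch of how a per-image figure like that works out. The token counts and per-million-token prices below are illustrative assumptions, not Gemini's published rates; plug in the real numbers from the pricing page to check your own bill.

```python
def image_analysis_cost(input_tokens: int, output_tokens: int,
                        price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost of one image-analysis call.

    Prices are quoted per 1M tokens, so divide the weighted sum by 1,000,000.
    """
    return (input_tokens * price_in_per_m
            + output_tokens * price_out_per_m) / 1_000_000

# Assumed values: ~258 input tokens per image, a short ~50-token reply,
# and hypothetical rates of $0.075/M input and $0.30/M output tokens.
cost = image_analysis_cost(input_tokens=258, output_tokens=50,
                           price_in_per_m=0.075, price_out_per_m=0.30)
print(f"${cost:.7f} per image")  # lands in the $0.00002-$0.00004 range
```

Under those assumptions the cost per image is a few thousandths of a cent, which is the right order of magnitude for the figure quoted above.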

replies(1): >>43758557 #
18. lm28469 ◴[] No.43756614[source]
IMHO it basically is judgment day, all the people boasting about 100x productivity and "chatgpt basically replaced my job and/or colleagues" had bullshit jobs
19. oytis ◴[] No.43757018[source]
Might be because you are an expert in what you ask it to do, and actually care about the result. E.g. I'm not sure what a marketing or other business professional would say about the work it did on the cheese business. What caught my eye is that the projected cost of doing business (salaries) is unrealistically low, especially as the volumes are expected to grow.
20. caconym_ ◴[] No.43757366{7}[source]
I drove my car to work today, and while I was at work I drove a meeting. Does this mean my car is a meeting? My meeting was a car?

It turns out that some (many, in fact) words mean different things in different contexts. My comment makes an explicit argument concerning the connotations and nuances of the word "made" used in this context, and you have not responded to that argument.

replies(1): >>43764034 #
21. valenterry ◴[] No.43758557[source]
It's actually pretty good at that, especially layout/design.

The problem is, the code it produces is usually not great: it's inconsistent and has subtle bugs. That quickly becomes a problem if you want to change things later, especially while keeping your data consistent and your APIs stable and backwards compatible. At least that's my experience.

But for building something that you can easily throw away later, it's pretty good and saves a lot of time.

22. plaidfuji ◴[] No.43760899[source]
Yep
23. dcow ◴[] No.43764034{8}[source]
Judging by this response, I’m guessing you don’t have children of your own. Otherwise you might understand the context.
replies(1): >>43775638 #
24. caconym_ ◴[] No.43775638{9}[source]
Your guess is wrong!

Maybe you should have written a substantive response to my comments instead of trying and failing to dunk on me. Maybe you don't understand as much as you think you do.

replies(1): >>43778723 #
25. dcow ◴[] No.43778723{10}[source]
I honestly don’t care enough to have even remotely thought about my reply as trying to dunk on anything. You’re awfully jacked up for a comment so far down an old thread that you and I are probably the only ones who will ever read it.
replies(1): >>43778855 #
26. caconym_ ◴[] No.43778855{11}[source]
Okay!
27. vladsanchez ◴[] No.43812541[source]
Do you write or blog so that I can follow your writings or opinions elsewhere?

You can email me at vlad dot sanchez at gmail dot com.

Thanks.