That’s not artificial intelligence.
Every time I try to work with them I lose more time than I gain. Net loss every time. Immensely frustrating. If I focus them on a small subtask I can gain some time (a rough draft of a test, say). Anything more advanced and it’s a monumental waste of time.
They are not even good librarians. They fail miserably at cross-referencing and contextualizing without constant leading.
A simple example is how LLMs do math. They are not calculators and have not memorized every sum in existence. Instead they deploy a whole set of mental-math techniques that were discovered at training time. For example, Claude uses a special trick for adding two-digit numbers ending in 6 and 9.
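To caricature the idea in code (a hand-written toy, not the learned circuit the report actually describes; the split into a rough tens path and a precise ones-digit path is my own simplification of the addition example there):

    # Toy illustration only, not Claude's actual mechanism: add two numbers by
    # running a rough "magnitude" path and a precise "ones digit" path, then
    # reconciling the two. A hand-coded caricature of the decomposition the
    # interpretability work describes for sums like 36 + 59.

    def tens_path(a: int, b: int) -> int:
        """Rough path: add the tens and ignore the ones digits entirely."""
        return (a // 10 + b // 10) * 10

    def ones_path(a: int, b: int) -> tuple[int, int]:
        """Precise path: the ones digit of the sum, and whether it carries."""
        s = a % 10 + b % 10
        return s % 10, s // 10

    def add_like_an_llm(a: int, b: int) -> int:
        """Reconcile the two paths into the final answer."""
        ones, carry = ones_path(a, b)
        return tens_path(a, b) + 10 * carry + ones

    print(add_like_an_llm(36, 59))  # 95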
Many more examples in this recent research report, including evidence of planning ahead while writing rhyming poetry.
https://www.anthropic.com/research/tracing-thoughts-language...
LLMs are unbelievably useful for me - never have I had a more powerful tool to assist my thinking. I use LLMs for work and play constantly, every day.
It pretends to sound like a person, can mimic speech and writing, and is all around perhaps the greatest wonder created by humanity.
It’s still not artificial intelligence though, it’s a talking library.
I’ve tried to use them as a research assistant on a history project, and they have been quite bad in that respect too, because of the immense naivety of their approach.
I couldn’t call them a librarian because librarians are educated and trained in cross-referencing material.
They have helped me in some searches, but no better than a search engine, and at a monumentally higher investment cost to the industry.
Then again, I am also speaking as someone who doesn’t like to offload all of my communications to those things. Use it or lose it, eh
On the other hand, the prompt/answer interface really limits what you can do with it. I can't just say, like I could with a human assistant, "Here's my calendar. Send me a summary of my appointments each morning, and when I tell you about a new one, record it in here." I can script something like that, and even have the LLM help me write the scripts, but since I can already write scripts, that's only a speed-up at best, not anything revolutionary.
I asked Grok what benefit there would be in having a script fetch the weather forecast data, pass it to Grok in a prompt, and then send the output to my phone. The answer was basically, "So I can say it nicer and remind you to take an umbrella if it sounds rainy." Again, that's kind of neat, but not a big deal.
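For the record, the plumbing for that is a few dozen lines of glue; the model only supplies the phrasing. A rough sketch, assuming any OpenAI-compatible chat endpoint and a push service like ntfy.sh (the endpoint, model name, and coordinates are placeholders, and the Open-Meteo parameter names are from memory, so check the docs):

    # Rough sketch of the "nicer weather message" pipeline: fetch the forecast,
    # have an LLM phrase it, push it to the phone. Endpoint, model name and
    # coordinates are placeholders; Open-Meteo/ntfy details are from memory.
    import json, urllib.request, urllib.parse

    LLM_URL = "https://api.example.com/v1/chat/completions"  # placeholder: any OpenAI-compatible endpoint
    LLM_KEY = "..."                                           # placeholder
    NTFY_URL = "https://ntfy.sh/my-weather-briefing"          # placeholder topic

    def fetch_forecast(lat=52.52, lon=13.41):
        """Pull a one-day forecast from Open-Meteo (no API key needed)."""
        params = urllib.parse.urlencode({
            "latitude": lat, "longitude": lon, "forecast_days": 1, "timezone": "auto",
            "daily": "temperature_2m_max,precipitation_probability_max",
        })
        with urllib.request.urlopen(f"https://api.open-meteo.com/v1/forecast?{params}") as r:
            return json.load(r)["daily"]

    def phrase(forecast):
        """Ask the model to turn the raw numbers into a short briefing."""
        body = json.dumps({
            "model": "some-model",  # placeholder
            "messages": [{"role": "user", "content":
                "Write a two-sentence morning weather briefing and tell me to take "
                "an umbrella if rain looks likely:\n" + json.dumps(forecast)}],
        }).encode()
        req = urllib.request.Request(LLM_URL, data=body, headers={
            "Authorization": f"Bearer {LLM_KEY}", "Content-Type": "application/json"})
        with urllib.request.urlopen(req) as r:
            return json.load(r)["choices"][0]["message"]["content"]

    def push(message):
        """Send the text to my phone via an ntfy topic the phone subscribes to."""
        urllib.request.urlopen(urllib.request.Request(NTFY_URL, data=message.encode()))

    if __name__ == "__main__":
        push(phrase(fetch_forecast()))  # run from cron each morning

All of which is ordinary glue code, which is exactly why it feels like a cool toy rather than a big deal.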
Maybe I just need to experiment more to see a big advance I can make with it, but right now it's still at the "cool toy" stage.
It’s weird to me that there’s such a giant gap between that and my experience of it being a minimum 10x multiplier.
The hype men trying to make a buck off them aren’t helping, of course.
If you ask the LLM to explain how it got the answer, the response it gives you won’t necessarily reflect the steps it actually used to figure out the answer.
"Our methods study the model indirectly using a more interpretable “replacement model,” which incompletely and imperfectly captures the original."
"(...) we build a replacement model that approximately reproduces the activations of the original model using more interpretable components. Our replacement model is based on a cross-layer transcoder (CLT) architecture (...)"
https://transformer-circuits.pub/2025/attribution-graphs/bio...
"Remarkably, we can substitute our learned CLT features for the model's MLPs while matching the underlying model's outputs in ~50% of cases."
"Our cross-layer transcoder is trained to mimic the activations of the underlying model at each layer. However, even when it accurately reconstructs the model’s activations, there is no guarantee that it does so via the same mechanisms."
https://transformer-circuits.pub/2025/attribution-graphs/met...
These two papers were designed to be used for the sort of argument that you’re making. You point to a blog post that glosses over it. You have to click through "Read the paper" to find a ~100-page paper, referencing another ~100-page paper, to find any of these caveats. The blog post you linked doesn’t even feature the words "replacement (model)" or any discussion of the reliability of this approach.
Yet it is happy to make bold claims such as "we look inside Claude 3.5 Haiku, performing deep studies of simple tasks representative of ten crucial model behaviors", which is simply not true.
Sure, they added to the blog post: "the mechanisms we do see may have some artifacts based on our tools which don't reflect what is going on in the underlying model", but that seems like a lot of indirection when the fact is that every observation discussed in the papers and the blog posts is about nothing but such artifacts.
I have fed ChatGPT a PDF file with activity codes from a local tax authority and asked how I could classify some things I was interested in doing. It invented codes that didn’t exist.
I would be very very careful about asking any LLM to organize data for me and trusting the output.
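One cheap guard if you do try it: extract the real code list from the PDF yourself and reject anything the model cites that isn’t in it. A minimal sketch (the file name and code format are made up for illustration):

    # Minimal sketch: never trust LLM-returned classification codes without
    # checking them against the authoritative list. The file name and the
    # code format below are made up for illustration.
    import re

    def load_valid_codes(path="activity_codes.txt"):
        """One official code per line, extracted from the authority's PDF."""
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}

    def invented_codes(llm_answer, valid):
        """Return any codes the model cited that are not in the official list."""
        cited = set(re.findall(r"\b\d{4}-\d/\d{2}\b", llm_answer))  # example format
        return sorted(cited - valid)

    if __name__ == "__main__":
        valid = {"6201-5/01", "6202-3/00"}   # in practice: load_valid_codes()
        answer = "You could register under 6201-5/01 or 1234-5/67."
        print("Invented codes:", invented_codes(answer, valid))  # ['1234-5/67']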
As for the "life advice" type of thing, they are very sycophantic. I wouldn’t go to a friend who always agrees with me enthusiastically for life advice. That sort of yes-man behavior is quite toxic.