128 points ArmageddonIt | 31 comments
1. danbruc ◴[] No.44500955[source]
Let us see how this will age. The current generation of AI models will turn out to be essentially a dead end. I have no doubt that AI will eventually fundamentally change a lot of things, but it will not be large language models [1]. And I think there is no path of gradual improvement; we still need some fundamentally new ideas. Integration with external tools will help but will not overcome the fundamental limitations. Once the hype is over, I think large language models will have a place as a simpler and more accessible user interface, just as graphical user interfaces displaced a lot of text-based interfaces, and they will be a powerful tool for language processing that is hard or impossible to do with more traditional tools like statistical analysis.

[1] Large language models may become an important component in whatever comes next, but I think we still need a component that can do proper reasoning and has proper memory not susceptible to hallucinating facts.

replies(5): >>44501079 #>>44501283 #>>44502224 #>>44505345 #>>44505828 #
2. Davidzheng ◴[] No.44501079[source]
Sorry, but to say current LLMs are a "dead end" is kind of insane if you compare them with the previous records in general AI before LLMs. The earlier language models would have been happy to be SOTA on 5 random benchmarks (like sentiment or some types of multiple-choice questions), and SOTA otherwise consisted of some AIs that could play something like 50 Atari games. And out of nowhere we have AI models that can do tasks which are not in the training set, pass Turing tests, tell jokes, and work out of the box on robots. It's a literally insane level of progress, and even if current techniques don't get to full human level, it will not have been a dead end in any sense.
replies(2): >>44501151 #>>44501260 #
3. danbruc ◴[] No.44501151[source]
I think large language models have essentially zero reasoning capacity. Train a large language model without exposing it to some topic, say mathematics, during training. Now expose the model to mathematics: feed it basic school books and explanations and exercises just like a teacher would teach mathematics to children in school. I think the model would not be able to learn mathematics this way to any meaningful extent.
replies(1): >>44502242 #
4. jayd16 ◴[] No.44501260[source]
Something can be much better than what came before and still be a dead end. A literal dead-end road can take you closer but never get you there.
replies(1): >>44502034 #
5. myrmidon ◴[] No.44501283[source]
> The current generation of AI models will turn out to be essentially a dead end.

It seems a matter of perspective to me whether you call it "dead end" or "stepping stone".

To give some pause before dismissing the current state of the art prematurely:

I would already consider current LLM-based systems more "intelligent" than a housecat. And a pet's intelligence is enough to have ethical implications, so we have arguably reached a very important milestone already.

I would argue that the biggest limitation on current "AI" is that it is architected not to have agency; if you had GPT-3-level intelligence in an easily anthropomorphizable package (Furby-style, capable of emoting/communicating by itself), public outlook might shift drastically without any real technical progress.

replies(4): >>44501468 #>>44504891 #>>44505152 #>>44506234 #
6. danbruc ◴[] No.44501468[source]
I think the main thing I want from an AI in order to call it intelligent is the ability to reason. I provide an explanation of how long multiplication works, and then the AI is capable of multiplying arbitrarily large numbers. And - correct me if I am wrong - large language models cannot do this. And this despite probably being exposed to a lot of mathematics during training, whereas in a strong version of this test I would want nothing related to long multiplication in the training data.
replies(1): >>44501790 #
7. myrmidon ◴[] No.44501790{3}[source]
I'm not sure if popular models cheat at this, but if I ask o3-mini for it, I get correct results and intermediate values (for 794206 * 43124, chosen randomly).
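
(For reference, a rough way to check both the product and the digit-wise intermediate values against exact integer arithmetic - just a sketch in Python, and the model's own intermediates may of course be grouped differently:)

    a, b = 794206, 43124

    # Ground truth: Python integers are arbitrary precision.
    print(a * b)  # 34249339544

    # Partial products of long multiplication: a times each digit of b,
    # shifted by that digit's place value.
    partials = [a * int(d) * 10**i for i, d in enumerate(str(b)[::-1])]
    assert sum(partials) == a * b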

I do suspect this is only achievable because the model was specifically trained for this.

But the same is true for humans; children can't really "reason themselves" into basic arithmetic-- that's a skill that requires considerable training.

I do concede that this (learning/skill acquisition) is something that humans can do "online" (within days/weeks/months), while LLMs need a separate process for it.

> in a strong version of this test I would want nothing related to long multiplication in the training data.

Is this not a bit of a double standard? I think at least 99/100 humans with minimal previous math exposure would utterly fail this test.

replies(1): >>44502178 #
8. Davidzheng ◴[] No.44502034{3}[source]
But a dead end to what? All progress eventually plateaus somewhere. It's clearly insanely useful in practice. And do you think there will be any future AGI whose development is not helped by current LLM technology? Even if the architecture is completely different, the ability of LLMs to understand human data automatically is unparalleled.
replies(3): >>44502497 #>>44504668 #>>44505169 #
9. danbruc ◴[] No.44502178{4}[source]
I just tested it with Copilot and two random 45-digit numbers, and it got the answer correct by translating it into Python and running it in the background. When I asked it not to use any external tools, it got the first five, the last two, and a handful more digits in the middle correct, out of 90. It also failed to calculate the 45 intermediate products - one number times one digit from the other - including when multiplying by zero and one.

The models can do surprisingly large numbers correctly, but they have essentially memorized them. As you make the numbers longer and longer, the result becomes garbage. If they actually reasoned about it, this would not happen; multiplying those long numbers is not really harder than multiplying two-digit numbers, just more time-consuming and annoying.
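
(A minimal sketch of the procedure in Python, with fresh random 45-digit numbers standing in for the ones I used - the point being that the 45 intermediate products and their sum are purely mechanical, regardless of length:)

    import random

    # Two random 45-digit numbers, standing in for the ones from the test.
    a = random.randrange(10**44, 10**45)
    b = random.randrange(10**44, 10**45)

    # Long multiplication: 45 intermediate products, each "a times one
    # digit of b", shifted by that digit's place value, then summed.
    intermediates = [a * int(d) * 10**i for i, d in enumerate(str(b)[::-1])]
    assert len(intermediates) == 45
    assert sum(intermediates) == a * b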

And I do not want the model to figure multiplication out on its own; I want to provide it with what teachers tell children until they get to long multiplication. The only place I want to push the AI further is to do it for much longer numbers, not only the two, three, or four digits you do in primary school.

And the difference is not only online versus offline: large language models have almost certainly been trained on heaps of basic mathematics, but did not learn to multiply. They can explain to you how to do it, because they have seen countless explanations and examples, but they cannot actually do it themselves.

replies(1): >>44509249 #
10. jmathai ◴[] No.44502224[source]
This is a surprising take. I think what's available today can improve productivity by 20% across the board. That seems massive.

Only a very small percentage of the population is leveraging AI in any meaningful way. But I think today's tools are sufficient for them to start if they wanted to, and the tools will only get better (even if the LLMs don't, which they will).

replies(2): >>44502402 #>>44505258 #
11. Davidzheng ◴[] No.44502242{3}[source]
The current generation of LLMs has very limited ability to learn new skills at inference time. I disagree that this means they cannot reason. I think reasoning is by and large a skill which can be taught at training time.
replies(1): >>44502525 #
12. danbruc ◴[] No.44502402[source]
Sure, if I ask about things I know nothing about, then I can get something done with little effort. But when I ask about something where I am an expert, large language models have surprisingly little to offer. And because I am an expert, it becomes apparent how bad they are, which in turn makes me hesitant to use them for things I know nothing about, because I am unprepared to judge the quality of the response. As a developer I am an expert on programming, and I think I have never gotten anything useful out of a large language model beyond pointers to relevant APIs or standards; they are a very good tool for searching through documentation, at least up to the point where they start hallucinating.

When I wrote dead end, I meant for achieving an AI that can properly reason, knows what it knows, and maybe is even able to learn. For finding stuff in heaps of text, large language models are relatively fine and can improve productivity, with the somewhat annoying caveat that one has to double-check what the model says.

13. danbruc ◴[] No.44502497{4}[source]
A dead end to reaching an AI that can reason. And sure, as I wrote, large language models might become a relevant component for processing natural language inputs and outputs, but I do not see a path towards large language models becoming able to reason without some fundamentally new ideas. At the moment we try to paper over this deficit by giving large language models access to all kinds of external tools like search engines, compilers, theorem provers, and so on.
replies(1): >>44506019 #
14. danbruc ◴[] No.44502525{4}[source]
Do you have an example of a reasoning ability any of the large language models has learned? Or do you just mean that you think we could train them in principle?
replies(1): >>44506020 #
15. jayd16 ◴[] No.44504668{4}[source]
> the ability of LLMs to understand

But it doesn't understand. It's just similarity and next-likely-token search. The trick is that this turns out to be useful or pleasing when tuned well enough.

replies(1): >>44506039 #
16. andrewflnr ◴[] No.44504891[source]
Intelligence alone does not have ethical implications w.r.t. how we treat the intelligent entity. Suffering has ethical implications, but intelligence does not imply suffering. There's no evidence that LLMs can suffer (note that that's less evidence than for, say, crayfish suffering).
17. jazzyjackson ◴[] No.44505152[source]
If you asked your cat to make a REST API call I suppose it would fail, but the same applies if you asked a chatbot to predict realtime prey behavior.
replies(1): >>44507515 #
18. jazzyjackson ◴[] No.44505169{4}[source]
You're in a bubble. Anyone who is responsible for making decisions and not just generating text for a living has more trouble seeing what is "insanely useful" about language models.
replies(2): >>44506024 #>>44506554 #
19. bigstrat2003 ◴[] No.44505258[source]
I think that what's available today is a drain on productivity, not an improvement, because it's so unreliable that you have to babysit it constantly to make sure it hasn't fucked up. That is not exactly reassuring as to the future, in my view.
20. socalgal2 ◴[] No.44505345[source]
Isn't this entirely missing the point of the article?

> When early automobiles began appearing in the 1890s — first steam-powered, then electric, then gasoline — most carriage and wagon makers dismissed them. Why wouldn’t they? The first cars were loud and unreliable, expensive and hard to repair, starved for fuel in a world with no gas stations, and unsuitable for the dirt roads of rural America.

That sounds like the complaints about today's LLM limitations. It will be interesting to see how your comment ages in 5, 10, or 15 years. You might be technically right that LLMs are a dead end. But the article isn't really about LLMs; it's about the change from a non-AI world to an "AI" world, and how the author believes it will be similar to the change from the non-car world to the car world.

21. mcswell ◴[] No.44505828[source]
It may be that LLM-AI is a dead end on the path to General AI (although I suspect it will instead turn out to be one component). But that doesn't mean that LLMs aren't good for some things. From what I've seen, they represent a huge improvement in (machine) translation, for example. And reportedly they're pretty good at spiffing up human-written text, and maybe even generating text--provided the human is on the lookout for hallucinations (and knows how to watch for that).

You might even say LLMs are good with text in the same way that early automobiles were good for transportation, provided you watched out for the potholes and stream crossings and didn't try to cross the river on the railroad bridge. (DeLoreans are said to be good at that, though :).)

22. Davidzheng ◴[] No.44506019{5}[source]
When LLMs attempt some novel problems (I'm thinking of pure mathematics here), they can try possible approaches, examine by themselves which approaches are working and which are not, and then come to conclusions. That is enough for me to conclude that they are reasoning.
23. Davidzheng ◴[] No.44506020{5}[source]
See my other answer.
24. Davidzheng ◴[] No.44506024{5}[source]
Anthropic and OpenAI researchers themselves certainly use AI--do you think they generate text for a living?
replies(1): >>44506064 #
25. Davidzheng ◴[] No.44506039{5}[source]
Implementation doesn't matter. Insofar as human understanding can be reflected in a text conversation, its distribution can be approximated by a distribution over next-token predictions. Hence there exist next-token predictors which are indistinguishable from a human over text--and I do not distinguish between identical behaviors.
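
(The identity being relied on here is just the chain rule for sequence distributions - in rough LaTeX:

    p(x_1, \ldots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, \ldots, x_{t-1})

so a distribution over whole conversations and a family of next-token conditionals carry the same information; the open question is only how well a given model approximates those conditionals.)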
26. jazzyjackson ◴[] No.44506064{6}[source]
What do they use it for?

edit (it's late, I'm just being snarky; I don't think researchers whose jobs are implicitly tied to hype are a good example of workers increasing their productivity)

27. hattmall ◴[] No.44506234[source]
> I would already consider current LLM-based systems more "intelligent" than a housecat.

An interesting experiment would be to have a robot with an LLM mind and see what it could figure out, like whether it would learn to charge itself. But personally I don't think they have anywhere near the general intelligence of animals.

28. thejohnconway ◴[] No.44506554{5}[source]
I don’t think you’re right about that. LLMs are very good for exploring half-formed ideas (what materials could I look at for project x?), generating small amounts of code when it’s not your main job, and writing boring crap like grant applications.

That last one isn’t useful to society, but it is for the individual.

I know plenty of people using LLMs for stuff like this, in all sorts of walks of life.

29. myrmidon ◴[] No.44507515{3}[source]
I think LLMs are much closer to grasping movement prediction than the cat is to learning English, for what it's worth.

IMO "ability to communicate" is a somewhat fair proxy for intelligence (even if it does not capture all of an animals capabilities), and current LLMs are clearly superior to any animal in that regard.

30. snowwrestler ◴[] No.44509249{5}[source]
When kids learn multiplication, they learn it on paper, not just in their heads. LLMs don’t have access to paper.

“Do long arithmetic entirely in your mind” is not a test most humans can pass. Maybe a few savants. This makes me suspect it is not a reliable test of reasoning.

Humans also get a training run every night. As we sleep, our brains are integrating our experiences from the day into our existing minds, so we can learn things from day to day. Kids definitely do not learn long multiplication in just one day. LLMs don’t work like this; they get only one training run and that is when they have to learn everything all at once.

LLMs for sure cannot learn and reason the same way humans do. Does that mean they cannot reason at all? Harder question IMO. You’re right that Python did the math, but the LLM wrote the Python. Maybe that is like their version of “doing it on paper.”

replies(1): >>44511359 #
31. danbruc ◴[] No.44511359{6}[source]
They do have access to paper, namely the model output, which is what reasoning models use to keep track of their chain of thought. When I asked Copilot what kind of external resources it can use, it also claimed that it has access to some scratchpad memory, which might or might not be true; I did not try to verify that.

Also, I am not asking it to learn this in one day; you can dump everything that a child would hear and read during primary school into the context. You can even do it interactively; maybe the model has questions.