124 points alphadelphi | 76 comments
1. antirez ◴[] No.43594641[source]
As LLMs do things thought to be impossible before, LeCun adjusts his statements about LLMs, but at the same time his credibility sinks lower and lower. He started by saying that LLMs were just predicting words using a probabilistic model, basically like a better Markov Chain. It was already pretty clear that this was not the case as even GPT3 could do summarization well enough, and there is no probabilistic link between the words of a text and the gist of the content, yet he was still saying that around the time of GPT3.5, I believe. Then he adjusted this view when talking with Hinton publicly, saying "I don't deny there is more than just probabilistic thing...". His position became: no longer just simply probabilistic, but they can only regurgitate things they saw in the training set, often explicitly telling people that novel questions could NEVER be solved by LLMs, with examples of prompts failing at the time he was saying it, and so forth. Now reasoning models can solve problems they never saw, and o3 made huge progress on ARC, so he adjusted again: for AGI we will need more. And so forth.

So at this point it does not matter what you believe about LLMs: in general, to trust LeCun's words is not a good idea. Add to this that LeCun is directing an AI lab that, at the same time, has the following huge issues:

1. Weakest ever LLM among the big labs with similar resources (and smaller resources: DeepSeek).

2. They say they are focusing on open source models, but the license is among the least open of the available open-weight models.

3. LLMs, and the new AI wave in general, put CNNs, a field where LeCun worked a lot (but didn't start himself), in perspective: now it's just one chapter in a book composed mostly of other techniques.

Btw, other researchers that were on LeCun's side changed sides recently, saying that now it "is different" because of CoT, which is the symbolic reasoning they were babbling about before. But CoT is still autoregressive next-token prediction without any architectural change, so, no, they were wrong, too.

replies(15): >>43594669 #>>43594733 #>>43594747 #>>43594812 #>>43594852 #>>43595292 #>>43595501 #>>43595519 #>>43595562 #>>43595668 #>>43596291 #>>43596309 #>>43597354 #>>43597435 #>>43614487 #
2. gcr ◴[] No.43594669[source]
Why is changing one’s mind when confronted with new evidence a negative signifier of reputation for you?
replies(6): >>43594696 #>>43594815 #>>43594919 #>>43595008 #>>43595180 #>>43595298 #
3. antirez ◴[] No.43594696[source]
Because there was plenty of evidence that the statements were either not correct or not based on enough information at the time they were made. And to be wrong because of personal biases, and then not clearly state you were wrong when new evidence appeared, is not a trait of a good scientist. For instance: the strong summarization abilities were already something that, alone, without any further information, was enough to seriously doubt the stochastic parrot mental model.
replies(4): >>43594725 #>>43594765 #>>43594771 #>>43595670 #
4. jaggederest ◴[] No.43594725{3}[source]
Here's a fun example of that kind of "I've updated my statements but not assessed any of my underlying lack of understanding" - it's a bad look for any kind of scientist.

https://x.com/AukeHoekstra/status/1507047932226375688

replies(1): >>43594837 #
5. sorcerer-mar ◴[] No.43594733[source]
> there is no probabilistic link between the words of a text and the gist of the content

How could that possibly be true?

There’s obviously a link between “[original content] is summarized as [summarized content]”.

replies(2): >>43594890 #>>43594959 #
6. nurettin ◴[] No.43594747[source]
Sometimes seeing something that resembles reasoning doesn't really make it reasoning.

What makes it "seem to get better", and what keeps throwing people like LeCun off, is the training bias, the prompts, the tooling and the billions spent cherry-picking information to train on.

What LLMs do best is language generation, which leads to, but is not, intelligence. If you want someone who was right all along, maybe try Wittgenstein.

7. jxjnskkzxxhx ◴[] No.43594765{3}[source]
I don't see the contradiction between "stochastic parrot" and "strong summarisation abilities".

Where I'm skeptical of LLM skepticism is that people use the term "stochastic parrot" disparagingly, as if they're not impressed. LLMs are stochastic parrots in the sense that they probabilistically guess sequences of things, but isn't it interesting how far that takes you already? I'd never have guessed. Fundamentally I question the intellectual honesty of anyone who pretends they're not surprised by this.

replies(2): >>43594813 #>>43595232 #
8. Analemma_ ◴[] No.43594771{3}[source]
This is all true, and I'd also add that LeCun has the classic pundit problem of making his opposition to another group too much of his identity, to the detriment of his thinking. So much of his persona and ego is tied up in being a foil to both Silicon Valley hype-peddlers and AI doomers that he's more interested in dunking on them than being correct. Not that those two groups are always right either, but when you're more interested in getting owns on Twitter than having correct thinking, your predictions will always suffer for it.

That's why I'm not too impressed even when he has changed his mind: he has admitted to individual mistakes, but not to the systemic issues which produced them, which makes for a safe bet that there will be more mistakes in the future.

9. charcircuit ◴[] No.43594812[source]
>LeCun is directing an AI lab that [built LLMs]

No he's not.

10. antirez ◴[] No.43594813{4}[source]
LLMs learn from examples where the logits are not probabilities, but how a given sentence continues (only one token is set to 1). So they don't learn probabilities, they learn how to continue the sentence with a given token. We apply softmax to the logits for mathematical reasons, and it is natural/simpler to think in terms of probabilities, but that's not what happens, nor are the neural networks they are composed of only able to approximate probabilistic functions. This "next token" probability is the source of a lot of misunderstanding. It's much better to imagine the logits as "To continue my reply I could say this word, more than the others, or maybe that one, a bit less, ..." and so forth. Now there is evidence, too, that in the activations producing a given token the LLM already has an idea of how most of the sentence is going to continue.

Of course, as they learn, early in the training, the first functions they model, to lower the error, will start being the probabilities of the next tokens, since this is the simplest function that works for loss reduction. Then the gradients start pointing in other directions, and the function that the LLM eventually learns is no longer related to probabilities, but to the meaning of the sentence and what it makes sense to say next.

It's not by chance that the logits often have a huge signal on just two or three tokens, even if the sentence, probabilistically speaking, could continue in many more potential ways.
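A minimal sketch of the training mechanics being described (my own toy illustration with made-up numbers, not the commenter's code): the network emits raw logits, the target is a one-hot vector where only the token that actually followed is 1, and softmax only enters when computing the cross-entropy loss.

    import numpy as np

    vocab = ["eggs", "oats", "toast", "nothing"]   # toy vocabulary
    logits = np.array([2.1, 1.9, -0.5, -3.0])      # raw scores from the model
    target = np.array([0.0, 1.0, 0.0, 0.0])        # the text actually continued with "oats"

    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                           # softmax, applied only for the loss
    loss = -np.sum(target * np.log(probs))         # cross-entropy against the one-hot target

    print(dict(zip(vocab, probs.round(3))), "loss:", round(float(loss), 3))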

replies(4): >>43594882 #>>43594975 #>>43595199 #>>43595490 #
11. bko ◴[] No.43594815[source]
Because he has a core belief and based on that core belief he made some statements that turned out to be incorrect. But he kept the core belief and adjusted the statements.

So it's not so much about his incorrect predictions, but that these predictions were based on a core belief. And when the predictions turned out to be false, he didn't adjust his core beliefs, but just his predictions.

So it's natural to ask, if none of the predictions you derived from your core belief come true, maybe your core belief isn't true.

replies(1): >>43595590 #
12. darkwater ◴[] No.43594837{4}[source]
Who are you referring to?
replies(1): >>43596680 #
13. aprilthird2021 ◴[] No.43594852[source]
> As LLMs do things thought to be impossible before

Like what?

Your timeline doesn't sound crazy or outlandish. It sounds pretty normal and lines up with my own thinking as AI has advanced over the past few years. Maybe more conservative than others in the field, but that's no more a reason to dismiss him entirely than over-promising and under-delivering is a reason to dismiss the hypesters entirely.

> Now reasoning models can solve problems they never saw

This is not the same as a novel question though.

> o3 made huge progress on ARC

Is this a benchmark? O3 might be great, but I think the average person's experience with LLMs matches what he's saying, it seems like there is a peak and we're hitting it. It also matches what Ilya said about training data being mostly gone and new architectures (not improvements to existing ones) needing to be the way forward.

> LeCun is directing an AI lab that, at the same time, has the following huge issues

Second point has nothing to do with the lab and more to do with Meta. Your last point has nothing to do with the lab at all. Meta also said they will have an agent that codes like a junior engineer by the end of the year and they are clearly going to miss that prediction, so does that extra hype put them back in your good books?

14. jxjnskkzxxhx ◴[] No.43594882{5}[source]
I don't think the difference is material between "they learn probabilities" vs "they learn how they want a sentence to continue". Seems like an implementation detail to me. In fact, you can add a temperature, set it to zero, and you become deterministic, so no probabilities anywhere. The fact is, they learn from examples of sequences and are very good at finding patterns in those sequences, to the point that they "sound human".

But the point of my response was just that I find it extremely surprising how well an idea as simple as "find patterns in sequences" actually works for the purpose of sounding human, and I'm suspicious of anyone who pretends this isn't incredible. Can we agree on this?
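A small illustration of the temperature point (my addition, with made-up numbers): dividing the logits by a temperature before softmax changes how peaked the distribution is, and as the temperature approaches zero the choice collapses to argmax, i.e. deterministic greedy decoding.

    import numpy as np

    logits = np.array([2.1, 1.9, -0.5, -3.0])   # hypothetical next-token scores

    def next_token_dist(logits, temperature):
        # Temperature rescales the logits before softmax; T -> 0 approaches argmax.
        z = logits / temperature
        p = np.exp(z - z.max())
        return p / p.sum()

    for t in (1.0, 0.5, 0.01):
        print(f"T={t}:", next_token_dist(logits, t).round(3))
    print("greedy pick (T -> 0):", int(np.argmax(logits)))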

replies(1): >>43595549 #
15. DrBenCarson ◴[] No.43594890[source]
It’s not true

The idea that meaning is not impacted by language yet is somehow exclusively captured by language is just absolutely absurd

Like saying X+Y=Z but changing X or Y won’t affect Z

replies(2): >>43595435 #>>43596906 #
16. Maxatar ◴[] No.43594919[source]
He hasn't fundamentally changed his mind. What he's doing is taking what he fundamentally believes and finding more and more elaborate ways of justifying it.
17. aerhardt ◴[] No.43594959[source]
Yea I'm lost there. If we took n bodies of text x_1 ... x_n, and k different summaries of each, y_11 ... y_kn, there are many statistical and computational treatments with which you would be able to find extremely strong correlations between the y's and the x's...
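A crude sketch of one such treatment (my own toy example, invented sentences): plain bag-of-words cosine similarity already shows a summary correlating far more strongly with its own source text than with an unrelated one.

    from collections import Counter
    import math

    def cosine(a: str, b: str) -> float:
        # Cosine similarity between word-count vectors of two texts.
        ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
        dot = sum(ca[w] * cb[w] for w in ca)
        norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
        return dot / norm if norm else 0.0

    text_1 = "the central bank raised interest rates to curb inflation across the economy"
    text_2 = "the team scored a late goal and won the championship match on saturday"
    summary_1 = "the bank raises rates to fight inflation"

    print(cosine(summary_1, text_1))  # ~0.61 against its own source
    print(cosine(summary_1, text_2))  # ~0.20 against the unrelated text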
18. klipt ◴[] No.43594975{5}[source]
> LLMs learn from examples where the logits are not probabilities, but how a given sentence continues (only one token is set to 1).

But enough data implies probabilities. Consider 2 sentences:

"For breakfast I had oats"

"For breakfast I had eggs"

Training on this data, how do you complete "For breakfast I had..."?

There is no best deterministic answer. The best answer is a 50/50 probability distribution over "oats" and "eggs"
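A toy sketch of that point (my illustration): fit nothing fancier than counts of what follows the prefix in the training data, and the best available answer is exactly the empirical 50/50 distribution.

    from collections import Counter

    corpus = [
        "For breakfast I had oats",
        "For breakfast I had eggs",
    ]

    prefix = "For breakfast I had"
    continuations = Counter(
        line[len(prefix):].strip() for line in corpus if line.startswith(prefix)
    )
    total = sum(continuations.values())
    print({token: count / total for token, count in continuations.items()})
    # {'oats': 0.5, 'eggs': 0.5} -- no single deterministic completion fits the data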

replies(1): >>43595528 #
19. danielmarkbruce ◴[] No.43595008[source]
If you need basically rock solid evidence of X before you stop saying "this thing cannot do X", then you shouldn't be running a forward looking lab. There are only so many directions you can take, only so many resources at your disposal. Your intuition has to be really freakishly good to be running such a lab.

He's done a lot of amazing work, but his stance on LLMs seems continuously off the mark.

replies(2): >>43595040 #>>43596502 #
20. SJC_Hacker ◴[] No.43595040{3}[source]
The list of great minds who thought that "new fangled thing is nonsense" and later turned out to be horribly wrong is quite long and distinguished
replies(3): >>43595318 #>>43595476 #>>43595700 #
21. mordymoop ◴[] No.43595180[source]
“Changing your mind” doesn’t really look like what LeCun is doing.

If your model of reality makes good predictions and mine makes bad ones, and I want a more accurate model of reality, then I really shouldn’t just make small provisional and incremental concessions gerrymandered around whatever the latest piece of evidence is. After a few repeated instances, I should probably just say “oops, looks like my model is wrong” and adopt yours.

This seems to be a chronic problem with AI skeptics of various sorts. They clearly tell us that their grand model indicates that such-and-such a quality is absolutely required for AI to achieve some particular thing. Then LLMs achieve that thing without having that quality. Then they say something vague about how maybe LLMs have that quality after all, somehow. (They are always shockingly incurious about explaining this part. You would think this would be important to them to understand, as they tend to call themselves “scientists”.)

They never take the step of admitting that maybe they’re completely wrong about intelligence, or that they’re completely wrong about LLMs.

Here’s one way of looking at it: if they had really changed their mind, then they would stop being consistently wrong.

22. cplat ◴[] No.43595199{5}[source]
I don't understand. Deterministic and stochastic have very specific meanings. The statement: "To continue my reply I could say this word, more than the others, or maybe that one, a bit less, ..." sounds very much like a probability distribution.
replies(1): >>43595361 #
23. fragmede ◴[] No.43595232{4}[source]
There are some that would describe LLMs as next word predictors, akin to having a bag of magnetic words, where you put your hand in, rummage around, and just pick a next word and put it on the fridge and eventually form sentences. It's "just" predicting the next word, so as an analogy as to how they work, that seems reasonable. The thing is, when that bag consists of a dozen bags-in-bags, like Russian nesting dolls, and the "bag" has a hundred million words in it, the analogy stops being a useful description. It's like describing humans as multicellular organisms. It's an accurate description of what a human is, but somewhere between a simple hydra with 100,000 cells and a human with some 30-odd trillion cells, intelligence arises. Describing humans as merely multicellular organisms and using hydra as your point of reference isn't going to get you very far.
24. belter ◴[] No.43595292[source]
But have we established that LLMs don't just interpolate and can actually create?

Are we able to prove it with output that's

1) algorithmically novel (not just a recombination)

2) coherent, and

3) not explainable by training data coverage.

No handwaving with scale...

replies(1): >>43595381 #
25. QuantumGood ◴[] No.43595298[source]
When you limit yourself to one framing, "changing one's mind", it helps if you point that out and acknowledge that other framings are possible; otherwise it risks seeming (not necessarily being) manipulative, and you are at least overlooking a large part of the domain. The Harvard Decision group called these two of the most insidious drivers of poor decisions: "frame blindness" and poor "frame choice". Give more than one frame a chance.
26. fragmede ◴[] No.43595318{4}[source]
> Heavier-than-air flying machines are impossible.

-Lord Kelvin. 1895

> I think there is a world market for maybe five computers. -Thomas Watson, IBM. 1943

> On talking films: “They’ll never last.” -Charlie Chaplin.

> This ‘telephone’ has too many shortcomings… -William Orton, Western Union. 1876

> Television won’t be able to hold any market -Darryl Zanuck, 20th Century Fox. 1946

> Louis Pasteur’s theory of germs is ridiculous fiction. -Pierre Pachet, French physiologist.

> Airplanes are interesting toys but of no military value. — Marshal Ferdinand Foch 1911

> There’s no chance the iPhone is going to get any significant market share. — Steve Ballmer, Microsoft CEO. 2007

> Stocks have reached a permanently high plateau. — Irving Fisher, Economist. 1929

> Who the hell wants to hear actors talk? —Harry Warner, Warner Bros. 1927

> By 2005, it will become clear that the Internet’s impact on the economy has been no greater than the fax machine. -Paul Krugman, Economist. 1998

replies(4): >>43595556 #>>43595725 #>>43595745 #>>43596122 #
27. antirez ◴[] No.43595361{6}[source]
If you really want to think of it as a probability, think of it as "the probability of correctly expressing the sentence/idea that was modeled in the activations of the model for that token". Which is totally different from "the probability that this sentence continues in a given way": the latter is about how, in general, this sentence continues, whereas the model picks tokens based on what it is modeling in the latent space.
replies(1): >>43595849 #
28. fragmede ◴[] No.43595381[source]
Why is that the bar though? Imagine LLMs as a kid who has a box of Lego with a hundred million blocks in it and can assemble those blocks into any configuration possible. The kid doesn't have access to ABS plastic pellets and a molding machine, so they can't make new pieces; does that really make us think that the kid just interpolates and can't create?
replies(1): >>43595416 #
29. belter ◴[] No.43595416{3}[source]
Actually yes... If the kid spends their whole life in the box and never invents a new block, that's just combinatorics. We don't call a chess engine 'creative' for finding novel moves, because we understand the rules. LLMs have rules too; they're called weights.

I want LLMs to create, but so far every creative output I've seen is just a clever remix of training data. The most advanced models still fail a simple test: restrict the domain, for example, "invent a cookie recipe with no flour, sugar, or eggs" or "name a company without using real words". Suddenly, their creativity collapses into either nonsense (violating constraints) or trivial recombination: ChocoNutBake instead of NutellaCookie.

If LLMs could actually create, we’d see emergent novelty, outputs that couldn’t exist in the training data. Instead, we get constrained interpolation.

Happy to be proven wrong. Would like to see examples where an LLM output is impossible to map back to its training data.

replies(1): >>43595466 #
30. neom ◴[] No.43595435{3}[source]
Language is a symbolic system. From an absolute or spiritual standpoint, meaning transcends pure linguistic probabilities. Language itself emerges as a limited medium for the expression of consciousness and abstract thought. Indeed, to say meaning arises purely from language (as probability alone), or to deny that language influences meaning at all, are both overly simplistic extremes.
replies(1): >>43602489 #
31. fragmede ◴[] No.43595466{4}[source]
The combinatorics of choosing 500 pieces (words) out of a bag of 1.8 billion pieces (approx parameters per layer for GPT-3), with replacement and where order matters, works out to something like 10^4600. Maybe you can't call that creativity, but you've got to admit that's a pretty big number.
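A quick back-of-the-envelope check of that figure (my own arithmetic, assuming the rough 1.8 billion parameters-per-layer estimate for GPT-3, i.e. 175B spread over 96 layers):

    import math

    pieces_in_bag = 1.8e9      # rough per-layer parameter count for GPT-3 (175B / 96 layers)
    sequence_length = 500
    # Ordered selection with replacement: pieces_in_bag ** sequence_length
    exponent = sequence_length * math.log10(pieces_in_bag)
    print(f"~10^{exponent:.0f}")   # ~10^4628, i.e. on the order of 10^4600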
replies(1): >>43595543 #
32. danielmarkbruce ◴[] No.43595476{4}[source]
*formerly great minds.

In many cases the folks in question were waaaaay past their best days.

33. mvdtnz ◴[] No.43595490{5}[source]
Just for anyone reading this who isn't sure, much like an LLM this is confident-sounding nonsense.
34. ksec ◴[] No.43595501[source]
>Btw, other researchers that were on LeCun's side changed sides recently, saying that now it "is different" because of CoT, which is the symbolic reasoning they were babbling about before. But CoT is still autoregressive next-token prediction without any architectural change, so, no, they were wrong, too.

Sorry, I am a little lost reading the last part about autoregressive next-token prediction and why they are still wrong. Could someone explain a little bit? Edit: Explained further down the thread. ( https://news.ycombinator.com/item?id=43594813 )

I personally went from AI skeptic (it won't ever replace all humans, at least not in the next 10-20 years) to AI scared, simply because of the reasoning capability it gained. It is not perfect, far from it, but I can immediately infer where both algorithmic improvements and hardware advances could bring us in 5 years. And that is not including any new breakthrough.

35. mbesto ◴[] No.43595519[source]
I wanna believe everything you say (because you generally are a credible person) but a few things don't add up:

1. Weakest ever LLM? This one is really making me scratch my head. For a period of time Llama was considered to be THE best. Furthermore, it's the third most used on OpenRouter (in the past month): https://openrouter.ai/rankings?view=month

2. Ignoring DeepSeek for a moment, Llama 2 and 3 require a special license from Meta if the products or services using the models have more than 700 million monthly active users. OpenAI, Claude and Gemini are not only closed source, but require a license/subscription to even get started.

replies(2): >>43595793 #>>43597941 #
36. ksec ◴[] No.43595528{6}[source]
So it is still largely probabilistic pattern matching?
replies(1): >>43602576 #
37. belter ◴[] No.43595543{5}[source]
I said No handwaving with scale. :-)
replies(1): >>43595861 #
38. balamatom ◴[] No.43595549{6}[source]
I don't find anything surprising about that. What humans generally see of each other is little more than outer shells that are made out of sequenced linguistic patterns. They generally find that completely sufficient.

(All things considered, you may be right to be suspicious of me.)

replies(1): >>43596351 #
39. ksec ◴[] No.43595556{5}[source]
I am just wondering: did you have all this somehow saved up, or did you pull it out of somewhere? Amazing list of things. Thank you.
replies(1): >>43595774 #
40. timewizard ◴[] No.43595562[source]
> So at this point it does not matter what you believe about LLMs: in general, to trust LeCun's words is not a good idea.

One does not follow from the other. In particular I don't "trust" anyone who is trying to make money off this technology. There is way more marketing than honest science happening here.

> and o3 made huge progress on ARC,

It also cost huge money. Going from 75% to 85% cost two orders of magnitude more. This cost scaling is not sustainable. It also only showed progress on ARC1, which it was trained for, and did terribly on ARC2, which it was not trained for.

> Btw, other researchers that were on LeCun's side changed sides recently,

Which "side" researchers are on is the least useful piece of information available.

41. mdp2021 ◴[] No.43595590{3}[source]
I have not followed all of LeCun's past statements, but -

if the "core belief" is that the LLM architecture cannot be the way to AGI, that is more of an "educated bet", which does not get falsified when LLMs improve but still suggest their initial faults. If seeing that LLMs seem constrained in the "reactive system" as opposed to a sought "deliberative system" (or others would say "intuitive" vs "procedural" etc.) was an implicit part of the original "core belief", then it still stands in spite of other improvements.

replies(1): >>43595900 #
42. wat10000 ◴[] No.43595668[source]
LLMs literally are just predicting tokens with a probabilistic model. They’re incredibly complicated and sophisticated models, but they still are just incredibly complicated and sophisticated models for predicting tokens. It’s maybe unexpected that such a thing can do summarization, but it demonstrably can.
replies(2): >>43596064 #>>43598184 #
43. mdp2021 ◴[] No.43595670{3}[source]
> strong summarization abilities

Which LLMs have shown you "strong summarization abilities"?

44. MrMcCall ◴[] No.43595700{4}[source]
I doubt that list is as long as the great minds that glommed onto a new tech that turned out to be a dud, but I could be wrong. It's an interesting question, but each tech needs to be evaluated separately.
45. MrMcCall ◴[] No.43595725{5}[source]
I'm pretty sure that Lord Kelvin was also in the cohort of fools that bullied Boltzmann to his suicide.
46. mdp2021 ◴[] No.43595745{5}[source]
An important number of those remarks were based on a snapshot of the state of the technology: a failure to see the potential evolution.

Examples of people who failed to see that something was not (in some way) a dead end do not cancel examples of people who correctly saw dead ends. The lists may even overlap ("if it remains that way, it's a dead end").

47. fragmede ◴[] No.43595774{6}[source]
Gosh no. I knew most of that list, but I'll be honest and tell you that I used ChatGPT to come up with it. It's a collection of quotes to begin with, so I think that's okay. I'm not passing off someone else's writing as my own, I'm explicitly quoting them.
48. redlock ◴[] No.43595793[source]
Doesn't OpenRouter ranking include pricing?

Not really a good measure of quality or performance but of cost effectiveness

replies(1): >>43596234 #
49. cplat ◴[] No.43595849{7}[source]
That's not quite how auto-regressive models are trained (the expression of "ideas" bit). There is no notion of "ideas." Words are not defined the way we humans define them; they're only related to one another.

And as for the latent space bit, that is also true of classical models, and it's the basic idea behind any pattern recognition or dimensionality reduction. That doesn't mean it's necessarily "getting the right idea."

Again, I don't want to "think of it as a probability." I'm saying what you're describing is a probability distribution. Do you have a citation for "probability to express correctly the sentence/idea" bit? Because just having a latent space is no implication of representing an idea.

50. fragmede ◴[] No.43595861{6}[source]
Right—but why should “new ABS plastic” be the bar for creativity? If the kid builds a structure no one’s ever imagined, from an unimaginably large box of Lego, isn’t that still novel? Sure, it’s made from known parts—but so is language. So are most human ideas.

The demand for outputs that are provably untraceable to training data feels like asking for magic, not creativity. Even Gödel didn’t require “never seen before atoms” to demonstrate emergence.

51. bko ◴[] No.43595900{4}[source]
If you say LLMs are a dead end, and you give a few examples of things they will never be able to do, and a few months later they do those things, and you just respond by saying that sure, they can do that, but they're still a dead end and won't be able to do this other thing...

Rinse and repeat.

After a while you question whether LLMs are actually a dead end.

replies(1): >>43596084 #
52. Workaccount2 ◴[] No.43596064[source]
The rub is that we don't know if intelligence is anything more than "just predicting next output".
replies(1): >>43596657 #
53. mdp2021 ◴[] No.43596084{5}[source]
This is a normal routine, a standard topic in epistemology from the perspective of Lakatos.

As I said, it will depend on whether the examples in question were actually a substantial part of the "core belief".

For example: "But can they perform procedures?" // "Look at that now" // "But can they do it structurally? Consistently? Reliably?" // "Look at that now" // "But is that reasoning integrated or external?" // "Look at that now" // "But is their reasoning fully procedurally vetted?" (etc.)

I.e.: is the "progress" (which would be the "anomaly" in scientific prediction) part of the "substance" or part of the "form"?

54. jcalvinowens ◴[] No.43596122{5}[source]
In fairness to Irving Fisher: if you bought into the market at its peak in 1929, you wouldn't recover your original investment until about 1960.
55. mbesto ◴[] No.43596234{3}[source]
I mean it literally says on the page:

"Shown are the sum of prompt and completion tokens per model, normalized using the GPT-4 tokenizer."

Also, it ranks the use of Llama that is provided by cloud providers (for example, AWS Lambda).

I get that OpenRouter is imperfect, but it's a good proxy for objectively assessing the claim that an LLM is "the weakest ever".

56. deepfriedchokes ◴[] No.43596291[source]
Everything is possible with math. Just ask string theorists.
57. thesz ◴[] No.43596309[source]
> there is no probabilistic link between the words of a text and the gist of the content

Using an n-gram/skip-gram model over the long text, you can predict probabilities of the word pairs and/or word triples (effectively collocations [1]) in the summary.

[1] https://en.wikipedia.org/wiki/Collocation

Then, by using (beam search and) an n-gram/skip-gram model of summaries, you can generate the text of a summary, guided by a preference for the word pairs/triples predicted by the first step.
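A rough sketch of the first step (my own toy code, not the commenter's implementation): count skip-gram word pairs within a sliding window over a long text, so that frequent collocations can later guide summary generation.

    from collections import Counter

    def skipgram_pairs(text: str, window: int = 4) -> Counter:
        # Count ordered word pairs co-occurring within a sliding window;
        # the frequent pairs approximate the collocations of the source text.
        words = text.lower().split()
        pairs = Counter()
        for i in range(len(words)):
            for j in range(i + 1, min(i + 1 + window, len(words))):
                pairs[(words[i], words[j])] += 1
        return pairs

    doc = ("the central bank raised interest rates again this quarter "
           "as the central bank tries to bring inflation down")
    print(skipgram_pairs(doc).most_common(5))
    # ('central', 'bank') comes out as a strong collocation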

58. jxjnskkzxxhx ◴[] No.43596351{7}[source]
Nah, to me you're just an average person on the internet. If the recent developments don't surprise you, I just chalk it up to lack of curiosity. I'm well aware that people like you exist, most people are like that in fact. My comment was referring to experts specifically.
replies(1): >>43597036 #
59. nurettin ◴[] No.43596502{3}[source]
I'm going to wear the tinfoil hat: a firm is able to produce a sought-after behavior a few months later and throws people off. Is it more likely that the firm (worth billions at this point) is engineering these solutions into the model, or is it because of emergent neural network architectural magic?

I'm not saying that they are being bad actors, just saying this is more probable in my mind than an LLM breakthrough.

replies(1): >>43597844 #
60. sangnoir ◴[] No.43596657{3}[source]
I think we do.
replies(1): >>43599813 #
61. jaggederest ◴[] No.43596680{5}[source]
Yann LeCun. I think this post is a more elegant summary of what I'm trying to illustrate with this kind of thing:

> he has admitted to individual mistakes, but not to the systemic issues which produced them, which makes for a safe bet that there will be more mistakes in the future.

https://news.ycombinator.com/item?id=43594771

62. bitethecutebait ◴[] No.43596906{3}[source]
... meaning is not always impacted by the specificity or sensitivity of language while sometimes indeed exclusively captured by it, although the exclusivity is more of a time-dependent thing as one could imagine a silent, theatrical piece that captures the very same meaning but the 'phantasiac' is probably constructing the scene(s) out of words ... but then again ... there either was, is or will be at least one Savant to whom this does not apply ... and maybe 'some' deaf and blind person, too ...
63. balamatom ◴[] No.43597036{8}[source]
>how well an idea as simple as "find patterns in sequences" actually works for the purpose of sounding human

What surprises me is the assumption that there's more than "find patterns in sequences" to "sounding human" i.e. to emitting human-like communication patterns. What else could there be to it? It's a tautology.

>If the recent developments don't surprise you, I just chalk it up to lack of curiosity.

Recent developments don't surprise me in the least. I am, however, curious enough to be absolutely terrified by them. For one, behind the human-shaped communication sequences there could previously be assumed to be an actual human.

64. daveguy ◴[] No.43597354[source]
o3 progress on ARC was not a zero shot. It was based on fine tuning to the particular data set. A major point of ARC is that humans do not need fine tuning more than being explained what the problem is. And a few humans working on it together after minimal explanation can achieve 100%.

o3 doing well on ARC after domain training is not a great argument. There is something significant missing from LLMs being intelligent.

I'm not sure if you watched the entire video, but there were insightful observations. I don't think anyone believes LLMs aren't a significant breakthrough in HCI and language modelling. But it is many layers with many winters away from AGI.

65. daveguy ◴[] No.43597435[source]

Also, understanding human and machine intelligence isn't about sides. And CoT is not symbolic reasoning.

66. danielmarkbruce ◴[] No.43597844{4}[source]
It depends what you mean by "engineering these solutions into the model". Using better data leads to better models given the same architecture and training. Nothing wrong with that, it's hard work, and it might be done with a specific goal in mind. LLM "breakthroughs" aren't really a thing at this point. It's just one little thing after another.
replies(1): >>43598430 #
67. kristianp ◴[] No.43597941[source]
I've found the Llama 3 served by meta.ai to be quite weak for coding prompts; it gets confused by more complex tasks. Maybe it's a smaller model? I agree it's weaker than others of its generation.
68. ◴[] No.43598184[source]
69. nurettin ◴[] No.43598430{5}[source]
Sure, I specifically pre-agreed that it isn't ill will. What I mean is that they keep tabs on the latest demand (newer benchmarks) and make sure their model delivers in some fashion. But that is mundane, and they don't say it. And when a major number increases, people don't assume they just added more specific training data.
replies(1): >>43603251 #
70. sho ◴[] No.43599813{4}[source]
That's just what you were most likely to say...
replies(1): >>43607383 #
71. aerhardt ◴[] No.43602489{4}[source]
"When he to whom one speaks does not understand, and he who speaks himself does not understand, that is metaphysics." - Voltaire

Like I said in another comment, I can think of a dozen statistical and computational methods where if you give me a text and its synthesis I can find a strong probabilistic link between the two.

replies(1): >>43602929 #
72. klipt ◴[] No.43602576{7}[source]
You can model the whole universe with probabilities!
73. neom ◴[] No.43602929{5}[source]
"Not everything that counts can be counted, and not everything that can be counted counts." - Someone.

Statistical correlation between text and synthesis undoubtedly exists, but capturing correlation does not imply you've encapsulated meaning itself. My point is precisely that: meaning isn't confined entirely within what we can statistically measure, though it may still be illuminated by it.

74. danielmarkbruce ◴[] No.43603251{6}[source]
Yup, it's a fair point. We very quickly got down to the nitty gritty with these things. Hopefully, as with semiconductors, the nitty gritty results in a lot of big performance gains for decades.
75. ◴[] No.43607383{5}[source]
76. pllbnk ◴[] No.43614487[source]
> It was already pretty clear that this was not the case as even GPT3 could do summarization well enough, and there is no probabilistic link between the words of a text and the gist of the content, <...>

I am not an expert by any means, but I have some knowledge of the technicalities of LLMs, and my limited knowledge allows me to disagree with your statement. The models are trained on an ungodly amount of text, so they become very advanced statistical token prediction machines with magic randomness sprinkled in to make the outputs more interesting. After that, they are fine-tuned on very believable dialogues, so their statistical weights are skewed in a way that when subject A (the user) says something, subject B (the LLM-turned-chatbot) has to say something back which statistically should make sense (which it almost always does, since they are trained on it in the first place).

Try to paste random text - you will get a random reply. Now try to paste the same random text and ask the chatbot to summarize it - your randomness space will be reduced and it will be turned into a summary, because the fine-tuning gave the LLM the "knowledge" of what a summarization _looks like_ (not what it _means_).

Just to prove that you are wrong: ask your favorite LLM if your statement is correct and you will probably see it output that it is not.