Most active commenters
  • YeGoblynQueenne(17)
  • dilap(3)
  • Vecr(3)
  • mitthrowaway2(3)


Interview with gwern

(www.dwarkeshpatel.com)
308 points by synthmeat | 46 comments
1. YeGoblynQueenne ◴[] No.42135916[source]
This will come across as vituperative, and I guess it is a bit, but I've interacted with Gwern on this forum and the interaction that has stuck with me is in this thread, where Gwern mistakes a^nb^n for a regular language (when it is in fact context-free but not regular) and calls my comment "not even wrong":

https://news.ycombinator.com/item?id=21559620

Again I'm sorry for the negativity, but already at the time Gwern was held up by a certain, large section of the community as an important influencer in AI. For me that's just a great example of how the vast majority of AI influencers (who vie for influence on social media, rather than research) are basically clueless about AI and CS and only have second-hand knowledge, which I guess they're good at organising and popularising, but not more than that. It's easy to be a cheerleader for the mainstream view on AI. The hard part is finding, and following, unique directions.

With apologies again for the negative slant of the comment.

replies(10): >>42136055 #>>42136148 #>>42136538 #>>42136759 #>>42137041 #>>42137215 #>>42137274 #>>42137284 #>>42137350 #>>42137636 #
2. aubanel ◴[] No.42136055[source]
> For me that's just a great example of how the vast majority of AI influencers (who vie for influence on social media, rather than research) are basically clueless about AI and CS

This is a bit stark: there are many great knowledgeable engineers and scientists who would not get your point about a^nb^n. It's impossible to know 100% of such a wide area as "AI and CS".

replies(2): >>42136162 #>>42136565 #
3. empiricus ◴[] No.42136148[source]
Minor: if n is finite, then a^nb^n becomes regular?
replies(2): >>42136273 #>>42136285 #
4. nocobot ◴[] No.42136162[source]
Is it really? This is the most common example of a context-free language and something most first-year CS students will be familiar with.

totally agree that you can be a great engineer and not be familiar with it, but seems weird for an expert in the field to confidently make wrong statements about this.

replies(2): >>42136390 #>>42139265 #
5. YeGoblynQueenne ◴[] No.42136273[source]
a^nb^n is not regular, but it is context-free. I don't think there's a restriction on the n. Why do you say this?

Edit: sorry, I read "finite" as "infinite" :0 If n is bounded then yes, a^nb^n becomes a finite language and is therefore regular, and it is also context-free. To be clear, the Chomsky Hierarchy of formal languages goes like this:

Finite ⊆ Regular ⊆ Context-Free ⊆ Context-Sensitive ⊆ Recursively Enumerable

That's because formal languages are identified with the automata that accept them, and a class of automata that can accept the recursively enumerable languages can also accept the context-sensitive languages, and so on all the way down to the finite languages. One way to think of this is that an automaton is "powerful enough" to recognise the set of strings that make up a language.
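
To make that concrete, here is a minimal sketch (mine, not anything from the linked thread) of a recogniser for a^nb^n. The point is that it needs an unbounded counter, i.e. a one-symbol stack, which is exactly what a finite automaton doesn't have; that's why the language is context-free but not regular:

    def is_anbn(s: str) -> bool:
        """Recognise a^n b^n (n >= 0) with a single unbounded counter."""
        count = 0
        i = 0
        # consume the a-block, incrementing the counter (a "push")
        while i < len(s) and s[i] == "a":
            count += 1
            i += 1
        # consume the b-block, decrementing the counter (a "pop")
        while i < len(s) and s[i] == "b":
            count -= 1
            i += 1
            if count < 0:            # more b's than a's
                return False
        # accept only if the whole string was consumed and the counter balanced
        return i == len(s) and count == 0

    assert is_anbn("aabb") and is_anbn("")
    assert not is_anbn("aab") and not is_anbn("abab")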

6. nahumfarchi ◴[] No.42136285[source]
Yes, all finite languages are regular.

Specifically, you can construct a finite automaton that recognises it.
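
For illustration (my own toy example): bound n and the language becomes finite, so even a plain regular expression covers it by enumeration.

    import re

    # a^n b^n restricted to n <= 3 is a finite language: just enumerate it.
    bounded_anbn = re.compile(r"^(|ab|aabb|aaabbb)$")

    assert bounded_anbn.match("aabb")
    assert not bounded_anbn.match("aaaabbbb")   # n = 4 falls outside the bound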

7. YeGoblynQueenne ◴[] No.42136390{3}[source]
Thanks, that's what I meant. a^nb^n is a standard test of learnability.

That stuff is still absolutely relevant, btw. Some DL people like to dismiss it as irrelevant but that's just because they lack the background to appreciate why it matters. Also: the arrogance of youth (hey I've already been a postdoc for a year, I'm ancient). Here's a recent paper on Neural Networks and the Chomsky Hierarchy that tests RNNs and Transformers on formal languages (I think it doesn't test on a^nb^n directly but tests similar a-b based CF languages):

https://arxiv.org/abs/2207.02098

And btw that's a good paper. Probably one of the most satisfying DL papers I've read in recent years. You know when you read a paper and you get this feeling of satiation, like "aaah, that hit the spot"? That's the kind of paper.

replies(1): >>42137043 #
8. newmanpo ◴[] No.42136538[source]
I take the Feynman view here; vain memory tricks are not themselves net new production, so just look known things up in the book.

Appreciate the diversity in the effort, but engineering is making things people can use without having to know it all. Far more interesting endeavor than being a human Google search engine.

replies(2): >>42136683 #>>42137035 #
9. YeGoblynQueenne ◴[] No.42136565[source]
>> This is a bit stark: there are many great knowledgeable engineers and scientists who would not get your point about a^nb^n. It's impossible to know 100% of such a wide area as "AI and CS".

I think engineers, yes, especially those who don't have a background in academic CS. But scientists, no, I don't think so. I don't think it's possible to be a computer scientist without knowing the difference between a regular and a super-regular language. As to knowing that a^nb^n specifically is context-free, as I suggest in the sibling comment, computer scientists who are also AI specialists would recognise a^nb^n immediately, as they would Dyck languages and Reber grammars, because those are standard tests of learnability used to demonstrate various principles, from the good old days of purely symbolic AI to the brave new world of modern deep learning.

For example, I learned about Reber grammars for the first time when I was trying to understand LSTMs, when they were all the hype in Deep Learning, at the time I was doing my MSc in 2014. Online tutorials on coding LSTMs used Reber grammars as the dataset (because, as with other formal grammars, it's easy to generate tons of strings from them and that's awfully convenient for big data approaches).

Btw that's really the difference between a computer scientist and a computer engineer: the scientist knows the theory. That's what they do to you in CS school, they drill that stuff into your head with extreme prejudice; at least the good schools do. I see this with my partner, who is 10 times a better engineer than me and yet hasn't got a clue what all this Chomsky hierarchy stuff is. But then, my partner is not trying to be an AI influencer.

replies(1): >>42137222 #
10. YeGoblynQueenne ◴[] No.42136683[source]
No, look. If a student (I'm not a professor, just a post-doc) doesn't know this stuff, I'll point them to the book so they can look it up, and move on. But the student will not tell me I'm "not even wrong" with the arrogance of fifty cardinals while at the same time pretending to be an expert [1]. It's OK to not know, it's OK to not know that you don't know, but arrogant ignorance is not a good look on anyone.

And there's a limit to what you need to look up in a book. The limit moves further up the more you work with a certain kind of tool or study a certain kind of knowledge. I have to look up trigonometry every single time I need it because I only use it sparingly. I don't need to look up SLD-Resolution, which is my main subject. How much would Feynman need to look up when debating physics?

So when someone like Feynman talks about physics, you listen carefully because you know they know their shit and a certain kind of nerd deeply appreciates deep knowledge. When someone elbows themselves in the limelight and demands everyone treats them as an expert, but they don't know the basics, what do you conclude? I conclude that they're pretending to know a bunch of stuff they don't know.

________________

[1] ... some do. But they're students so it's OK, they're just excited to have learned so much and don't yet know how much they don't. You explain the mistake, point them to the book, and move on.

replies(2): >>42136881 #>>42137036 #
11. dilap ◴[] No.42136759[source]
Regarding your linked comment, my takeaway is that the very theoretical task of being able to recognize an infinite language isn't very relevant to the non-formal, intuitive idea of "intelligence".

Transformers can easily intellectually understand a^nb^n, even though they couldn't recognize whether an arbitrarily long string is a member of the language -- a restriction humans share, since eventually a human, too, would lose track of the count for a long enough string.

replies(2): >>42136846 #>>42136925 #
12. YeGoblynQueenne ◴[] No.42136846[source]
I don't know what "intellectually understand" means in the context of Transformers. My older comment was about the ability of neural nets to learn automata from examples, a standard measure of the learning ability of a machine learning system. I link to a paper below where Transformers and RNNs are compared on their ability to learn automata along the entire Chomsky hierarchy and, as other work has also shown, they don't do that well (although there are some surprises).

>> Regarding your linked comment, my takeaway is that the very theoretical task of being able to recognize an infinite language isn't very relevant to the non-formal, intuitive idea of "intelligence"

That depends on who you ask. My view is that automata are relevant to computation and that's why we study them in computer science. If we were biologists, we would study beetles. The question is whether computation, as we understand it on the basis of computer science, has anything to do with intelligence. I think it does, but that it's not the whole shebang. There is a long debate on that in AI and the cognitive sciences and the jury is still out, despite what many of the people working on LLMs seem to believe.

replies(2): >>42137144 #>>42137319 #
13. newmanpo ◴[] No.42136881{3}[source]
Ok so you’re arrogantly practicing English writing now and producing little more than a philosophy that just zigs around to maintain your narrative.

“Debating physics” in academia is little more than reciting preferred symbolic logic. The map is not the terrain; to borrow from Feynman again, who says in his lectures it’s not good to get hung up on such stuff. So you’re just making this about rhetorical competition, boring gamesmanship. Which like I said, isn’t the point of engineering. Dialectics don’t ship, they keep professors employed though.

Whichever way it’s written, physics still seems to work. So the value of STEM is more about the application than the knowing.

14. raverbashing ◴[] No.42136925[source]
I agree with your assessment

Yes, LLMs are bad at this. A similar example: SAT solvers choke on the pigeonhole problem (resolution-based solvers need exponential time on it)

It is an exceptional case that requires "metathinking" maybe, rather than a showstopper issue

(can't seem to be able to write the grammar name, the original comment from the discussion had it)
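
For anyone who hasn't seen it: the pigeonhole problem here is the propositional encoding of "n+1 pigeons don't fit into n holes", a classic family of unsatisfiable formulas on which resolution-based solvers are known to blow up exponentially. A rough sketch of the encoding (my own; the variable numbering is arbitrary):

    def pigeonhole_cnf(n: int) -> list[list[int]]:
        """Clauses for 'n+1 pigeons fit into n holes' (unsatisfiable for n >= 1)."""
        var = lambda i, j: i * n + j + 1   # pigeon i in hole j; i in 0..n, j in 0..n-1
        clauses = []
        # every pigeon sits in at least one hole
        for i in range(n + 1):
            clauses.append([var(i, j) for j in range(n)])
        # no two pigeons share a hole
        for j in range(n):
            for i1 in range(n + 1):
                for i2 in range(i1 + 1, n + 1):
                    clauses.append([-var(i1, j), -var(i2, j)])
        return clauses

    # feed e.g. pigeonhole_cnf(10) to any DIMACS-speaking SAT solver and watch it grind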

15. YeGoblynQueenne ◴[] No.42137036{3}[source]
@newmanpo Your comment is [dead] so I can't directly reply to it, but you're assuming things about me that are wrong. I say above I'm a post-doc. You should understand what this means: I'm the workhorse in an academic research lab where I'm expected to make stuff work, and then write papers about it. I write code and tell computers when to jump. I'm not a philosopher by any stretch of the term and, just to be clear, a scientist is not a philosopher (not any more).

Edit: dude, come on. That's no way to have a debate. Other times I'm the one who gets all the downvotes. You gotta soldier on through it and say your thing anyway. Robust criticism is great but being prissy about downvotes just makes HN downvote you more.

replies(1): >>42137344 #
16. bmc7505 ◴[] No.42137041[source]
FWIW, I've had a very similar encounter with another famous AI influencer who started lecturing me on fake automata theory that any CS undergrad would have picked up on. 140k+ followers, featured on all the big podcasts (Lex, MLST). I never corrected him but made a mental note not to trust the guy.
17. GistNoesis ◴[] No.42137043{4}[source]
a^nb^n can definitely be expressed and recognized with a transformer.

A transformer (with relative invariant positional embedding) has full context, so it can see the whole sequence. It just has to count and compare.

To convince yourself, construct the weights manually.

First layer:

zero out the characters which are equal to the previous character.

Second layer:

build a feature to detect and extract the position embedding of the first a, a second feature to detect and extract the position embedding of the last a, a third feature to detect and extract the position embedding of the first b, and a fourth feature to detect and extract the position embedding of the last b.

Third layer:

on top of that, check whether (second feature - first feature) == (fourth feature - third feature).
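
Spelled out in plain Python rather than attention weights (my paraphrase of the check, not an actual weight construction; note the contiguity conditions also have to be enforced somewhere, otherwise a string like "abab" would pass the span comparison):

    def accepts(s: str) -> bool:
        """a^n b^n via global positions: find the run boundaries and compare."""
        if s == "":
            return True                                       # n = 0
        a_pos = [i for i, c in enumerate(s) if c == "a"]      # positions of the a's
        b_pos = [i for i, c in enumerate(s) if c == "b"]      # positions of the b's
        if not a_pos or not b_pos or len(a_pos) + len(b_pos) != len(s):
            return False                                      # only a's and b's allowed, both present
        return (a_pos[-1] - a_pos[0] + 1 == len(a_pos)        # the a's form one contiguous block
                and b_pos[-1] - b_pos[0] + 1 == len(b_pos)    # the b's form one contiguous block
                and a_pos[-1] < b_pos[0]                      # the a-block comes first
                and len(a_pos) == len(b_pos))                 # equal counts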

The paper doesn't distinguish between the expressive capability of the model and finding the optimum of the model, aka the training procedure.

If you train by only showing examples with varying n, there probably isn't enough inductive bias to make it converge naturally towards the optimal solution you can construct by hand. But you can probably train on multiple formal languages simultaneously, to make the counting feature emerge from the data.

You can't deduce much from negative results in research, besides that more work is required.

replies(1): >>42137190 #
18. Vecr ◴[] No.42137144{3}[source]
How do you do intelligence without computation though? Brains are semi-distributed analog computers with terrible interconnect speeds and latencies. Unless you think they're magic, any infinite language is still just a limit to them.

Edit: and technically you're describing what is more or less backprop learning; neural networks, by themselves, don't learn at all.

replies(1): >>42137297 #
19. YeGoblynQueenne ◴[] No.42137190{5}[source]
>> The paper doesn't distinguish between the expressive capability of the model and finding the optimum of the model, aka the training procedure.

They do. That's the whole point of the paper: you can set a bunch of weights manually like you suggest, but can you learn them instead; and how? See the Introduction. They make it very clear that they are investigating whether certain concepts can be learned by gradient descent, specifically. They point out that earlier work doesn't do that and that gradient descent is an obvious bit of bias that should affect the ability of different architectures to learn different concepts. Like I say, good work.

>> But you can probably train on multiple formal languages simultaneously, to make the counting feature emerge from the data.

You could always try it out yourself, you know. Like I say that's the beauty of grammars: you can generate tons of synthetic data and go to town.
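
For instance, here's a toy sketch of mine (nothing to do with the paper's actual setup) for generating positives from a^nb^n plus near-miss negatives:

    import random

    def anbn_dataset(max_n: int, seed: int = 0):
        """Toy (string, label) pairs for learning a^n b^n from examples."""
        rng = random.Random(seed)
        data = [("a" * n + "b" * n, 1) for n in range(1, max_n + 1)]   # positives
        for n in range(1, max_n + 1):
            data.append(("a" * n + "b" * (n + 1), 0))                  # off-by-one counts
            shuffled = "".join(rng.sample("a" * n + "b" * n, 2 * n))   # right counts, wrong order
            if shuffled != "a" * n + "b" * n:
                data.append((shuffled, 0))
        rng.shuffle(data)
        return data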

>> You can't deduce much from negative results in research, besides that more work is required.

I disagree. I'm a falsificationist. The only time we learn anything useful is when stuff fails.

replies(1): >>42139528 #
20. ◴[] No.42137215[source]
21. natch ◴[] No.42137222{3}[source]
Strong gatekeeping vibes. "Not even wrong" is perfect for this sort of fixation with labels and titles, and for the odd, seemingly resentful assumption that being an AI influencer is gwern's specific goal.
replies(4): >>42137336 #>>42137384 #>>42137739 #>>42139033 #
22. YeGoblynQueenne ◴[] No.42137274[source]
To the person that commented that five years is an awful long time to remember something like that (and then deleted their comment): you are so right. I am trying to work through this kind of thing :/
23. n2d4 ◴[] No.42137284[source]
This is such an odd comment.

In the thread you linked, Gwern says in response to someone else that NNs excel at many complex real-world tasks even if there are some tasks where they fail but humans (or other models) succeed. You try to counter that by bringing up an example for the latter type of task? And then try to argue that this proves Gwern wrong?

Whether they said "regular grammar" or "context-free grammar" doesn't even matter, the meaning of their message is still the exact same.

24. YeGoblynQueenne ◴[] No.42137297{4}[source]
Yes, I'm talking about learning neural nets with gradient descent. See also the nice paper I linked below.

>> How do you do intelligence without computation though?

Beats me! Unlike everyone else in this space, it seems, I haven't got a clue how to do intelligence at all, with or without computation.

Edit: re infinite languages, I liked something Walid Saba (RIP) pointed out on Machine Learning Street Talk: sure, you can't generate infinite strings, but if you have an infinite language then every string accepted by the language has, in effect, a uniform probability of one over infinity, so there's no way to learn the entire language by learning the distribution of strings within it. But e.g. the Python compiler must be able to recognise an infinite number of Python programs as valid (or reject those that aren't) for the same reason: it's impossible to predict which string is going to come out of a source generating strings in an infinite language. So you have to be able to deal with infinite possibilities, with only finite resources.

Now, I think there's a problem with that. Assuming a language L has a finite alphabet, even if L is infinite (i.e. it includes an infinite number of strings) the subset of L where strings only go up to some length n is going to be finite. If that n is large enough that it is just beyond the computational resources of any system that has to recognise strings in L (like a compiler), then any system that can recognise, or generate, all strings in L up to length n will be, for all intents and purposes, complete with respect to L, up to n etc. In plain English, the Python compiler doesn't need to be able to deal with Python programs of infinite length, so it doesn't need to deal with an infinite number of Python programs.

Same for natural language. The informal proof of the infinity of natural language I know of is based on the observation that we can embed an arbitrary number of sentences in other sentences: "Mary, whom we met in the summer, in Fred's house, when we went there with George... " etc. But, in practice, that ability too will be limited by time and human linguistic resources, so not even the human linguistic ability really-really needs to be able to deal with an infinite number of strings.

That's assuming that natural language has a finite alphabet, or I guess lexicon is the right word. That may or may not be the case: we seem to be able to come up with new words all the time. Anyway some of this may explain why LLMs can still convincingly reproduce the structure of natural language without having to train on infinite examples.

replies(1): >>42137493 #
25. dilap ◴[] No.42137319{3}[source]
By intellectually understand, I just mean you can ask Claude or ChatGPT or whatever, "how can I recognize if a string is in a^n b^n? what is the language being described?" and it can easily tell you; if you were giving it an exam, it would pass.

(Of course, maybe you could argue that's a famous example in its training set and it's just regurgitating, but then you could try making modifications, asking other questions, etc, and the LLM would continue to respond sensibly. So to me it seems to understand...)

Or going back to the original Hofstadter article, "simple tests show that [machine translation is] a long way from real understanding"; I tried rerunning the first two of these simple tests today w/ Claude 3.5 Sonnet (new), and it absolutely nails them. So it seems to understand the text quite well.

Regarding computation and understanding: I just thought it was interesting that you presented a true fact about the computational limitations of NNs, which could easily/naturally/temptingly -- yet incorrectly (I think!) -- be extended into a statement about the limitations of understanding of NNs (whatever understanding means -- no technical definition that I know of, but still, it does mean something, right?).

replies(1): >>42137974 #
26. ◴[] No.42137336{4}[source]
27. wyager ◴[] No.42137350[source]
It seems like his objection is that "parsing formal grammars" isn't the point of LLMs, which is fair. He was wrong about RGs vs CFGs, but I would bet that the majority of programmers are not familiar with the distinction, and learning the classification of a^nb^n is a classic homework problem in automata theory specifically because it's surprising that such a simple grammar is CF.
28. YeGoblynQueenne ◴[] No.42137384{4}[source]
OK, I concede that if I try to separate engineers from scientists it sounds like I'm trying to gatekeep. In truth, I'm organising things in my head because I started out thinking of myself as an engineer, because I like to make stuff, and at some point I started thinking of myself as a scientist, malgré moi, because I also like to know how stuff works and why. I multiclassed, you see, so I am trying to understand exactly what changed, when, and why.

I mean obviously it happened when I moved from industry to academia, but it's still the case that there's a lot of overlap between the two areas, at least in CS and AI. In CS and AI the best engineers make the best scientists and vice versa, I think.

Btw, "gatekeeping" I think assumes that I somehow think of one category less than the other? Is that right? To be clear, I don't. I was responding to the use of both terms in the OP's comments with a personal reflection on the two categories.

replies(1): >>42137896 #
29. Vecr ◴[] No.42137493{5}[source]
What I don't know how to do is bounded rationality. Iterating over all the programs weighted by length (with dovetailing if you're a stickler) is "easy", but won't ever get anywhere.

And you can't get away with the standard tricky tricks that people use to say it isn't easy; logical induction exists.

replies(1): >>42137745 #
30. haccount ◴[] No.42137636[source]
Being an influencer requires very little actual competence, same goes for AI influencers.

The goal of influencers is to influence the segment of a crowd who cares about influencers. Meaning retards and manchildren looking for an external source to form consensus around.

31. achierius ◴[] No.42137739{4}[source]
"not even wrong" is supposed to refer to a specific category of flawed argument, but of course like many other terms it's come to really mean "low status belief"
32. YeGoblynQueenne ◴[] No.42137745{6}[source]
Right! See my (long) edit.
33. mitthrowaway2 ◴[] No.42137896{5}[source]
I sure hope nobody ever remembers you being confidently wrong about something. But if they do, hopefully that person will have the grace and self-restraint not to broadcast it any time you might make a public appearance, because they're apparently bitter that you still have any credibility.
replies(2): >>42138036 #>>42138229 #
34. YeGoblynQueenne ◴[] No.42137974{4}[source]
>> (Of course, maybe you could argue that's a famous example in its training set and it's just regurgitating, but then you could try making modifications, asking other questions, etc, and the LLM would continue to respond sensibly. So to me it seems to understand...)

Yes, well, that's the big confounder that has to be overcome by any claim of understanding (or reasoning etc) by LLMs, isn't it? They've seen so much stuff in training that it's very hard to know what they're simply reproducing from their corpus and what not. My opinion is that LLMs are statistical models of text and we can expect them to learn the surface statistical regularities of text in their corpus, which can be very powerful, but that's all. I don't see how they can learn "understanding" from text. The null hypothesis should be that they can't and, Sagan-like, we should expect to see extraordinary evidence before accepting they can. I do.

>> Regarding computation and understanding: I just though it was interesting that you presented a true fact about the computational limitations of NNs, which could easily/naturally/temptingingly -- yet incorrectly (I think!) -- be extended into a statement about the limitations of understanding of NNs (whatever understanding means -- no technical definition that I know of, but still, it does mean something, right?).

For humans it means something- because understanding is a property we assume humans have. Sometimes we use it metaphorically ("my program understands when the customer wants to change their pants") but in terms of computation... again I have no clue.

I generally have very few clues :)

replies(1): >>42141167 #
35. YeGoblynQueenne ◴[] No.42138036{6}[source]
Point taken and I warned my comment would sound vituperative. Again, the difference is that I'm not an AI influencer, and I'm not trying to make a living by claiming an expertise I don't have. I don't make "public appearances" except in conferences where I present the results of my research.

And you should see the criticism I get from other academics when I try to publish my papers and they decide I'm not even wrong. And that kind of criticism has teeth: my papers don't get published.

replies(2): >>42138124 #>>42144388 #
36. mitthrowaway2 ◴[] No.42138124{7}[source]
Please be aware that your criticism has teeth too, you just don't feel the bite of them. You say I "should see" that criticism you receive on your papers, but I don't; it's delivered in private. Unlike the review comments you get from your peers, you are writing in public. I'm sure you wouldn't appreciate it if your peer reviewer stood up after your conference keynote and told the audience that they'd rejected your paper five years ago, described your errors, and went on to say that nobody at this conference should be listening to you.
replies(1): >>42138512 #
37. YeGoblynQueenne ◴[] No.42138229{6}[source]
Can I say a bit more about criticism on the side? I've learned to embrace it as a necessary step to self-improvement.

My formative experience as a PhD student was when a senior colleague attacked my work. That was after I asked for his feedback for a paper I was writing where I showed that my system beat his system. He didn't deal with it well, sent me a furiously critical response (with obvious misunderstandings of my work) and then proceeded to tell my PhD advisor and everyone else in a conference we were attending that my work is premature and shouldn't be submitted. My advisor, trusting his ex-student (him) more than his brand new one (me), agreed and suggested I should sit on the paper a bit longer.

Later on the same colleague attacked my system again, but this time he gave me a concrete reason why: he gave me an example of a task that my system could not complete (learn a recursive logic program to return the last element in a list from a single example that is not an example of the base-case of the recursion; it's a lot harder than it may sound).

Now, I had been able to dismiss the earlier criticism as sour grapes, but this one I couldn't get over because my system really couldn't deal with it. So I tried to figure out why- where was the error I was making in my theories? Because my theoretical results said that my system should be able to learn that. Long story short, I did figure it out and I got that example to work, plus a bunch of other hard tests that people had thrown at me in the meanwhile. So I improved.

I still think my colleague's behaviour was immature and not becoming of a senior academic: attacking a PhD student because she did what you've always done, beat your own system, is childish. In my current post-doc I just published a paper with one of our PhD students where we report his system trouncing mine (in speed; still some meat on those old bones otherwise). I think criticism is a good thing overall, if you can learn to use it to improve your work. It doesn't mean that you'll learn to like it, or that you'll be best friends with the person criticising you; it doesn't even mean that they're not out to get you; they probably are... but if the criticism is pointing out a real weakness you have, you can still use it to your advantage no matter what.

replies(1): >>42138315 #
38. mitthrowaway2 ◴[] No.42138315{7}[source]
Constructive criticism is a good thing, but in this thread you aren't speaking to Gwern directly, you're badmouthing him to his peers. I'm sure you would have felt different if your colleague had done that.
replies(1): >>42138580 #
39. YeGoblynQueenne ◴[] No.42138512{8}[source]
I think I'm addressing some of what you say in my longer comment above.

>> Please be aware that your criticism has teeth too, you just don't feel the bite of them.

Well, maybe it does. I don't know if that can be avoided. I think most people don't take criticism well. I've learned for example that there are some conversations I can't have with certain members of my extended family because they're not used to being challenged about things they don't know and they react angrily. I specifically remember a conversation where I was trying to explain the concept of latent hypoxia and ascent blackout [1] (I free dive recreationally) to an older family member who is an experienced scuba diver, and they not only didn't believe me, they called me an ignoramus. Because I told them something they didn't know about. Eh well.

_____________

[1] It can happen that while you're diving deep, the pressure of the water keeps the pressure of oxygen in your blood sufficient that you don't pass out, but then when you start coming up, the pressure drops and the oxygen in your blood thins out so much that you pass out. In my lay terms. My relative didn't believe that the water pressure affects the pressure of the air in your vessels. I absolutely can feel that when I'm diving- the deeper I go the easier it gets to hold my breath and it's so noticeable because it's so counter-intuitive. My relative wouldn't have experienced that during scuba diving (since they breathe pressurised air, I think) and maybe it helps he's a smoker. Anyway I never managed to convince him.
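
In rough numbers (my own back-of-the-envelope illustration, with assumed figures): the partial pressure of O2 is roughly the O2 fraction in the lungs times the ambient pressure, and ambient pressure goes up by about 1 atm per 10 m of seawater.

    def ppo2(o2_fraction: float, depth_m: float) -> float:
        """Approximate O2 partial pressure in atm at a given depth."""
        ambient_atm = 1.0 + depth_m / 10.0   # ~1 atm at the surface, +1 atm per 10 m
        return o2_fraction * ambient_atm

    # Say the lung O2 fraction is down to ~6% late in a breath-hold (assumed figure):
    print(ppo2(0.06, 20))   # ~0.18 atm at 20 m -- enough to stay conscious
    print(ppo2(0.06, 0))    # ~0.06 atm back at the surface -- low enough to black out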

As I never managed to convince him that we eat urchins' genitals, not their eggs. After a certain point I stopped trying to convince him of anything. I mean I felt like a know-it-all anyway, even if I knew what I was talking about.

replies(1): >>42138654 #
40. YeGoblynQueenne ◴[] No.42138580{8}[source]
He did and I did feel very angry about it and it hurt our professional relationship irreparably.

But above I'm only discussing my experience of criticism as an aside, unrelated to Gwern. To be clear, my original comment was not meant as constructive criticism. Like I think my colleague was at the time, I am out to get Gwern because I think, like I say, that he is a clueless AI influencer, a cheer-leader of deep learning who is piggy-backing on the excitement about AI that he had nothing to do with creating. I wouldn't find it so annoying if he, like many others who engage in the same parasitism, did not sound so cock-sure that he knows what he's talking about.

I do not under any circumstances claim that my original comment is meant to be nice.

Btw, I remember now that Gwern has at other times accused me, here on HN, of being confidently wrong about things I don't know as well as I think I do (deep learning stuff). I think it was in a comment about MuZero (the DeepMind system). I don't think Gwern likes me much, either. But, then, he's a famous influencer and I'm not, and I bet he finds solace in that, so my criticism is not going to hurt him in the end.

41. Vecr ◴[] No.42138654{9}[source]
I actually either didn't know about that pressure thing [0], or I forgot. I suspect I read about it at some point because at some level I knew ascending could have bad effects even if you don't need decompression stops. But I didn't know why, even though it's obvious in retrospect.

So thanks for that, even though it's entirely unrelated to AI.

[0]: though I've seen videos of the exact same effect on a plastic water bottle, but my brain didn't make the connection

42. okgreatniss ◴[] No.42139033{4}[source]
It all feels like their only goal is circumlocutions over the subset of contemporary glyphs they know?

The physical principles remain regardless of how humans write them down.

43. aubanel ◴[] No.42139265{3}[source]
In my country (France), I think most last-year CS students will not have heard of it (pls anyone correct me if I'm wrong).
44. GistNoesis ◴[] No.42139528{6}[source]
Gradient descent usually gets stuck in local minima; it depends on the shape of the energy landscape, and that's expected behavior.

The current wisdom is that optimizing for multiple tasks simultaneously makes the energy landscape smoother. One task allows you to discover features which can be used to solve other tasks.

Useful features that are used by many tasks can more easily emerge from the sea of useless features. If you don't have sufficiently many distinct tasks the signal doesn't get above the noise and is much harder to observe.

That's the whole point of "generalist" intelligence in the scaling hypothesis.

For problems where you can write a solution manually, you can also help the training procedure by regularising your problem with the auxiliary task of predicting some custom feature. Alternatively you can "generatively pretrain" to obtain useful features, replacing a custom loss function with custom data.

The paper is a useful characterisation of the energy landscape of various formal tasks in isolation, but doesn't investigate the more general, simpler problems that occur in practice.

45. dilap ◴[] No.42141167{5}[source]
Personally I am convinced LLMs do have real understanding, because they seem to respond in interesting and thoughtful ways to anything I care to talk to them about, well outside of any topic I would expect to be captured statistically! (Indeed, I often find it easier to get LLMs to understand me than many humans. :-)

There's also stuff like the Golden Gate Claude experiment and research @repligate shares on twitter, which again makes me think understanding (as I conceive of it) is definitely there.

Now, are they conscious, feeling entities? That is a harder question to answer...

46. ALittleLight ◴[] No.42144388{7}[source]
What is the point of saying "I warned my comment would sound vituperative"? Acknowledging a flaw in the motivation of your comment doesn't negate that flaw; it means you realize you are posting something mean-spirited and consciously deciding to do it even though you recognize you're being mean-spirited.