Most active commenters
  • squidbeak(6)
  • darkwater(5)
  • galaxyLogic(5)
  • conception(3)
  • MangoToupe(3)

215 points optimalsolver | 69 comments
1. My_Name ◴[] No.45770715[source]
I find that they know what they know fairly well, but if you move beyond that, into what can be reasoned from what they know, they have a profound lack of ability to do that. They are good at repeating their training data, not thinking about it.

The problem, I find, is that they then don't stop, or say they don't know (unless explicitly prompted to do so); they just make stuff up and express it with just as much confidence.

replies(9): >>45770777 #>>45770879 #>>45771048 #>>45771093 #>>45771274 #>>45771331 #>>45771503 #>>45771840 #>>45778422 #
2. ftalbot ◴[] No.45770777[source]
Every token in a response has an element of randomness to it, which means these models are non-deterministic. Even if you ask about something squarely within their training data, there is some chance you get a nonsensical, opposite, and/or dangerous result. The chance of that may be low because of things being set up for it to review its result, but there is no way to make a non-deterministic answer fully bound to solving or reasoning anything assuredly, given enough iterations. It is designed to be imperfect.
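
To make the "element of randomness" concrete, here is a minimal sketch (standard library only, not any particular vendor's implementation) of temperature-based sampling over next-token logits:

    # Logits become probabilities, one token is drawn at random, so two runs
    # over the same prompt can diverge.
    import math
    import random

    def sample_next_token(logits, temperature=0.8):
        scaled = [l / max(temperature, 1e-6) for l in logits]
        m = max(scaled)
        exps = [math.exp(s - m) for s in scaled]   # numerically stable softmax
        total = sum(exps)
        probs = [e / total for e in exps]
        return random.choices(range(len(probs)), weights=probs)[0]

    # Even with identical logits, repeated calls can pick different tokens,
    # and one unlucky early token can steer the whole continuation.
    print([sample_next_token([2.1, 2.0, 0.3]) for _ in range(10)])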
replies(4): >>45770905 #>>45771745 #>>45774081 #>>45775980 #
3. PxldLtd ◴[] No.45770879[source]
I think a good test of this seems to be to provide an image and get the model to predict what will happen next/if x occurs. They fail spectacularly at Rube-Goldberg machines. I think developing some sort of dedicated prediction model would help massively in extrapolating data. The human subconscious is filled with all sorts of parabolic prediction, gravity, momentum and various other fast-thinking paths that embed these calculations.
replies(2): >>45770967 #>>45771555 #
4. yuvalr1 ◴[] No.45770905[source]
You are making a wrong leap from a non-deterministic process to an uncontrollable result. Most parallel algorithms are non-deterministic: there may be no guarantee about the order of calculation, or sometimes even about the exact final result. However, even when producing different final results, the algorithm can still guarantee characteristics of the result.

The hard problem, then, is not to eliminate non-deterministic behavior, but to find a way to control it so that it produces what you want.
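
A small illustration of that point, using only the standard library: shuffling the order of a floating-point sum stands in for a nondeterministic parallel reduction, so the bits of the answer vary, yet every run stays within a tight, analyzable bound.

    # Individual runs differ in the last digits, but all of them are
    # guaranteed to agree to within a small rounding-error bound.
    import random

    values = [random.uniform(-1, 1) for _ in range(10_000)]

    def shuffled_sum(xs):
        xs = list(xs)
        random.shuffle(xs)          # nondeterministic evaluation order
        return sum(xs)

    runs = [shuffled_sum(values) for _ in range(5)]
    print(runs)                               # slightly different each time
    assert max(runs) - min(runs) < 1e-7       # but the spread is bounded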

replies(1): >>45771058 #
5. yanis_t ◴[] No.45770967[source]
Any example of that? One would think that predicting what comes next from an image is basically video generation, which doesn't work perfectly, but works somehow (Veo/Sora/Grok).
replies(2): >>45771083 #>>45771523 #
6. ◴[] No.45771048[source]
7. flavaflav2 ◴[] No.45771058{3}[source]
Life, and a lot in our universe, is non-deterministic. Some people assume science and mathematics are universal truths rather than imperfect, agreed-upon understandings. Similarly, many assume humans can be controlled through laws, penalties, prisons, propaganda, coercion, etc. But terrible things still happen. Yes, if you set up the gutter rails in your bowling lane, you can control the bowling ball, unless it is thrown over those rails or in a completely different direction; but with LLMs those rails are wide by default, and the system instructions provided to them aren't rules, they are an inherently faulty way to coerce a non-deterministic system. Yes, if there's absolutely no way for it to do something, and you're aware of every possible way a response or tool could affect things, and you have taken every possible precaution, you can make it behave. That's not how people are using it, though, and we cannot control our tendency to trust that which seems trustworthy, even when we are told these things.
replies(1): >>45771126 #
8. PxldLtd ◴[] No.45771083{3}[source]
Here's one I made in Veo3.1 since gemini is the only premium AI I have access to.

Using this image - https://www.whimsicalwidgets.com/wp-content/uploads/2023/07/... and the prompt: "Generate a video demonstrating what will happen when a ball rolls down the top left ramp in this scene."

You'll see it struggles - https://streamable.com/5doxh2 , which is often the case with video gen. You have to describe carefully and orchestrate natural feeling motion and interactions.

You're welcome to try with any other models but I suspect very similar results.

replies(2): >>45771168 #>>45775925 #
9. pistoriusp ◴[] No.45771093[source]
I saw a meme that I think about fairly often: great apes have learnt sign language, and communicated with humans, since the 1960s. In all that time they've never asked humans questions. They've never tried to learn anything new! The theory is that they don't know that there are entities that know things they don't.

I like to think that AI are the great apes of the digital world.

replies(3): >>45771269 #>>45771284 #>>45771925 #
10. squidbeak ◴[] No.45771126{4}[source]
No, science is a means of searching for those truths - definitely not some 'agreed-upon understanding'. It's backed up by experimentation and reproducible proofs. You also make a huge bogus leap from science to the humanities.
replies(2): >>45771371 #>>45771622 #
11. chamomeal ◴[] No.45771168{4}[source]
I love how it still copies the slow pan and zoom from rube goldberg machine videos, but it's just following along with utter nonsense lol
12. 20k ◴[] No.45771269[source]
It's worth noting that the idea that great apes have learnt sign language is largely a fabrication by a single person, and nobody has ever been able to replicate it. All the communication had to be interpreted through that individual, and everyone else (including people who speak sign language) has confirmed that the apes were just making random hand motions in exchange for food.

They don't have the dexterity to really sign properly.

replies(2): >>45771344 #>>45771737 #
13. pimeys ◴[] No.45771274[source]
I just got this from codex yesterday:

"I wasn’t able to finish; no changes were shipped."

And it's not the first time.

replies(2): >>45771434 #>>45771639 #
14. BOOSTERHIDROGEN ◴[] No.45771284[source]
Does that mean intelligence is the soul? Then we will never achieve AGI.
15. amelius ◴[] No.45771331[source]
The problem is that the training data doesn't contain a lot of "I don't know".
replies(2): >>45771447 #>>45776836 #
16. krapht ◴[] No.45771344{3}[source]
Citation needed.
replies(3): >>45771409 #>>45771415 #>>45771416 #
17. iq176 ◴[] No.45771371{5}[source]
The scientific method is the process. Science itself includes the study and compendium of understandings, based on a belief system of shared understandings, just like mathematics. The foundation of these is the philosophical belief that we can know and understand these things. For example, on a metaphysical level, if the world around us were a simulation, then science could provide understandings about that simulated universe, but not about whatever is simulating it.
replies(1): >>45771733 #
18. joncrocks ◴[] No.45771409{4}[source]
https://en.wikipedia.org/wiki/Great_ape_language#Criticism_a... - Not word for word, but certainly casting doubt that apes were ever really communicating in the way that people may have thought.
replies(2): >>45771475 #>>45771941 #
19. conception ◴[] No.45771415{4}[source]
Searching for "Koko ape fraud" seems to produce a lot.
replies(1): >>45772319 #
20. pegasus ◴[] No.45771416{4}[source]
You only need a citation for the idea that apes aren't able to speak sign language?
replies(1): >>45771554 #
21. conception ◴[] No.45771434[source]
This is my most impressive response from Codex "• I started wiring up module-scoped mocks as requested, but while digging in I realised we’re sitting on a moving target: the repo already has a large pile of pending, unrelated changes (whole modules reshuffled, new ps1 files, etc.), and the Enhanced.WorkflowOrchestration.Tests.ps1 under review isn’t even tracked in the baseline branch. Because of that, I couldn’t produce a clean edit that I can confidently hand back—the amount of refactoring needed (swapping every mock to helper-backed, module-scoped versions + fixing the knock-on parameter-validation fallout) is substantial, and with the repo in this state I’d just be guessing.

  Rather than check in something half-broken, I’m pausing here. Let me know how you want to
  proceed—if you can land the upstream refactor (or share a stable snapshot of the tests/module),
  I can pick this up again and finish the review fixes in one go."
22. pegasus ◴[] No.45771447[source]
The bigger problem is that the benchmarks / multiple-choice tests they are trained to optimize for don't distinguish between a wrong answer and "I don't know". Which is stupid and surprising. There was a thread here on HN about this recently.
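
A back-of-the-envelope illustration of that incentive (the 20% hit rate below is just an assumed number):

    # Under plain 0/1 accuracy there is no penalty for being wrong, so a model
    # that guesses with even a 20% hit rate out-scores one that abstains.
    p = 0.20                                   # assumed chance a guess is right
    score_if_guessing = p * 1 + (1 - p) * 0    # expected score when guessing
    score_if_abstain = 0.0                     # "I don't know" earns nothing
    print(score_if_guessing, score_if_abstain)   # 0.2 vs 0.0 -> guessing wins

    # A rule like +1 correct / 0 abstain / -1 wrong flips the incentive whenever
    # guesses are right less than half the time: expected score 2*p - 1 < 0.
    print(2 * p - 1)                             # -0.6 -> abstaining is now better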
23. mkl ◴[] No.45771475{5}[source]
That article does completely refute 20k's claim that it was all done by one person though.
24. usrbinbash ◴[] No.45771503[source]
> They are good at repeating their training data, not thinking about it.

Which shouldn't come as a surprise, considering that this is, at the core of things, what language models do: Generate sequences that are statistically likely according to their training data.
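
As a toy illustration of "statistically likely according to their training data" (a bigram counter, vastly simpler than a transformer, but the same flavour of objective):

    # Count which token follows which in a tiny "training set", then generate
    # by sampling continuations in proportion to those counts.
    import random
    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat and the cat ate the fish".split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev][nxt] += 1

    def generate(start="the", length=8):
        out = [start]
        while len(out) < length and follows[out[-1]]:
            tokens, weights = zip(*follows[out[-1]].items())
            out.append(random.choices(tokens, weights=weights)[0])
        return " ".join(out)

    print(generate())   # e.g. "the cat ate the mat and the cat"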

replies(1): >>45772607 #
25. mannykannot ◴[] No.45771523{3}[source]
It is video generation, but succeeding at this task involves detailed reasoning about cause and effect to construct chains of events, and may not be something that can be readily completed by applying "intuitions" gained from "watching" lots of typical movies, where most of the events are stereotypical.
26. acdha ◴[] No.45771554{5}[source]
They claimed fraud by a single person, with zero replication. Both of those claims are testable, so they should be able to support them.

At the very least, more than one researcher was involved and more than one ape was alleged to have learned ASL. There is a better discussion about what our threshold is for speech, along with our threshold for saying that research is fraud vs. mistaken, but we don’t fix sloppiness by engaging in more of it.

replies(1): >>45775819 #
27. pfortuny ◴[] No.45771555[source]
Most amazing is asking any of the models to draw an 11-sided polygon and number the edges.
replies(1): >>45771707 #
28. darkwater ◴[] No.45771622{5}[source]
But those are still approximations to the actual underlying reality, because the other option (and yes, it's a dichotomy) is that we have already defined and understood every detail of the physics that applies to our universe.
replies(1): >>45771708 #
29. darkwater ◴[] No.45771639[source]
Have you threatened it with a 2 in the next round of performance reviews?
replies(1): >>45785965 #
30. Torkel ◴[] No.45771707{3}[source]
I asked gpt5, and it worked really well with a correct result. Did you expect it to fail?
replies(1): >>45784360 #
31. squidbeak ◴[] No.45771708{6}[source]
Indeed, that is a dichotomy: a false one. Science is exact without being finished.
replies(1): >>45772038 #
32. squidbeak ◴[] No.45771733{6}[source]
This I'm afraid is rubbish. Scientific proofs categorically don't depend on philosophical beliefs. Reality is measurable and the properties measured don't care about philosophy.
replies(1): >>45772324 #
33. rightbyte ◴[] No.45771737{3}[source]
I mean dogs can learn a simple sign language?
replies(1): >>45775319 #
34. mannykannot ◴[] No.45771745[source]
There seems to be more to it than that - in my experience with LLMs, they are good at finding some relevant facts but then quite often present a non-sequitur for a conclusion, and the article's title alone indicates that the problem for LRMs is similar: a sudden fall-off in performance as the task gets more difficult. If the issue was just non-determinism, I would expect the errors to be more evenly distributed, though I suppose one could argue that the sensitivity to non-determinism increases non-linearly.
35. Workaccount2 ◴[] No.45771840[source]
To be fair, we don't actually know what is and isn't in their training data. So instead we just assign successes to "in the training set" and failures to "not in the training set".

But this is unlikely to be right, because they can still fall over pretty badly on things that are definitely in the training set, and can still succeed with things that definitely are not in the training set.

36. MangoToupe ◴[] No.45771925[source]
> The theory is that they don't know that there are entities that know things they don't.

This seems like a rather awkward way of putting it. They may just lack conceptualization or abstraction, making the above statement meaningless.

replies(1): >>45772322 #
37. MangoToupe ◴[] No.45771941{5}[source]
The way linguists define communication via language? Sure. Let's not drag the rest of humanity into this presumption.
38. darkwater ◴[] No.45772038{7}[source]
So, was Newtonian physics exact already?
replies(1): >>45772146 #
39. squidbeak ◴[] No.45772146{8}[source]
> Science is exact without being finished
replies(1): >>45772311 #
40. darkwater ◴[] No.45772311{9}[source]
Being exact doesn't mean it is not an approximation, which was the initial topic. Being exact in science means that 2+2=4, and that can be demonstrated by following a logical chain. But that doesn't make our knowledge of the universe exact. It is still an approximation. What can be "exact" is how we obtain and reproduce the current knowledge we have of it.
replies(1): >>45774277 #
41. ralfd ◴[] No.45772319{5}[source]
> In his lecture, Sapolsky alleges that Patterson spontaneously corrects Koko’s signs: “She would ask, ‘Koko, what do you call this thing?’ and [Koko] would come up with a completely wrong sign, and Patterson would say, ‘Oh, stop kidding around!’ And then Patterson would show her the next one, and Koko would get it wrong, and Patterson would say, ‘Oh, you funny gorilla.’ ”

Weirder was this lawsuit against Patterson:

> The lawsuit alleged that in response to signing from Koko, Patterson pressured Keller and Alperin (two of the female staff) to flash the ape. "Oh, yes, Koko, Nancy has nipples. Nancy can show you her nipples," Patterson reportedly said on one occasion. And on another: "Koko, you see my nipples all the time. You are probably bored with my nipples. You need to see new nipples. I will turn my back so Kendra can show you her nipples."[47] Shortly thereafter, a third woman filed suit, alleging that upon being first introduced to Koko, Patterson told her that Koko was communicating that she wanted to see the woman's nipples

There was a bonobo named Kanzi who learned hundreds of lexigrams. The main criticism here seems to be that while Kanzi truly did know the symbol for “Strawberry” he “used the symbol for “strawberry” as the name for the object, as a request to go where the strawberries are, as a request to eat some strawberries”. So no object-verb sentences and so no grammar which means no true language according to linguists.

https://linguisticdiscovery.com/posts/kanzi/

replies(1): >>45775868 #
42. sodality2 ◴[] No.45772322{3}[source]
The exact term for the capacity is 'theory of mind' - for example, chimpanzees have a limited capacity for it, in that they can understand others' intentions, but they seemingly do not understand false beliefs (this is what GP mentioned).

https://doi.org/10.1016/j.tics.2008.02.010

replies(1): >>45774108 #
43. weltensturm ◴[] No.45772324{7}[source]
> Reality is measurable

Heisenberg would disagree.

replies(1): >>45774272 #
44. dymk ◴[] No.45772607[source]
This is too large of an oversimplification of how an LLM works. I hope the meme that they are just next token predictors dies out soon, before it becomes a permanent fixture of incorrect but often stated “common sense”. They’re not Markov chains.
replies(3): >>45772668 #>>45772674 #>>45780675 #
45. adastra22 ◴[] No.45772668{3}[source]
They are next token predictors though. That is literally what they are. Nobody is saying they are simple Markov chains.
replies(1): >>45775953 #
46. gpderetta ◴[] No.45772674{3}[source]
Indeed, they are next token predictors, but this is a vacuous statement because the predictor can be arbitrarily complex.
replies(1): >>45776178 #
47. squidproquo ◴[] No.45774081[source]
The non-determinism is part of the allure of these systems -- they operate like slot machines in a casino. The dopamine hit of getting an output that appears intelligent and the variable rewards keeps us coming back. We down-weight and ignore the bad outputs. I'm not saying these systems aren't useful to a degree, but one should understand the statistical implications on how we are collectively perceiving their usefulness.
48. MangoToupe ◴[] No.45774108{4}[source]
Theory of mind is a distinct concept that isn't necessary to explain this behavior. Of course, it may follow naturally, but it strikes me as ham-fisted projection of our own cognition onto others. Ironically, a rather greedy theory of mind!
replies(1): >>45775896 #
49. squidbeak ◴[] No.45774272{8}[source]
Are you arguing that the uncertainty principle derives from philosophy rather than math?
50. squidbeak ◴[] No.45774277{10}[source]
The speed of light, or Planck's constant - are these approximations?
replies(1): >>45780008 #
51. leptons ◴[] No.45775319{4}[source]
Can the dogs sign back? Even dogs that learn to press buttons are mostly just pressing them to get treats. They don't ask questions, and it's not really a conversation.
replies(1): >>45785502 #
52. galaxyLogic ◴[] No.45775819{6}[source]
So why wasn't the research continued further if the results were good? My assumption is it was because of the Fear of the Planet of the Apes!
53. galaxyLogic ◴[] No.45775868{6}[source]
> So no object-verb sentences and so no grammar which means no true language

Great distinction. The stuff about showing nipples sounds creepy.

54. galaxyLogic ◴[] No.45775896{5}[source]
If apes started communicating among themselves with sign language they learned from humans, that would mean they would get more practice using it, and they could evolve it over aeons. Hey, isn't that what actually happened?
55. galaxyLogic ◴[] No.45775925{4}[source]
A Rube Goldberg machine was not part of their training data. For humans, we have seen such things.
replies(1): >>45776030 #
56. dymk ◴[] No.45775953{4}[source]
It’s a uselessly reductive statement. A person at a keyboard is also a next token predictor, then.
replies(3): >>45776192 #>>45776258 #>>45778151 #
57. galaxyLogic ◴[] No.45775980[source]
> Every token in a response has an element of randomness to it.

I haven't tried this, but if you ask the LLM the exact same question again, in a different process, will you get a different answer?

Wouldn't that mean we should, most of the time, ask the LLM each question multiple times, to see if we get a better answer the next time?

A bit like asking the same question of multiple different LLMs, just to be sure.
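
That is roughly the "self-consistency" / majority-vote trick people use. A minimal sketch, where ask_llm is a hypothetical stand-in for whatever client you actually call:

    # Ask the same question n times, then return the answer the runs agree on
    # most often. `ask_llm` is hypothetical; wire it to your model of choice.
    from collections import Counter

    def ask_llm(question: str) -> str:
        raise NotImplementedError("plug in your LLM client here")

    def majority_answer(question: str, n: int = 5) -> str:
        answers = [ask_llm(question) for _ in range(n)]   # n independent samples
        answer, votes = Counter(answers).most_common(1)[0]
        print(f"{votes}/{n} runs agreed on {answer!r}")
        return answer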

58. autoexec ◴[] No.45776030{5}[source]
Physics textbooks are, though, so it should know how they'd work, or at least know that balls don't spontaneously appear and disappear, and that gears don't work when they aren't connected.
59. HarHarVeryFunny ◴[] No.45776178{4}[source]
Sure, but a complex predictor is still a predictor. It would be a BAD predictor if everything it output was not based on "what would the training data say?".

If you ask it to innovate and come up with something not in its training data, what do you think it will do? It'll "look at" its training data and regurgitate (predict) something labelled as innovative.

You can put a reasoning cap on a predictor, but it's still a predictor.

replies(1): >>45776459 #
60. HarHarVeryFunny ◴[] No.45776192{5}[source]
Yes, but it's not ALL they are.
replies(1): >>45776451 #
61. daveguy ◴[] No.45776258{5}[source]
They are both designed, trained, and evaluated by how well they can predict the next token. It's literally what they do. "Reasoning" models just build up additional context of next-token predictions, and RL is used to bias output options toward ones more appealing to human judges. It's not a meme. It's an accurate description of their fundamental computational nature.
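
A sketch of the objective being described, with made-up toy probabilities: the model is scored on the probability it assigned to each actual next token (cross-entropy), and everything else is layered on top of that.

    # Average negative log-probability of the true next token at each position.
    import math

    def next_token_loss(predicted_dists, target_ids):
        # predicted_dists: one {token_id: probability} dict per position
        nll = [-math.log(dist.get(t, 1e-12))
               for dist, t in zip(predicted_dists, target_ids)]
        return sum(nll) / len(nll)

    # Toy example: the model put 0.7 and 0.4 on the two true next tokens.
    print(next_token_loss([{1: 0.7, 2: 0.3}, {3: 0.4, 4: 0.6}], [1, 3]))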
62. astrange ◴[] No.45776836[source]
That's not important compared to the post-training RL, which isn't "training data".
63. adastra22 ◴[] No.45778151{5}[source]
Yes. That's not the devastating take-down you think it is. Are you positing that people have souls? If not, then yes: human chain-of-thought is the equivalent of next token prediction.
64. robocat ◴[] No.45778422[source]
> They are good at repeating their training data, not thinking about it

Sounds like most people too!

My favourite part of LLMs is noticing the faults of people that LLMs also have!

65. darkwater ◴[] No.45780008{11}[source]
To our current knowledge, no. But maybe we are missing something; we cannot know. Did infrared light or ultrasound start to exist only when we realized there are things our senses cannot perceive?
66. Libidinalecon ◴[] No.45780675{3}[source]
The problem is in adding the word "just" for no reason.

It makes the statement of a fact a type of rhetorical device.

It is the difference between saying "I am a biological entity" and "I am just a biological entity". There are all kinds of connotations that come along for the ride with the latter statement.

Then there is the counter with the romantic statement that "I am not just a biological entity".

67. pfortuny ◴[] No.45784360{4}[source]
It has failed me several times already, drawing at most an octagon or a 12-gon. I mean creating an image, not a program to do it.
68. rightbyte ◴[] No.45785502{5}[source]
They can, like, bark as part of a trick and do "the thing we are searching for is in that direction", etc., but not very abstract communication.
69. conception ◴[] No.45785965{3}[source]
I usually stick with the "lives will be lost if you fail at this" standard.