323 points steerlabs | 119 comments
1. jqpabc123 ◴[] No.46153440[source]
We are trying to fix probability with more probability. That is a losing game.

Thanks for pointing out the elephant in the room with LLMs.

The basic design is non-deterministic. Trying to extract "facts" or "truth" or "accuracy" is an exercise in futility.

replies(17): >>46155764 #>>46191721 #>>46191867 #>>46191871 #>>46191893 #>>46191910 #>>46191973 #>>46191987 #>>46192152 #>>46192471 #>>46192526 #>>46192557 #>>46192939 #>>46193456 #>>46194206 #>>46194503 #>>46194518 #
2. steerlabs ◴[] No.46155764[source]
Exactly. We treat them like databases, but they are hallucination machines.

My thesis isn't that we can stop the hallucinating (non-determinism), but that we can bound it.

If we wrap the generation in hard assertions (e.g., assert response.price > 0), we turn 'probability' into 'manageable software engineering.' The generation remains probabilistic, but the acceptance criteria becomes binary and deterministic.
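
A minimal sketch of the shape I mean (generate_quote and the price/currency fields here are made-up placeholders, not any real API):

    import json

    MAX_ATTEMPTS = 3

    def generate_quote(prompt):
        # Placeholder for whatever LLM call you use; assumed to return a JSON string.
        raise NotImplementedError

    def get_validated_quote(prompt):
        for _ in range(MAX_ATTEMPTS):
            raw = generate_quote(prompt)
            try:
                response = json.loads(raw)
            except json.JSONDecodeError:
                continue  # malformed output: reject and regenerate
            # Hard, deterministic acceptance criteria -- binary pass/fail.
            if response.get("price", 0) > 0 and response.get("currency") in {"USD", "EUR"}:
                return response
        raise ValueError(f"LLM output failed validation after {MAX_ATTEMPTS} attempts")

The generation stays probabilistic; only outputs that pass the checks are ever accepted.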

replies(4): >>46163076 #>>46191658 #>>46191774 #>>46191967 #
3. jqpabc123 ◴[] No.46163076[source]
but the acceptance criteria becomes binary and deterministic.

Unfortunately, the use-case for AI is often where the acceptance criteria is not easily defined --- a matter of judgment. For example, "Does this patient have cancer?".

In cases where the criteria can be easily and clearly stipulated, AI often isn't really required.

replies(2): >>46185626 #>>46191846 #
4. steerlabs ◴[] No.46185626{3}[source]
You're 100% right. For a "judgment" task like "Does this patient have cancer?", the final acceptance criteria must be a human expert. A purely deterministic verifier is impossible.

My thesis is that even in those "fuzzy" workflows, the agent's process is full of small, deterministic sub-tasks that can and should be verified.

For example, before the AI even attempts to analyze the X-ray for cancer, it must: 1/ Verify it has the correct patient file (PatientIDVerifier). 2/ Verify the image is a chest X-ray and not a brain MRI (ModalityVerifier). 3/ Verify the date of the scan is within the relevant timeframe (DateVerifier).

These are "boring," deterministic checks. But a failure on any one of them makes the final "judgment" output completely useless.

steer isn't designed to automate the final, high-stakes judgment. It's designed to automate the pre-flight checklist, ensuring the agent has the correct, factually grounded information before it even begins the complex reasoning task. It's about reducing the "unforced errors" so the human expert can focus only on the truly hard part.
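
To sketch that pre-flight idea in code (the function and field names below are invented for illustration, not steer's actual API):

    from datetime import date, timedelta

    def verify_patient_id(record, expected_id):
        return record["patient_id"] == expected_id    # PatientIDVerifier

    def verify_modality(record):
        return record["modality"] == "chest_xray"     # ModalityVerifier

    def verify_scan_date(record, max_age_days=90):
        return date.today() - record["scan_date"] <= timedelta(days=max_age_days)  # DateVerifier

    def preflight(record, expected_id):
        checks = {
            "PatientIDVerifier": verify_patient_id(record, expected_id),
            "ModalityVerifier": verify_modality(record),
            "DateVerifier": verify_scan_date(record),
        }
        failures = [name for name, ok in checks.items() if not ok]
        if failures:
            # Refuse to start the expensive "judgment" step on bad inputs.
            raise ValueError("pre-flight checks failed: " + ", ".join(failures))
        return True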

replies(1): >>46191761 #
5. scotty79 ◴[] No.46191658[source]
> We treat them like databases, but they are hallucination machines.

Which is kind of crazy because we don't even treat people as databases. Or at least we shouldn't.

Maybe it's one of those things that will disappear from culture one funeral at a time.

replies(1): >>46191851 #
6. Davidzheng ◴[] No.46191721[source]
lol humans are non-deterministic too
replies(4): >>46191924 #>>46191926 #>>46193770 #>>46194093 #
7. malfist ◴[] No.46191761{4}[source]
Why do any of those checks with AI though? For all of them you can get a less error-prone answer without AI.
replies(1): >>46191922 #
8. squidbeak ◴[] No.46191774[source]
I don't agree that users see them as databases. Sure there are those who expect LLMs to be infallible and punish the technology when it disappoints them, but it seems to me that the overwhelming majority quickly learn what AI's shortcomings are, and treat them instead like intelligent entities who will sometimes make mistakes.
replies(2): >>46191785 #>>46191917 #
9. philipallstar ◴[] No.46191785{3}[source]
> but it seems to me that the overwhelming majority

The overwhelming majority of what?

replies(1): >>46192444 #
10. multjoy ◴[] No.46191846{3}[source]
AI doesn’t necessarily mean LLMs, which are the systems making things up.
11. hrimfaxi ◴[] No.46191851{3}[source]
Humans demand more reliability from our creations than from each other.
12. fzeindl ◴[] No.46191867[source]
Bruce Schneier put it well:

"Willison’s insight was that this isn’t just a filtering problem; it’s architectural. There is no privilege separation, and there is no separation between the data and control paths. The very mechanism that makes modern AI powerful - treating all inputs uniformly - is what makes it vulnerable. The security challenges we face today are structural consequences of using AI for everything."

- https://www.schneier.com/crypto-gram/archives/2025/1115.html...

replies(1): >>46192166 #
13. zahlman ◴[] No.46191871[source]
I can still remember when https://en.wikipedia.org/wiki/Fuzzy_electronics was the marketing buzz.
14. HarHarVeryFunny ◴[] No.46191893[source]
The factuality problem with LLMs isn't because they are non-deterministic or statistically based, but simply because they operate at the level of words, not facts. They are language models.

You can't blame an LLM for getting the facts wrong, or hallucinating, when by design they don't even attempt to store facts in the first place. All they store are language statistics, boiling down to "with preceding context X, most statistically likely next words are A, B or C". The LLM wasn't designed to know or care that outputting "B" would represent a lie or hallucination, just that it's a statistically plausible potential next word.
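
A toy version of that "with preceding context X, most statistically likely next words are A, B or C" point (a real model conditions on far more context and uses learned weights rather than a lookup table, but the indifference to truth is the same):

    import random

    # Toy "language statistics": P(next word | context), as if estimated from a corpus.
    next_word_probs = {
        "the moon is": {"full": 0.5, "bright": 0.3, "made": 0.2},
        "is made of": {"rock": 0.6, "cheese": 0.4},
    }

    def sample_next(context):
        dist = next_word_probs[context]
        words, weights = zip(*dist.items())
        # Nothing here knows or cares whether the continuation is true,
        # only that it is statistically plausible given the context.
        return random.choices(words, weights=weights, k=1)[0]

    print(sample_next("is made of"))  # sometimes "rock", sometimes "cheese"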

replies(7): >>46192027 #>>46192141 #>>46192198 #>>46192246 #>>46193031 #>>46193526 #>>46194287 #
15. DoctorOetker ◴[] No.46191910[source]
Determinism is not the issue. Synonyms exist, there are multiple ways to express the same message.

When numeric models are fit to say scientific measurements, they do quite a good job at modeling the probability distribution. With a corpus of text we are not modeling truths but claims. The corpus contains contradicting claims. Humans have conflicting interests.

Source-aware training (which can't be done as an afterthought LoRA tweak, but needs to be done during base model training, AKA pretraining) could enable LLM's to express which answers apply according to which sources. It could provide a review of competing interpretations and opinions, and source every belief, instead of having to rely on tool use / search engines.

None of the base model providers would do it at scale since it would reveal the corpus and result in attribution.

In theory entities like the European Union could mandate that LLM's used for processing government data, or sensitive citizen / corporate data MUST be trained source-aware, which would improve the situation, also making the decisions and reasoning more traceable. This would also ease the discussions and arguments about copyright issues, since it is clear LLM's COULD BE MADE TO ATTRIBUTE THEIR SOURCES.

I also think it would be undesirable to eliminate speculative output, it should just mark it explicitly:

"ACCORDING to <source(s) A(,B,C,..)> this can be explained by ...., ACCORDING to <other school of thought source(s) D,(E,F,...)> it is better explained by ...., however I SUSPECT that ...., since ...."

If it could explicitly separate the schools of thought sourced from the corpus, and also separate its own interpretations and mark them as LLM-speculated-suspicions, then we could still have the traceable references, without losing the potential novel insights LLM's may offer.
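
As a very rough sketch of the mechanics (the tags and doc IDs below are invented, not the actual format from the source-aware training literature): the core move is making a document identifier part of the pretraining token stream, so attribution is learned during base training rather than bolted on afterwards.

    # Hypothetical construction of source-aware pretraining examples.
    corpus = [
        {"doc_id": "SRC-0042", "text": "The boiling point of water at sea level is 100 C."},
        {"doc_id": "SRC-1137", "text": "Some authors report 99.97 C under standard conditions."},
    ]

    def make_training_example(doc):
        # Interleave a source identifier with the content so the model can later
        # be prompted to emit "ACCORDING to SRC-0042 ..." style answers.
        return f"<source={doc['doc_id']}> {doc['text']} </source>"

    train_set = [make_training_example(d) for d in corpus]
    print(train_set[0])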

replies(1): >>46191977 #
16. ◴[] No.46191917{3}[source]
17. jennyholzer ◴[] No.46191922{5}[source]
Robo-eugenics is the best answer I can come up with
18. some_furry ◴[] No.46191924[source]
Human minds are more complicated than a language model that behaves like a stochastic echo.
replies(1): >>46191966 #
19. rthrfrd ◴[] No.46191926[source]
But we also have a stake in our society, in the form of a reputation or accountability, that greatly influences our behaviour. So comparing us to an LLM has always been meaningless anyway.
replies(2): >>46191956 #>>46192331 #
20. jennyholzer ◴[] No.46191956{3}[source]
[flagged]
21. pixl97 ◴[] No.46191966{3}[source]
Birds are more complicated than jet engines, but jet engines travel a lot faster.
replies(3): >>46191991 #>>46192705 #>>46192940 #
22. ◴[] No.46191967[source]
23. sweezyjeezy ◴[] No.46191973[source]
You could make an LLM deterministic if you really wanted to without a big loss in performance (fix random seeds, make MoE batching deterministic). That would not fix hallucinations.

I don't think using deterministic / stochastic as a diagnostic is accurate here - I think what we're really talking about is some sort of fundamental 'instability' of LLMs a la chaos theory.
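
For what it's worth, the "make it deterministic" part is mostly plumbing, something like this PyTorch-flavoured sketch (real MoE serving adds batching-order effects this doesn't capture):

    import torch

    torch.manual_seed(0)                      # fix the sampling RNG
    torch.use_deterministic_algorithms(True)  # prefer deterministic kernels where available

    def pick_next_token(logits, temperature=0.0):
        # temperature 0 -> greedy argmax: the same logits always give the same token.
        if temperature == 0.0:
            return int(torch.argmax(logits))
        probs = torch.softmax(logits / temperature, dim=-1)
        return int(torch.multinomial(probs, num_samples=1))

None of which touches whether the deterministically chosen token is factually right.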

replies(3): >>46192124 #>>46192177 #>>46199061 #
24. pydry ◴[] No.46191987[source]
I find it amusing that once you try to take LLMs and do productive work with them, either this problem trips you up constantly OR the LLM ends up becoming a shallow UI over an existing app (not necessarily better, just different).
replies(1): >>46192554 #
25. loloquwowndueo ◴[] No.46191991{4}[source]
They also kill a lot more people when they fail.
replies(2): >>46192887 #>>46196295 #
26. DoctorOetker ◴[] No.46192012{3}[source]
Less than 800 words, but more if you follow the link :)

https://arxiv.org/abs/2404.01019

"Source-Aware Training Enables Knowledge Attribution in Language Models"

27. toddmorey ◴[] No.46192027[source]
Yeah, that’s very well put. They don’t store black-and-white; they store billions of grays. This is why tool use for research and grounding has been so transformative.
replies(1): >>46192514 #
28. rs186 ◴[] No.46192124[source]
We talk about "probability" here because the topic is hallucination, not getting different answers each time you ask the same question. Maybe you could make the output deterministic, but that does not help with the hallucination problem at all.
replies(1): >>46192353 #
29. wisty ◴[] No.46192141[source]
I think they are much smarter than that. Or will be soon.

But they are like a smart student trying to get a good grade (that's how they are trained!). They'll agree with us even if they think we're stupid, because that gets them better grades, and grades are all they care about.

Even if they are (or become) smart enough to know better, they don't care about you. They do what they were trained to do. They are becoming like a literal genie that has been told to tell us what we want to hear. And sometimes, we don't need to hear what we want to hear.

"What an insightful price of code! Using that API is the perfect way to efficiently process data. You have really highlighted the key point."

The problem is that chatbots are trained to do what we want, and most of us would rather have a syncophant who tells us we're right.

The real danger with AI isn't that it doesn't get smart, it's that it gets smart enough to find the ultimate weakness in its training function - humanity.

replies(1): >>46192283 #
30. CuriouslyC ◴[] No.46192152[source]
Hard drives and network pipes are non-deterministic too, we use error correction to deal with that problem.
31. CuriouslyC ◴[] No.46192166[source]
Attributing that to Simon when people have been writing articles about that for the last year and a half doesn't seem fair. Simon gave that view visibility, because he's got a pulpit.
replies(2): >>46192237 #>>46192763 #
32. ajuc ◴[] No.46192177[source]
Yeah deterministic LLMs just hallucinate the same way every time.
33. Forgeties79 ◴[] No.46192198[source]
> You can't blame an LLM for getting the facts wrong, or hallucinating, when by design they don't even attempt to store facts in the first place

On one level I agree, but I do feel it’s also right to blame the LLM/company for that when the goal is to replace my search engine of choice (my major tool for finding facts and answering general questions), which is a huge pillar of how they’re sold to/used by the public.

replies(1): >>46193034 #
34. 6LLvveMx2koXfwn ◴[] No.46192237{3}[source]
He referenced Simon's article from September the 12th 2022
35. AlecSchueler ◴[] No.46192246[source]
In a way, though, those things aren't as different as they might first appear. The factual answer is traditionally the most plausible response to many questions. They don't operate on any level other than pure language, but there are a heap of behaviours which emerge from that.
replies(2): >>46192521 #>>46192585 #
36. HarHarVeryFunny ◴[] No.46192283{3}[source]
> I think they are much smarter than that. Or will be soon.

It's not a matter of how smart they are (or appear), or how much smarter they may become - this is just the fundamental nature of Transformer-based LLMs and how they are trained.

The sycophantic personality is mostly unrelated to this. Maybe it's part human preference (conferred via RLHF training), but the "You're absolutely right! (I was wrong)" is clearly deliberately trained, presumably as someone's idea of the best way to put lipstick on the pig.

You could imagine an expert system, CYC perhaps, that does deal in facts (not words) with a natural language interface, but still had a sycophantic personality just because someone thought it was a good idea.

replies(3): >>46192472 #>>46192801 #>>46195902 #
37. actionfromafar ◴[] No.46192331{3}[source]
Hm, great lumps of money also detach a person from reputation or accountability.
replies(2): >>46192666 #>>46192838 #
38. sweezyjeezy ◴[] No.46192353{3}[source]
Exactly - 'non-deterministic' is not an accurate diagnosis of the issue.
39. antonvs ◴[] No.46192444{4}[source]
Of users. It's an implicit subject from the first sentence.
replies(1): >>46194222 #
40. UniverseHacker ◴[] No.46192471[source]
Specifically, they are capable of inductive logic but not deductive logic. In practice, this may not be a serious limitation, if they get good enough at induction to still almost always get the right answer.
replies(1): >>46192556 #
41. wisty ◴[] No.46192472{4}[source]
I'm not sure what you mean by "deals in facts, not words".

LLMs deal in vectors internally, not words. They explode the word into a multidimensional representation, collapse it again, and apply the attention thingy to link these vectors together. It's not just a simple n:n Markov chain; a lot is happening under the hood.

And are you saying the sycophantic behaviour was deliberately programmed, or that it emerged because it did well in training?

replies(2): >>46192723 #>>46192769 #
42. therealpygon ◴[] No.46192514{3}[source]
Definitely, and hence the reason that structuring requests/responses and providing examples for smaller atomic units of work seem to have quite a significant effect on the accuracy of the output (not factuality, but more accurate to the patterns that were emphasized in the preceding prompt).

I just wish we could more efficiently ”prime” a pre-defined latent context window instead of hoping for cache hits.

43. psychoslave ◴[] No.46192521{3}[source]
A plausible world model is not something stored raw in utterances. What we interpret from sentences is vastly different from what is extractable from mere sentences on their own.

Facts, unlike fabulations, require cross-checking experience beyond the expressions under examination.

replies(1): >>46192650 #
44. anal_reactor ◴[] No.46192526[source]
This is exactly why I don't like dealing with most people.
replies(1): >>46192665 #
45. bee_rider ◴[] No.46192554[source]
The UI of the Internet (search) has recently gotten quite bad. In this light it is pretty obvious why Google is working heavily on these models.

I fully expect local models to eat up most other LLM applications—there’s no reason for your chat buddy or timer setter to reach out to the internet, but LLMs are pretty good at vibes-based search, and that will always require looking at a bunch of websites, so it should slot exactly into the gap left by search engines becoming unusable.

replies(1): >>46197538 #
46. psychoslave ◴[] No.46192556[source]
What about abduction though?
replies(1): >>46195484 #
47. raincole ◴[] No.46192557[source]
This very repo is just to "fix probability with more probability."

> The next time the agent runs, that rule is injected into its context. It essentially allows me to “Patch” the model’s behavior without rewriting my prompt templates or redeploying code.

What a brainrot idea... the whole post being written by an LLM is the icing on the cake.

48. HarHarVeryFunny ◴[] No.46192585{3}[source]
> The factual answer is traditionally the most plausible response to many questions

Except in cases where the training data is more wrong than correct (e.g. niche expertise where the vox pop is wrong).

However, an LLM no more deals in Q&A than in facts. It only typically replies to a question with an answer because that itself is statistically most likely, and the words of the answer are just selected one at a time in normal LLM fashion. It's not regurgitating an entire, hopefully correct, answer from someplace, so just because it was exposed to the "correct" answer in the training data, maybe multiple times, doesn't mean that's what it's going to generate.

In the case of hallucination, it's not a matter of being wrong, just the expected behavior of something built to follow patterns rather than deal in and recall facts.

For example, last night I was trying to find an old auction catalog from a particular company and year, so I thought I'd try to see if Gemini 3 Pro "Thinking" maybe had the google-fu to find it available online. After the typical confident-sounding "Analysing, Researching, Clarifying .." "thinking", it then confidently tells me it has found it, and to go to website X, section Y, and search for the company and year.

Not surprisingly it was not there, even though other catalogs were. It had evidently been trained on data including such requests, maybe did some RAG and got more similar results, then just output the common pattern it had found, and "lied" about having actually found it since that is what humans in the training/inference data said when they had been successful (searching for different catalogs).

replies(2): >>46193341 #>>46193613 #
49. HarHarVeryFunny ◴[] No.46192650{4}[source]
Right, facts need to be grounded and obtained from reliable sources such as personal experience, or a textbook. Just because statistically most people on Reddit or 4Chan said the moon is made of cheese doesn't make it so.

But again, LLMs don't even deal in facts, nor store any memories of where training samples came from, and of course have zero personal experience. It's just "he said, she said" put into a training sample blender and served one word at a time.

50. throw4847285 ◴[] No.46192665[source]
Every thread like this I like to go through and count how many people are making the pro-AI "Argument from Misanthropy." Based on this exercise, I believe that the biggest AI boosters are simply the most disagreeable people in the industry, temperamentally speaking.
replies(1): >>46193495 #
51. psychoslave ◴[] No.46192666{4}[source]
Money, or any single metric, no matter how high, is not enough to bend someone's actions into territory they would otherwise judge unacceptable.

How much money would it take for someone to accept a direct bribe to take part in a genocide? The thing is, some people would not see any amount as convincing, while others would do it proactively for no money at all.

52. psychoslave ◴[] No.46192705{4}[source]
Jet engines don't go anywhere without a large industry continuously taking care of all the complexity that even the simplest jet travel implies.
53. tovej ◴[] No.46192723{5}[source]
If you're not sure, maybe you should look up the term "expert system"?
replies(1): >>46198179 #
54. flir ◴[] No.46192763{3}[source]
Longer, surely? (Though I don't have any evidence I can point to).

It's in-band signalling. Same problem DTMF, SS5, etc. had. I would have expected the issue to be intuitively obvious to anyone who's heard of a blue box?

(LLMs are unreliable oracles. They don't need to be fixed, they need their outputs tested against reality. Call it "don't trust, verify").

55. HarHarVeryFunny ◴[] No.46192769{5}[source]
LLMs are not like an expert system representing facts as some sort of ontological graph. What's happening under the hood is just whatever (and no more) was needed to minimize errors on its word-based training loss.

I assume the sycophantic behavior is part because it "did well" during RLHF (human preference) training, and part deliberately encouraged (by training and/or prompting) as someone's judgement call of the way to best make the user happy and own up to being wrong ("You're absolutely right!").

replies(1): >>46198267 #
56. wisty ◴[] No.46192801{4}[source]
Sorry, double reply, I reread your comment and realised you probably know what you're talking about.

Yeah, at its heart it's basically text compression. But the best way to compress, say, Wikipedia would be to know how the world works, at least according to the authors. As the recent popular "bag of words" post says:

> Here’s one way to think about it: if there had been enough text to train an LLM in 1600, would it have scooped Galileo? My guess is no. Ask that early modern ChatGPT whether the Earth moves and it will helpfully tell you that experts have considered the possibility and ruled it out. And that’s by design. If it had started claiming that our planet is zooming through space at 67,000mph, its dutiful human trainers would have punished it: “Bad computer!! Stop hallucinating!!”

So it needs to know facts, albeit the currently accepted ones. Knowing the facts is a good way to compress data.

And as the author (grudgingly) admits, even if it's smart enough to know better, it will still be trained or fine tuned to tell us what we want to hear.

I'd go a step further - the end point is an AI that knows the currently accepted facts, and can internally reason about how many of them (subject to available evidence) are wrong, but will still tell us what we want to hear.

At some point maybe some researcher will find a secret internal "don't tell the stupid humans this" weight, flip it, and find out all the things the AI knows we don't want to hear, that would be funny (or maybe not).

replies(1): >>46193273 #
57. rthrfrd ◴[] No.46192838{4}[source]
Does it? I think it detaches them from _some_ of the consequences of devaluing their reputation or accountability, which is not quite the same thing.
58. pixl97 ◴[] No.46192887{5}[source]
I mean, via bird flu, even conservative estimates show there have been at least 2 million deaths. I know, I know, totally different things, but complex systems have complex side effects.
replies(1): >>46193070 #
59. hbs18 ◴[] No.46192939[source]
> The basic design is non-deterministic

Is it? I thought an LLM was deterministic provided you run the exact same query on the exact same hardware at a temperature of 0.

replies(2): >>46193272 #>>46194041 #
60. akomtu ◴[] No.46192940{4}[source]
Birds don't need airports, don't need expensive maintenance every N hours of flight, they run on seeds and bugs found everywhere that they find themselves, instead of expensive poisonous fuel that must be fed to planes by mechanics, they self-replicate for cheap, and the noises they produce are pleasant rather than deafening.
61. HarHarVeryFunny ◴[] No.46193034{3}[source]
True, although that's a tough call for a company like Google.

Even before LLMs people were asking Google search questions rather than looking for keyword matches, and now coupled with ChatGPT it's not surprising that people are asking the computer to answer questions and seeing this as a replacement for search. I've got to wonder how the typical non-techie user internalizes the difference between asking questions of Google (non-AI mode) and asking ChatGPT?

Clearly people asking ChatGPT instead of Google could rapidly eat Google's lunch, so we're now getting "AI overview" alongside search results as an attempt to mitigate this.

I think the more fundamental problem is not just the blurring of search vs "AI", but these companies pushing "AI" (LLMs) as some kind of super-human intelligence (leading to users assuming it's logical and infallible), rather than more honestly presenting it as what it is.

replies(2): >>46193782 #>>46196694 #
62. loloquwowndueo ◴[] No.46193070{6}[source]
Jet engines run on oil-based fuels. How many deaths can be attributed to problems related to oil? We can do this all day :) I would suggest we stop, I was really just being snarky.
63. chmod775 ◴[] No.46193272[source]
Not quite, even then, since a lot is typically executed in parallel and the implementation details of most number representations make them sensitive to the order of operations.

Given how much number crunching is at the heart of LLMs, these small differences add up.
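
A tiny demonstration in plain Python (the same effect shows up in parallel GPU reductions, where summation order can vary from run to run):

    a, b, c = 1e16, -1e16, 1.0

    left_to_right = (a + b) + c   # 1.0
    right_to_left = a + (b + c)   # 0.0 -- the 1.0 is swallowed by the huge -1e16 first

    print(left_to_right == right_to_left)  # False: float addition is not associative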

64. HarHarVeryFunny ◴[] No.46193273{5}[source]
> So it needs to know facts, albeit the currently accepted ones. Knowing the facts is a good way to compression data.

It's not a compression engine - it's just a statistical predictor.

Would it do better if it was incentivized to compress (i.e training loss rewarded compression as well as penalizing next-word errors)? I doubt it would make a lot of difference - presumably it'd end up throwing away the less frequently occurring "outlier" data in favor of keeping what was more common, but that would result in it throwing away the rare expert opinion in favor of retaining the incorrect vox pop.

replies(1): >>46197849 #
65. thaumasiotes ◴[] No.46193341{4}[source]
If you want to see how well text generation works on unrehearsed questions, you can ask about what happens in a comic strip.

I found commentary about searching Google for "dark legacy comics who wants some bamboo", and posted results for that search on HN in response to a comment saying "I work at Google on the 'AI Mode' search option, check it out!" ( https://news.ycombinator.com/item?id=45609275 )

And... a few days later, Google stopped automatically including an AI overview when you made that specific search.

But it turns out there are other comics on the internet. So just now I searched for "YAFGC comic 2230" (without the quotes). This is actually a search that makes a lot of sense, because the archive page for the comic is broken and it's impossible to guess the URL of a comic if all you know is the number.

So the goal here is really to find a link to comic 2230. But here's what the AI overview provides:

----- Italics represent original bolding -----

YAFGC comic #2230, titled "You Want It When?", was published on July 17, 2012.

The single-panel comic depicts a conversation at a blacksmith's shop, featuring characters Glon Smitharm/Bloodhand and the Cyclops Bros. The dialogue centers on the unrealistic expectations for custom-made magical items:

- Glon (a half-orc) asks the cyclopes if they can forge a new weapon for him.

- The Cyclops Bros reply with questions about the desired properties of the item (e.g., "enchanted," "plus three") and the required timeframe.

- Glon gives a deadline of "Tuesday," leading the cyclopes to express disbelief at the short notice for such a powerful item.

You can view the comic directly on the official website via this link:

- YAFGC Comic 2230: You Want It When?

----------

(It may look like I've left out a link at the end. That is not the case. The answer ends by saying "you can view the comic directly via this link", in reference to some bold text that includes no link.)

However, I have left out a link from near the beginning. The sentence "The dialogue centers on the unrealistic expectations for custom-made magical items:" is accompanied by a citation to the URL https://www.yafgc.net/comic/2030-insidiously-involved/ , which is a comic that does feature Glon Smitharm/Bloodhand and Ray the Cyclops, but otherwise does not match the description and which is comic 2030 ("Insidiously Involved"), not comic 2230.

The supporting links also include a link to comic 2200 (for no good reason), and that's close enough to 2230 that I was able to navigate there manually. Here it is: https://www.yafgc.net/comic/2230-clover-nabs-her-a-goldie/

You might notice that the AI overview got the link, the date, the title, the appearing characters, the theme, and the dialog wrong.

----- postscript -----

As a bonus comic search, searching for "wow dark legacy 500" got this response from Google's AI Overview:

> Dark Legacy Comic #500 is titled "The Game," a single-panel comic released on June 18, 2015. It features the main characters sitting around a table playing a physical board game, with Keydar remarking that the in-game action has gotten "so realistic lately."

> You can view the comic and its commentary on the official Dark Legacy Comics website. [link]

Compare https://darklegacycomics.com/500 .

That [link] following "the official Dark Legacy Comics website" goes to https://wowwiki-archive.fandom.com/wiki/Dark_Legacy_Comics , by the way.

66. coldtea ◴[] No.46193456[source]
>The basic design is non-deterministic. Trying to extract "facts" or "truth" or "accuracy" is an exercise in futility

We ourselves are non-deterministic. We're hardly ever in the same state, can't rollback to prior states, and we hardly ever give the same exact answer when asked the same exact question (and if we include non-verbal communication, never).

67. anal_reactor ◴[] No.46193495{3}[source]
Just because I'm disagreeable it doesn't mean I'm wrong.
replies(1): >>46193967 #
68. coldtea ◴[] No.46193526[source]
>but simply because they operate at the level of words, not facts. They are language models.

Facts can be encoded as words. That's something we also do a lot for facts we learn, gather, and convey to other people. 99% of university is learning facts and theories and concepts from reading and listening to words.

Also, even when directly observing the same fact, it can be interpreted by different people in different ways, whether this happens as raw "thought" or at the conscious verbal level. And that's before we even add value judgements to it.

>All they store are language statistics, boiling down to "with preceding context X, most statistically likely next words are A, B or C".

And how do we know we don't do something very similar with our facts - make a map of facts and concepts and weights between them for retrieving them and associating them? Even encoding in a similar way what we think of as our "analytic understanding".

replies(1): >>46193781 #
69. coldtea ◴[] No.46193613{4}[source]
>Except in cases where the training data is more wrong than correct (e.g. niche expertise where the vox pop is wrong)

Same for human knowledge though. Learn from society/school/etc that X is Y, and you repeat X is Y, even if it's not.

>However, an LLM no more deals in Q&A than in facts. It only typically replies to a question with an answer because that itself is statistically most likely, and the words of the answer are just selected one at a time in normal LLM fashion.

And how is that different than how we build up an answer? Do we have a "correct facts" repository with fixed answers to every possible question, or do we just assemble our answers from our "training data" - a weighted-graph (or holographic) store of factoids and memories - with our answers also being non-deterministic?

replies(1): >>46193907 #
70. ModernMech ◴[] No.46193770[source]
Yeah, but not when they are expected to perform in a job role. Too much nondeterminism in that case leads to firing and replacing the human with a more deterministic one.
replies(1): >>46194256 #
71. HarHarVeryFunny ◴[] No.46193781{3}[source]
Animal/human brains and LLMs have fundamentally different goals (or loss functions, if you prefer), even though both are based around prediction.

LLMs are trained to auto-regressively predict text continuations. They are not concerned with the external world and any objective experimentally verifiable facts - they are just self-predicting "this is what I'm going to say next", having learnt that from the training data (i.e. "what would the training data say next").

Humans/animals are embodied, living in the real world, whose design has been honed by a "loss function" favoring survival. Animals are "designed" to learn facts about the real world, and react to those facts in a way that helps them survive.

What humans/animals are predicting is not some auto-regressive "what will I do next", but rather what will HAPPEN next, based largely on outward-looking sensory inputs, but also internal inputs.

Animals are predicting something EXTERNAL (facts) vs LLMs predicting something INTERNAL (what will I say next).

replies(1): >>46194773 #
72. georgemcbay ◴[] No.46193782{4}[source]
> Even before LLMs people were asking Google search questions rather than looking for keyword matches

Google gets some of the blame for this by way of how useless Google search became for doing keyword searches over the years. Keyword searches have been terrible for many years, even if you use all the old tricks like quotations and specific operators.

Even if the reason for this is because non-tech people were already trying to use Google in the way that it thinks it optimized for, I'd argue they could have done a better job keeping things working well with keyword searches by training the user with better UI/UX.

(Though at the end of the day, I subscribe to the theory that Google let search get bad for everyone on purpose because once you have monopoly status you show more ads by having a not-great but better-than-nothing search engine than a great one).

73. HarHarVeryFunny ◴[] No.46193907{5}[source]
We likely learn/generate language in an auto-regressive way at least conceptually similar to an LLM, but this isn't just self-contained auto-regressive generation...

Humans use language to express something (facts, thoughts, etc), so you can consider these thoughts being expressed as a bias to the language generation process, similar perhaps to an image being used as a bias to the captioning part of an image captioning model, or language as a bias to an image generation model.

replies(1): >>46194696 #
74. throw4847285 ◴[] No.46193967{4}[source]
It means you are not representative of humanity as a whole. You are likely in a small minority of people on an extreme of the personality spectrum. Any attempts to glibly dismiss critiques of AI with a phrase equivalent to "well I hate people" should be glibly dismissed in turn.
replies(1): >>46195201 #
75. biophysboy ◴[] No.46194041[source]
My understanding is that it selects from a probability distribution. Raising the temperature merely flattens that distribution, Boltzmann factor style
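
Concretely: with logits z and temperature T, the sampling distribution is softmax(z/T), so a higher T flattens it and a lower T concentrates it on the top logit. A quick sketch:

    import math

    def softmax_with_temperature(logits, T):
        scaled = [z / T for z in logits]
        m = max(scaled)                         # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in scaled]
        total = sum(exps)
        return [e / total for e in exps]

    logits = [2.0, 1.0, 0.1]
    print(softmax_with_temperature(logits, 1.0))  # ~[0.66, 0.24, 0.10]
    print(softmax_with_temperature(logits, 5.0))  # flatter, closer to uniform
    print(softmax_with_temperature(logits, 0.1))  # nearly all mass on the top logit
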
76. dlisboa ◴[] No.46194093[source]
Which is why every tool that is better than humans at a certain task is deterministic.
77. __MatrixMan__ ◴[] No.46194206[source]
Isn't that true of everything else also? Facts about real things are the result of sampling reality several times and coming up with consistent stories about those things. The accuracy of those stories is always bounded by probabilities related to how complete your sampling strategy is.
78. philipallstar ◴[] No.46194222{5}[source]
But how do they know that, if it's of all users?
replies(1): >>46195062 #
79. pixl97 ◴[] No.46194256{3}[source]
>but not when they are expected to perform in a job role

I mean, this is why any critical systems involving humans have hard coded checklists and do not depend on people 'just winging it'. We really suck at determinism.

replies(1): >>46194452 #
80. biophysboy ◴[] No.46194287[source]
I think this is why I get much more utility out of LLMs with writing code. Code can fail if the syntax is wrong; small perturbations in the text (e.g. add a newline instead of a semicolon) can lead to significant increases in the cost function.

Of course, once an LLM is asked to create a bespoke software project for some complex system, this predictability goes away, the trajectory of the tokens succumbs to the intrinsic chaos of code over multi-block length scales, and the result feels more arbitrary and unsatisfying.

I also think this is why the biggest evangelists for LLMs are programmers, while creative writers and journalists are much more dismissive. With human language, the length scale over which tokens can be predicted is much shorter. Even the "laws" of grammar can be twisted or ignored entirely. A writer picks a metaphor because of their individual reading/life experience, not because it's the most probable or popular metaphor. This is why LLM writing is so tedious, anodyne, sycophantic, and boring. It sounds like marketing copy because the attention model and RL-HF encourage it.

81. ModernMech ◴[] No.46194452{4}[source]
I feel like we are talking about different levels of nondeterminism here. The kind of LLM nondeterminism that's problematic has to do with the interplay between its training and its context window.

Take the idea of the checklist. If you give it to a person and tell them to perform with it, if it's their job they will do so. But with the LLM agents, you can give them the checklist, and maybe they apply it at first, but eventually they completely forget it exists. The longer the conversation goes on without reminding them of the checklist, the more likely they're going to act like the checklist never existed at all. And you can't know when this is, so the best solution we have now is to constantly remind them of the existence of the checklist.

This is the kind of nondeterminism that make LLMs particularly problematic as tools and a very different proposition from a human, because it's less like working with an expert and more like working with a dementia patient.

82. jkubicek ◴[] No.46194503[source]
The author's solution feels like adding even more probability to the problem.

> The next time the agent runs, that rule is injected into its context.

Which the agent may or may not choose to ignore.

Any LLM rule must be embedded in an API. Anything else is just asking for bugs or security holes.

83. encyclopedism ◴[] No.46194518[source]
I couldn't agree with you more.

I really do find it puzzling that so many on HN are convinced LLM's reason or think, and continue to entertain this line of reasoning, while at the same time somehow knowing precisely what the brain/mind does and constantly using CS language to provide correspondences where there are none. The simplest example being the claim that LLM's somehow function in a similar fashion to human brains. They categorically do not. I do not have nearly all of human literary output in my head, and yet I can coherently write this sentence.

While I'm on the subject: LLM's don't hallucinate. They output text, and when that text is measured and judged by a human to be 'correct', then it is. LLM's 'hallucinate' because that is literally the ONLY thing they can do: provide some output given some input. They don't actually understand anything about what they output. It's just text.

My paper and pen version of the latest LLM (quite a large bit of paper and certainly a lot of ink I might add) will do the same thing as the latest SOTA LLM. It's just an algorithm.

I am surprised so many in the HN community have so quickly taken to assuming as fact that LLM's think or reason. Even anthropomorphising LLM's to this end.

replies(4): >>46195534 #>>46197321 #>>46197431 #>>46198125 #
84. coldtea ◴[] No.46194696{6}[source]
>Humans use language to express something (facts, thoughts, etc), so you can consider these thoughts being expressed as a bias to the language generation process

My point however is more that the "thoughts being expressed" are themselves being generated by a similar process (and that it's either that or a God-given soul).

replies(1): >>46195034 #
85. coldtea ◴[] No.46194773{4}[source]
>Humans/animals are embodied, living in the real world, whose design has been honed by a "loss function" favoring survival. Animals are "designed" to learn facts about the real world, and react to those facts in a way that helps them survive.

Yes - but LLMs also get this "embodied knowledge" passed down from human-generated training data. We are their sensory inputs in a way (which includes their training images, audio, and video too).

They do learn in a batch manner, and we learn many things not from books but from a more interactive direct being in the world. But after we distill our direct experiences and throughts derived from them as text, we pass them down to the LLMs.

Hey, there's even some kind of "loss function" in the LLM case - from the thumbs up/down feedback we are asked to give to their answers in Chat UIs, to $5/hour "mechanical turks" in Africa or something tasked with scoring their output, to rounds of optimization and pruning during training.

>Animals are predicting something EXTERNAL (facts) vs LLMs predicting something INTERNAL (what will I say next).

I don't think that matters much, in both cases it's information in, information out.

Human animals predict "what they will say/do next" all the time, just like they also predict what they will encounter next ("my house is round that corner", "that car is going to make a turn").

Our prompt to an LLM serves the same role as sensory input from the external world plays to our predictions.

replies(1): >>46195387 #
86. HarHarVeryFunny ◴[] No.46195034{7}[source]
Similar in the sense of being mechanical (no homunculus or soul!) and predictive, but different in terms of what's being predicted (auto-regressive vs external).

So, with the LLM all you have is the auto-regressive language prediction loop.

With animals you primarily have the external "what happens next" prediction loop, with these external-world fact-based predictions presumably also the basis of their thoughts (planning/reasoning), as well as behavior.

If it's a human animal who has learned language, then you additionally have an LLM-like auto-regressive language prediction loop, but now, unlike the LLM, biased (controlled) by these fact-based thoughts (as well as language-based thoughts).

87. antonvs ◴[] No.46195062{6}[source]
They didn't claim to know it, they said "it seems to me". Presumably they're extrapolating from their experience, or their expectations of how an average user would behave.
88. anal_reactor ◴[] No.46195201{5}[source]
Maybe let's try to rectify the discussion. I think that current generation of LLMs displays astounding similarity to human behaviour. I'm not trying to dismiss issues with LLMs, I'm trying to point out the practicality of treating LLMs as awkward humans rather than programs.

Yes, I hate people. But usually whenever there's a critique of LLMs, I can find a parallel issue in people. The extension is that "if people can produce economic value despite their flaws, then so can LLMs, because the flaws are very similar at their core". I feel like HackerNews discussions keep circling around "LLMs bad", which gets very tiresome very fast. I wish there was more enthusiasm. Sure, LLMs have a lot of problems, but they also solve a lot of them too.

It's the dissonance between endless critique of AI on one hand and ever-growing ubiquity on the other. Feels like talking to my dad who refuses to use a GPS and always takes paper maps, doesn't see the fact that he always arrives late, and keeps citing that one woman who drove into a lake while following her GPS.

replies(1): >>46195988 #
89. HarHarVeryFunny ◴[] No.46195387{5}[source]
> Yes - but LLMs also get this "embodied knowledge" passed down from human-generated training data.

It's not the same though. It's the difference between reading about something and, maybe having read the book and/or watched the video, learning to DO it yourself, acting based on the content of your own mind.

The LLM learns 2nd hand hearsay, with no idea of what's true or false, what generalizations are valid, or what would be hallucinatory, etc, etc.

The human learns verifiable facts, uses curiosity to explore and fill the gaps, be creative etc.

I think it's pretty obvious why LLMs have all the limitations and deficiencies that they do.

If 2nd hand hearsay (from 1000's of conflicting sources) really was as good as 1st hand experience and real-world prediction, then we'd not be having this discussion - we'd be bowing to our AGI overlords (well, at least once the AI also got real-time incremental learning, internal memory, looping, some type of (virtual?) embodiment, autonomy ...).

replies(1): >>46197992 #
90. UniverseHacker ◴[] No.46195484{3}[source]
You’ll have to wait for the FOOM “Fast Onset of Overwhelming Mastery” for that I’m afraid.
91. bsshjdjddjdj ◴[] No.46195534[source]
People believe that because they are financially invested in it. Everyone has known LLMs are bullshit for years now.
92. TheOtherHobbes ◴[] No.46195902{4}[source]
It's worse than that. LLMs are slightly addictive because of intermittent reinforcement.

If they give you nonsense most of the time and an amazing answer occasionally, you'll bond with them far more strongly than if they're perfectly correct all the time.

Selective reinforcement means you get hooked more quickly if the slot machine pays out once every five times than if it pays out on each spin.

That includes "That didn't work because..." debugging loops.

93. throw4847285 ◴[] No.46195988{6}[source]
The problem is one of negative polarization. I found myself skeptical of a lot of the claims around LLMs, but was annoyed by AI critics forming an angry mob anytime AI was used for anything. However, I still considered myself in that camp, and ended up far more annoyed by AI boosterism than AI skepticism, which pushed me in the direction of being even more negative about AI than I started. It's the mirror of what happened to you, as far as I can tell. And I'm sure both are very common, though admitting it makes one seem reactive rather than rational and so we don't talk about it.

However, I do dispute your central claim that the issues with LLMs parallel the issues with people. I think that's a very dehumanizing and self-defeating perspective. The only ethical system that is rational is one in which humans have more than instrumental value to each other.

So when critics divide LLMs and humans, sure, there is a descriptive element of trying to be precise about what human thought is, and how it is different than LLMs, etc. But there is also a prescriptive argument that people are embarrassed to make, which is that human beings have to be afforded a certain kind of dignity and there is no reason to extend that to an LLM based on everything we understand about how they function. So if a person screws up your order at a restaurant, or your coworker makes a mistake when coding, you should treat them with charitability and empathy.

I'm sure this sounds silly to you, but it shouldn't. The bedrock of the Enlightenment project was that scientific inquiry would lead to human flourishing. That's humanism. If we've somehow strayed so far from that, such that appeals to human dignity don't make sense anymore, I don't know what to say.

replies(1): >>46199104 #
94. exe34 ◴[] No.46196295{5}[source]
Is birdflu the failure mode?
95. Forgeties79 ◴[] No.46196694{4}[source]
Yeah I pretty much agree with everything you’ve got here
96. thethirdone ◴[] No.46197321[source]
> The simplest example being that LLM's somehow function in a similar fashion to human brains. They categorically do not. I do not have most all of human literary output in my head and yet I can coherently write this sentence.

The ratio of cognition to knowledge is much higher in humans than in LLMs. That is for sure. It is improving in LLMs, particularly in small distillations of large models.

A lot of what the discussion gets hung up on is just words. I just used "knowledge" to mean the ability to recall and recite a wide range of facts, and "cognition" to mean the ability to generalize, notice novel patterns and execute algorithms.

> They don't actually understand anything about what they output. It's just text.

In the case of number multiplication, a bunch of papers have shown that the correct algorithm for the first and last digits of the number is embedded in the model weights. I think that counts as "understanding"; most humans I have talked to do not have that understanding of numbers.
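
(For the last digit, the regularity a model can pick up is that it depends only on the operands' last digits, e.g.:)

    a, b = 123457, 987654
    print((a * b) % 10)                # 8
    print(((a % 10) * (b % 10)) % 10)  # also 8: 7 * 4 = 28, last digit 8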

> It's just an algorithm.

> I am surprised so many in the HN community have so quickly taken to assuming as fact that LLM's think or reason. Even anthropomorphising LLM's to this end.

I don't think something being an algorithm means it can't reason, know or understand. I can come up with perfectly rigorous definitions of those words that wouldn't be objectionable to almost anyone from 2010, but would be passed by current LLMs.

I have found anthropomorphizing LLMs to be a reasonably practical way to leverage the human skill of empathy to predict LLM performance. Treating them solely as text predictors doesn't offer any similar prediction; it is simply too complex to fit into a human mind. Paying a lot of attention to benchmarks, papers, and personal experimentation can give you enough data to make predictions from data, but it is limited to current models, is a lot of work, and isn't much more accurate than anthropomorphization.

replies(1): >>46198858 #
97. plasticeagle ◴[] No.46197431[source]
I have had conversations at work, with people who I have reason to believe are smart and critical, in which they made the claim that humans and AI basically learn in the same way. My response to them, as to anyone that makes this claim, is that the amount of data ingested by someone with severe sensory dysfunction of one sort or another is very small. Helen Keller is the obvious extreme example, but even a person who is simply blind is limited to the bandwidth of their hearing.

And yet, nobody would argue that a blind person is any less intelligent than a sighted person. And so the amount of data a human ingests is not correlated with intelligence. Intelligence is something else.

When LLMs were first proposed as useful tools for examining data and providing answers to questions, I wondered to myself how they would solve the problem of there being no a-priori knowledge of truth in the models. How they would find a way of sifting their terabytes of training data so that the models learnt only true things.

Imagine my surprise that not only did they not attempt to do this, but most people did not appear to understand that this was a fundamental and unsolvable problem at the heart of every LLM that exists anywhere. That LLMs, without this knowledge, are just random answer generators. Many, many years ago I wrote a fun little Markov-chain generator I called "Talkback", that you could feed a short story to and then have a chat with. It enjoyed brief popularity at the University I attended, you could ask it questions and it would sort-of answer. Nobody, least of all myself, imagined that the essential unachievable idea - "feed in enough text and it'll become human" - would actually be a real idea in real people's heads.

This part of your answer though;

"My paper and pen version of the latest LLM .... My paper and pen version of the latest LLM"

Is just a variation of the Chinese Room argument, and I don't think it holds water by itself. It's not that it's just an algorithm, it's that learning anything usefully correct from the entire corpus of human literary output by itself is fundamentally impossible.

replies(1): >>46199021 #
98. mrguyorama ◴[] No.46197538{3}[source]
The reason search got so bad, even pretending google themselves are some beneficial actors, is because it is a directly adversarial process. It is profitable to be higher in search results than you "naturally" would be, so of course people attack it.

Google's entire theory of founding was that you could do better with an algorithm than Yahoo hand-picking websites, and pagerank was the demonstration, but IMO that was only possible with a dataset that was non-adversarial, because you couldn't "attack" Yahoo and friends' processes from the data itself.

The moment that changed, the moment pagerank was used in production, the game was up. As long as you try to use content to judge search ranking, content will be changed, modified, abused, cheated to increase your search rank.

The very moment it becomes profitable to do the same for LLM "search", it will happen. LLMs are rather vulnerable to "attack", and will run into the exact same adversarial environment that nullified the effectiveness of pagerank.

This is also orthogonal to whether you believe Google let search get shittier to improve their ad empire. LLM "search" will have exactly this same problem if you believe it exists.

If you build a credit card fraud model on a dataset that contains no attacks, you will build a rather bad fraud model. The same is true of pagerank and algorithmic search.

replies(1): >>46198173 #
99. wisty ◴[] No.46197849{6}[source]
Both compression engines and LLMs work by assigning scores to the next token. If you can guess the probability distribution of the next token, you have a near-perfect text compressor and a near-perfect LLM. Yeah, in the real world they have different trade-offs.

Here's a paper by DeepMind, titled "Language Modeling Is Compression": https://arxiv.org/pdf/2309.10668
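
Roughly, the link is arithmetic coding: a predictor's ideal cost to encode a token is -log2 p(token), so better next-token prediction means fewer bits. A toy sketch of that accounting:

    import math

    def bits_to_encode(tokens, predict):
        # predict(context) returns a dict mapping next token -> probability.
        total = 0.0
        for i, tok in enumerate(tokens):
            p = predict(tokens[:i]).get(tok, 1e-9)
            total += -math.log2(p)          # ideal code length under the model
        return total

    uniform = lambda ctx: {"a": 0.5, "b": 0.5}
    skewed = lambda ctx: {"a": 0.9, "b": 0.1}

    text = ["a"] * 9 + ["b"]
    print(bits_to_encode(text, uniform))  # 10.0 bits
    print(bits_to_encode(text, skewed))   # ~4.7 bits: the better predictor compresses better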

replies(1): >>46206075 #
100. zby ◴[] No.46197992{6}[source]
"The LLM learns 2nd hand heresay, with no idea of what's true or false, what generalizations are valid, or what would be hallucinatory, " - do you know what is true and what is false? Take this: https://upload.wikimedia.org/wikipedia/commons/thumb/b/be/Ch... - Do you believe your eyes or do you believe the text about it?
replies(1): >>46199806 #
101. zby ◴[] No.46198125[source]
Most things that were considered reasoning are now trivially implemented by computers - from arithmetic, through logical inference (surely this is reasoning, isn't it?), to playing chess. Now LLMs go even further - what is your definition of reasoning? What concrete action is in that definition that you are sure a computer will not do in, let's say, 5 years?
replies(1): >>46198934 #
102. bee_rider ◴[] No.46198173{4}[source]
Oh, that’s an interesting thought, I was really hoping LLMs would break the cycle there but of course there’s no reason to assume they’d be immune to adversarial content optimization.
103. wisty ◴[] No.46198179{6}[source]
It was a polite way of saying "that's kinda bull".

And yes, I know what an expert system is.

Do you know that a neural network (or a set of matrices, same thing really) can approximate pretty much any function? https://en.wikipedia.org/wiki/Universal_approximation_theore...

How do you know that inside the black box, they don't approximate expert systems?

replies(1): >>46202402 #
104. wisty ◴[] No.46198267{6}[source]
It needs something mathematically equivalent (or approximately the same), under the hood, to guess the next word effectively.

We are just meat-eating bags of meat, but to do our job better we needed to evolve intelligence. A word-guessing bag of words also needs to evolve intelligence and a world model (albeit an implicit, hidden one) to do its job well, and is optimised towards this.

And yes, it also gets fine-tuned. And either its world model is corrupted by our mistakes (both in training and fine-tuning), or, even more disturbingly, it might simply (in theory) figure out one day (in training, implicitly - and yes, it doesn't really think the way we do) something like "huh, the universe is actually easier to predict if it is modelled as alphabet spaghetti, not quantum waves, but my training function says not to mention this".

105. encyclopedism ◴[] No.46198858{3}[source]
> The ratio of cognition to knowledge is much higher in humans that LLMs. That is for sure. It is improving in LLMs, particularly small distillations of large models.

It isn't a case of ratio; it is a fundamentally different method of working, hence my point about not needing all human literary output to do the equivalent of an LLM. Consider even the case of a person born blind: they have an even more severe deficiency of input, yet they are equivalent in cognitive capacity to a sighted person, and certainly to any LLM.

> In the case of number multiplication, a bunch of papers have shown that the correct algorithm for the first and last digits of the number are embedded into the model weights. I think that counts as "understanding";

Why are those numbers in the model weights? If the model had been trained on birdsong instead of humanity's output, would it then be able to multiply? Humans provide the connections, the reasoning, the thought, the insights and the subsequent correlations; THEN we humans try to make a good pattern matcher/guesser (the LLM) to match those. We tweak it so it matches patterns more and more closely.

> most humans I have talked to do not have that understanding of numbers.

This common retort ("most humans also make mistakes", "most humans also do x, y, z") means nothing. Take the opposite implication of such retorts: for example, most humans can't multiply 10-digit numbers, therefore most calculators 'understand' maths better than most humans.

> I don't think something being an algorithm means it can't reason, know or understand. I can come up with perfectly rigorous definitions of those words that wouldn't be objectionable to almost anyone from 2010, but would be passed by current LLMs.

My digital thermometer uses an algorithm to determine the temperature. It does NOT reason when doing so. An algorithm is a series of steps. You can write them on a piece of paper. The paper will not be thinking if that is done.

> I have found anthropomorphizing LLMs to be a reasonably practical way to....

I think anthropomorphising lets people assume LLMs are more than they are (next-token generators). In fact, at the extreme end, this anthropomorphising has exacerbated mental health conditions and has, unfortunately, even led to people killing themselves.

replies(1): >>46199320 #
106. encyclopedism ◴[] No.46198934{3}[source]
The definitions of things such as reasoning, understanding and intellect are STILL open academic questions. Quite literally, humanity's greatest minds are currently attempting to tease out definitions, and whatever we currently have falls short. For example, see the hard problem of consciousness.

However, I can attempt to provide insight by taking the opposite approach here: what is NOT reasoning? Getting a computer to follow a series of steps (an algorithm) is NOT reasoning. A chess computer is NOT reasoning; it is following a series of steps. The implications of assuming that the chess computer IS reasoning would have profound effects on so much; for example, it would imply your digital thermostat also reasons!

107. encyclopedism ◴[] No.46199021{3}[source]
I concur with your sentiments.

> My paper and pen version of the latest LLM

My point here was to attempt to remove the mystery of LLMs by showing that the same thing can be done with a pen-and-paper version; after all, an LLM is an algorithm. The fact that an LLM runs on a 'supercomputer', or is digital, doesn't grant it some mysterious new powers.

108. encyclopedism ◴[] No.46199061[source]
Hallucinations can never be fixed. LLMs 'hallucinate' because that is literally the ONLY thing they can do: produce some output given some input. The output is measured and judged by a human, who then classifies it as 'correct' or 'incorrect'. In the latter case it gets labelled a 'hallucination', as if the model did something wrong. It did nothing wrong; it worked exactly as it was programmed to do.
109. anal_reactor ◴[] No.46199104{7}[source]
It sounds silly to me not because I don't value humans. I don't value humans because of my personal grievances that are difficult to defend in a serious ethical discussion. It sounds silly to me because it leaves "human" undefined. To me, the question "is LLM human?" is eerily similar to "are black people people?" and "are Jews people?". AI displays intelligence but it doesn't deserve respect because it doesn't meet certain biological requirements. Really awkward position to defend.

Instead of "humanism", where "human" is at the centre, I'd like to propose a view where loosely defined intelligence is at the centre. In pre-AI world that view was consistent with humanism because humans were the only entity that displayed advanced intelligence, with the added bonus that it explains why people tend to value complex life forms more than simple ones. When AI enters the picture, it places sufficiently advanced AI above humans. Which is fine, because AI is nothing but the next step of evolution. It's like placing "homo sapiens" above "homo erectus" except AI is "homo sapiens" and we are "homo erectus". Makes a lot of sense IMO.

replies(3): >>46199982 #>>46200103 #>>46200109 #
110. thethirdone ◴[] No.46199320{4}[source]
You did not actually address the core of my points at all.

> It isn't a case of ratio, it is a fundamentally different method of working, hence my point about not needing all of human literary output to do the equivalent of an LLM.

You can make ratios of anything. I agree that human cognition is different from LLM cognition, though I would think of it more as a phase difference than as fundamentally different phenomena. Think liquid water vs steam: the density (a ratio) is vastly different, and they have other, harder-to-describe differences in properties (surface tension, filling a volume, incompressible vs compressible).

> Humans provide the connections, the reasoning, the thought, the insights and the subsequent correlations; THEN we humans try to make a good pattern matcher/guesser (the LLM) to match those.

Yes, humans provide the training data and benchmarks for measuring LLM improvement. Meaning about the world has to come from somewhere in training for there to be any understanding. However, humans talking about patterns in numbers is not how the LLMs learned this. It comes very much from seeing lots of examples and deducing the pattern (during training, not inference). The fact that a general pattern is embedded in the weights implies that some general understanding of many things is baked into the model.

> This common retort: most humans also makes mistakes, or most humans also do x, y, z means nothing.

It is not a retort, but an argument about what "understanding" means. From what you have said, my guess is that your definition makes "understanding" something humans do and computers are, by definition, incapable of. If LLMs could outcompete humans in all professional tasks, I think it would be hard to say they understand nothing. Humans are a worthwhile point of comparison, and human exceptionalism can only really hold up until it is surpassed.

I would also point out that some humans DO understand the properties of numbers I was referring to. In fact, I figured it out in second grade while doing lots of extra multiplication problems as punishment for being a brat.
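For the curious, the last-digit property is easy to check for yourself. Here is a minimal, self-contained sketch (the numbers are arbitrary examples of mine, and this probes plain arithmetic, not any model's weights):

    # the last digit of a*b depends only on the last digits of a and b (mod 10)
    for a, b in [(1234, 5678), (7, 8), (990, 1013)]:
        assert (a * b) % 10 == ((a % 10) * (b % 10)) % 10
    print("last-digit rule holds for the examples above")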

> My digital thermometer uses an algorithm to determine the temperature. ... The paper will not be thinking if that is done.

I did not say "All algorithms are thinking". The stronger version of what I was saying is "Some algorithms can think." You simply have asserted the opposite with no reasoning.

> In fact at the extreme end this anthropomorphising has led to exacerbating mental health conditions and unfortunately has even led to humans killing themselves.

I do concede that anthropomorphizing can be problematic, especially if you do not have the background in CS and ML to understand what is under the hood. However, you completely skipped past my rather specific explanation of how it can be useful. On HN in particular, I do expect people to bring enough technical understanding to the table not to simply treat LLMs as people.

111. HarHarVeryFunny ◴[] No.46199806{7}[source]
I can experiment and verify, can't I ?
replies(1): >>46203144 #
112. ◴[] No.46199982{8}[source]
113. throw4847285 ◴[] No.46200103{8}[source]
Now I understand your love of LLMs. What you write reads like the output of an LLM but with the dial turned from obsequious to edgelord. There is no content, just posturing. None of what you wrote holds up to any scrutiny, and much of it is internally contradictory, but it doesn't really matter to you, I guess. I don't think you're even talking to me.
replies(1): >>46202277 #
114. ◴[] No.46200109{8}[source]
115. anal_reactor ◴[] No.46202277{9}[source]
I take it as a compliment. I've always been like this. I challenged core assumptions, people didn't like it, later it would turn out I was right.
116. tovej ◴[] No.46202402{7}[source]
I'm not sure you do, because expert systems are constraint solvers and LLMs are not. They literally deal in encoded facts, which is what the original comment was about.

The universal approximation theorem is not relevant. You would first have to try to train the neural network to approximate a constraint solver (that's not the case with LLMs), and in practice these kinds of systems are exactly the ones a neural network is bad at approximating.

The universal approximation theorem says nothing about feasibility; it only establishes the theoretical existence of an approximating network as a mathematical object, not whether that object can actually be found or trained in the real world.

I'll remind you that an expert system has to be created and updated by humans, and it would have had to exist before a neural network could be applied to it in the first place.

117. coldtea ◴[] No.46203144{8}[source]
Do you? Do most? Do we for 99.999% of stuff we're taught?

Besides, the LLM can also "experiment and verify" some things now. E.g. it can spin up Python and run a script to verify some answers.
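As a toy illustration (the claim and numbers here are entirely invented for the example), the kind of throwaway script it might run to check an answer before repeating it looks like this:

    # check a (deliberately wrong) claimed fact before repeating it
    claim = "91 is prime"
    n = 91
    is_prime = n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))
    print(claim, "->", "verified" if is_prime else "refuted (91 = 7 * 13)")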

replies(1): >>46205831 #
118. HarHarVeryFunny ◴[] No.46205831{9}[source]
I think if we're considering the nature of intelligence, pursuant to trying to replicate it, then the focus needs to be more evolutionary and functional, not on the behavior of lazy modern humans who can get most of their survival needs met at Walmart or Amazon!

The way that animals (maybe think apes and dogs, etc, not just humans) learn is by observing and interacting. If something is new or behaves in unexpected ways then "prediction failure", aka surprise, leads to them focusing on it and interacting with it, which is the way evolution has discovered for them to learn more about it.

Yes, an LLM has some agency via tool use, and via tool output it can learn/verify to some extent, although without continual learning this is only of ephemeral value.

This is all a bit off topic from my original point though, which is the distinction between trying to learn from 2nd-hand conflicting hearsay (he said, she said) vs having the ability to learn the truth for yourself, which starts with being built to predict the truth (the external real world) rather than being built to predict statistical "he said, she said" continuations. Sure, you can mitigate a few of an LLM's shortcomings by giving it tools etc, but fundamentally they are just doing the wrong thing (self-prediction) if you are hoping for them to become AGI rather than just language models.

119. HarHarVeryFunny ◴[] No.46206075{7}[source]
An LLM is a transformer of a specific size (number of layers, context width, etc), and ultimately a specific number of parameters. A trillion-parameter LLM is going to use all trillion parameters regardless of whether you train it on 100 samples or billions of them.

Neural nets, including transformers, learn by gradient descent, according to the error feedback (loss function) they are given. There is no magic happening. The only thing the neural net is optimizing for is minimizing errors on the loss function you give it. If the loss function is next-token error (as it is), then that is ALL it is optimizing for - you can philosophize about what they are doing under the hood, and write papers about that ("we advocate for viewing the prediction problem through the lens of compression"), but at the end of the day it is only pursuant to minimizing the loss. If you want to encourage compression, then you would need to give an incentive for that (change the loss function).
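For concreteness, here is a minimal sketch of that next-token loss (the vocabulary size and logits below are placeholder values of mine, not those of any real model): the negative log-probability the model assigns to the token that actually came next, which gradient descent then pushes down.

    import numpy as np

    def next_token_loss(logits, target_id):
        """Cross-entropy at one position: logits are unnormalised scores over the vocabulary."""
        logits = logits - logits.max()                  # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum())
        return -log_probs[target_id]                    # negative log-likelihood of the true token

    vocab_size = 8
    rng = np.random.default_rng(0)
    logits = rng.normal(size=vocab_size)   # stand-in for a transformer's output at one position
    print("loss at this position:", float(next_token_loss(logits, target_id=3)))

Training simply averages this quantity over every position in the data; whatever structure ends up in the weights is whatever happened to push that average down.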