108 points bertman | 4 comments

xpe ◴[] No.43822892[source]
> Theories are developed by doing the work and LLMs do not do the work. They ingest the output of work.

This is often the case but does not _have_ to be so. LLMs can use chain of thought to “talk out loud” and “do the work”. They can use supplementary documents and iterate on their work. The quality of course varies, but it is getting better. When I read Gemini 2.5’s “thinking” notes, it indeed can build up text that is not directly present in its training data.

Putting aside anthropocentric definitions of “reasoning” and “consciousness” is key to how I think about the issues here. I’m intentionally steering completely clear of consciousness.

Modern SOTA LLMs are indeed getting better at what people call “reasoning”. We don’t need to quibble over defining some quality bar; that is probably context-dependent and maybe even arbitrary.

It is clear LLMs are doing better at “reasoning” — I’m using quotes to emphasize that (to me) it doesn’t matter if their inner mechanisms for doing reasoning don’t look like human mechanisms. Instead, run experiments and look at the results.

We’re not talking about the hard problem of consciousness, we’re talking about something that can indeed be measured: roughly speaking, the ability to derive new truths from existing ones.

(Because this topic is charged and easily misunderstood, let me clarify some questions that I’m not commenting on here: How far can the transformer-based model take us? Are data- and power-hungry AI models cost-effective? What viable business plans exist? How much short-term risk to, say, employment and cybersecurity? How much long-term risk to human values, security, thriving, and self-determination?)

Even if you disagree with parts of my characterization above, hear this: we should at least be honest with ourselves when we move the goalposts.

Don’t mistake my tone for zealotry. I’m open to careful criticism. If you do criticize, please don’t try to lump me into one “side” on the topic of AI — whether it be market conditions, commercialization, safety, or research priorities — you probably don’t know me well enough to do that (yet). Apologies for the pre-defensive posture, but the convos here are often … fraught, so I’m trying to head off some of the usual styles of reply.

replies(1): >>43825876 #
1. geraneum ◴[] No.43825876[source]
> it indeed can build up text that is not directly present in its training data.

I’m curious how you know that.

replies(2): >>43826236 #>>43826405 #
2. triclops200 ◴[] No.43826236[source]
Not the parent, but it's kinda obvious even in degenerate logic cases:

# user

what's 78904 + 34567282902

# Gemini 2.5 Pro

    Identify the core task: The user wants to add two numbers: 78904 and 34567282902.
    Recognize the operation: The operation is addition (+).
    Perform the calculation:
        Align the numbers vertically by place value:

          34567282902
        +       78904
        -------------

        Add column by column, starting from the rightmost column (ones place):
            2 + 4 = 6
            0 + 0 = 0
            9 + 9 = 18 (write down 8, carry over 1)
            2 + 8 + 1 (carry-over) = 11 (write down 1, carry over 1)
            8 + 7 + 1 (carry-over) = 16 (write down 6, carry over 1)
            2 + 1 (carry-over) = 3
            7 + 0 = 7
            6 + 0 = 6
            5 + 0 = 5
            4 + 0 = 4
            3 + 0 = 3
        Combine the results: 34567361806
    Format the answer: Present the calculation clearly and state the final result.
To find the sum of 78904 and 34567282902, you add the two numbers together:

      34567282902
    +       78904
    -------------
      34567361806

So, 78904+34567282902=34567361806.
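
(The steps in a trace like this are easy to check mechanically. Here's a small Python sketch of the same column-by-column procedure; the function name and structure are my own illustration, not the model's output.)

    # Toy re-implementation of the column-by-column addition shown above.
    def long_add(a: str, b: str) -> str:
        width = max(len(a), len(b))
        a, b = a.zfill(width), b.zfill(width)  # pad so the columns line up
        carry, digits = 0, []
        # Walk right to left, one place-value column at a time.
        for da, db in zip(reversed(a), reversed(b)):
            total = int(da) + int(db) + carry
            digits.append(str(total % 10))     # write down the ones digit
            carry = total // 10                # carry the rest leftward
        if carry:
            digits.append(str(carry))
        return "".join(reversed(digits))

    print(long_add("34567282902", "78904"))    # prints 34567361806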

The model definitely built up, iteratively, text that wasn't directly in the training data (useful and even correct text, at that). I doubt that number ever appeared anywhere in there, given that 34567361806 doesn't show up in any searches I've run across various engines.

replies(1): >>43826908 #
3. xpe ◴[] No.43826405[source]
To answer directly: Ask a question. Watch the “thinking” process. Estimate the likelihood that all of the generated text is in the training data.

Do you disagree with my claim?

Or perhaps you were hoping for a very rigorous set of experiments?

4. geraneum ◴[] No.43826908[source]
> The model definitely iteratively built up (useful and correct even) text that wasn't directly in the training data

The text itself is very likely in the training data, as it's textbook arithmetic instruction; it's the specific number that is probably not there. Simple arithmetic is one of the verifiable operation types (truths) with a straightforward reward function used to train CoT models. In your example, what's interesting to me is how improving LLM inference with RL can lead to such wonderful outcomes, but that's perhaps a different question.
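
To make the "verifiable reward" point concrete, here's a toy sketch of what such a check could look like for simple addition/subtraction/multiplication prompts; it's my own illustration, not any lab's actual training code:

    import re

    # Toy verifiable reward for an arithmetic chain-of-thought sample:
    # 1.0 if the last number in the completion matches the true result, else 0.0.
    def arithmetic_reward(prompt: str, completion: str) -> float:
        a, op, b = re.search(r"(\d+)\s*([+\-*])\s*(\d+)", prompt).groups()
        truth = {"+": int(a) + int(b),
                 "-": int(a) - int(b),
                 "*": int(a) * int(b)}[op]
        numbers = re.findall(r"\d+", completion)
        return 1.0 if numbers and int(numbers[-1]) == truth else 0.0

    print(arithmetic_reward("what's 78904 + 34567282902",
                            "So, 78904+34567282902=34567361806."))  # 1.0

A real setup is of course more involved, but the reward can be computed without human labels, which is what makes arithmetic such a convenient target for this kind of training.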