    296 points todsacerdoti | 13 comments
    smeeth No.44368465
    The main limitation of tokenization actually shows up in logical operations, including arithmetic. IIRC most of the poor performance of LLMs on math problems can be attributed to some very strange things that happen when you do math with tokens.

    I'd like to see a math/logic bench appear for tokenization schemes that captures this. BPB/perplexity is fine, but it's not everything.
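
    A quick way to see the tokenization side of this, if you have OpenAI's tiktoken installed (the exact splits depend on the vocabulary, so treat it as illustrative):

        import tiktoken

        # Integers get chunked into multi-digit pieces, so numbers that are
        # numerically adjacent can have completely different token boundaries.
        enc = tiktoken.get_encoding("cl100k_base")
        for s in ["12345", "12346", "9999999"]:
            ids = enc.encode(s)
            print(s, ids, [enc.decode([i]) for i in ids])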

    replies(6): >>44368862 #>>44369438 #>>44371781 #>>44373480 #>>44374125 #>>44375446 #
    1. calibas No.44368862
    It's a non-deterministic language model; shouldn't we expect mediocre performance in math? It seems like the wrong tool for the job...
    replies(4): >>44368958 #>>44368999 #>>44369121 #>>44372463 #
    2. drdeca No.44368958
    Deterministic is a special case of not-necessarily-deterministic.
    3. CamperBob2 No.44368999
    We passed 'mediocre' a long time ago, but yes, it would be surprising if the same vocabulary representation is optimal for both verbal language and mathematical reasoning and computing.

    To the extent we've already found that to be the case, it's perhaps the weirdest part of this whole "paradigm shift."

    4. rictic No.44369121
    Models are deterministic: they're a mathematical function from a sequence of tokens to a probability distribution over the next token.

    A system then samples from that distribution, typically with randomness, and some inference optimizations introduce randomness of their own, but it's important to understand that the models themselves are not random.
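
    As a minimal sketch (toy stand-ins, not a real model), the forward pass below is a pure function of the input tokens, and all of the randomness lives in the sampler:

        import numpy as np

        def next_token_logits(tokens):
            # Stand-in for the model: the same token sequence always maps
            # to the same logits over a toy 5-token vocabulary.
            h = abs(hash(tuple(tokens)))
            return np.array([(h >> (4 * i)) % 16 for i in range(5)], dtype=float)

        def sample_next(tokens, temperature=1.0, rng=None):
            # The sampler: softmax the logits, then draw at random. This
            # step, not the model, is where the non-determinism enters.
            logits = next_token_logits(tokens) / temperature
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            rng = rng or np.random.default_rng()
            return int(rng.choice(len(probs), p=probs))

        print(next_token_logits([1, 2, 3]))  # identical on every call
        print(sample_next([1, 2, 3]))        # can vary from call to call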

    replies(2): >>44369860 #>>44370679 #
    5. mgraczyk No.44369860
    This is only true in the ideal case. From the perspective of the user of a large closed LLM, it isn't quite right, because of floating-point non-associativity, experiments, unversioned changes, etc.

    It's best to assume that the relationship between input and output of an LLM is not deterministic, similar to something like using a Google search API.
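
    The floating-point part is easy to demonstrate: any batched or parallel reduction that changes the order of additions can change the result.

        # Floating-point addition is not associative:
        a, b, c = 1e16, -1e16, 1.0
        print((a + b) + c)  # 1.0
        print(a + (b + c))  # 0.0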

    replies(1): >>44370022 #
    6. ijk No.44370022{3}
    And even with open LLMs, non-deterministic GPU kernels can make outputs vary across runs. For performance reasons, determinism is seldom guaranteed for LLMs in general.
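
    If you control the stack, you can trade speed for reproducibility. A minimal PyTorch sketch (this only covers kernel selection, not distributed or batching effects):

        import torch

        # Request deterministic kernels where implementations exist; ops
        # without one will raise rather than silently vary. Generally
        # slower, and some CUDA matmuls additionally need the
        # CUBLAS_WORKSPACE_CONFIG environment variable set.
        torch.use_deterministic_algorithms(True)
        torch.backends.cudnn.benchmark = False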
    replies(1): >>44376582 #
    7. geysersam No.44370679
    The LLM itself is deterministic, but it only returns a probability distribution over the next token. The tokens the user sees in the response are selected by a sampling procedure that is typically stochastic.
    replies(1): >>44371710 #
    8. danielmarkbruce No.44371710{3}
    Assuming decent data, the sampling will be stochastic in name only for many math operations/input combinations: essentially all of the probability mass lands on one token. When people suggest LLMs with tokenization could learn math, they aren't suggesting a small undertrained model trained on crappy data.
    replies(1): >>44372243 #
    9. anonymoushn No.44372243{4}
    I mean, this depends on your sampler. With temp=1, sampling from the raw output distribution, and setting aside numerics issues, these models output a nonzero probability for every token at each position.
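
    Softmax never produces an exact zero; even a hugely negative logit keeps a tiny sliver of probability. With a toy three-token distribution:

        import numpy as np

        logits = np.array([30.0, 0.0, -30.0])
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        print(probs)  # the -30 logit still gets ~1e-26, not zero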
    replies(1): >>44380103 #
    10. currymj No.44372463
    thanks to training data + this being a popular benchmark, they're pretty good at grinding through symbolic mathematical derivations, which is often useful if you want an explanation of a mathematical concept. there's not really a better tool for this job, except for "a textbook which answers the exact question you have".

    but from time to time, doing this does require getting the arithmetic right (correctly adding two exponents or whatever), so it would be nice to be able to trust that.

    i imagine there are other uses for basic arithmetic too, like QA applications over data that quotes statistics and such.

    replies(1): >>44372556 #
    11. agarren No.44372556
    > but from time to time, doing this does require getting the arithmetic right (correctly adding two exponents or whatever), so it would be nice to be able to trust that.

    It sounds weird, but try writing your problem in LaTeX. I don't know why, but I've found a couple of models to be incredibly capable at solving mathematical problems if you write them in LaTeX.
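
    For example, something like this (a made-up prompt, purely to show the formatting):

        Solve for $x > 0$: $\int_0^x 2t\,\mathrm{d}t = 9$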

    12. rar00 No.44376582{4}
    yep, even with greedy sampling and a fixed system state, numerical instability is sufficient to make output sequences diverge when processing the exact same input.
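
    You can see the raw ingredient for this without a GPU: summing the same float32 values in a different order already shifts the result, and if the top two logits are within that slack, a greedy argmax can flip and the sequences diverge from there.

        import numpy as np

        x = np.random.default_rng(0).standard_normal(1_000_000).astype(np.float32)
        print(x.sum())        # one reduction order
        print(x[::-1].sum())  # reversed order; typically differs in the low bits
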
    13. danielmarkbruce No.44380103{5}
    A large model well trained on good data will have logits so negative for something like "1+1=" -> "3" that it won't come up in practice unless you sample in a way that deliberately misuses the model.
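
    Rough arithmetic for a toy two-token case (the gap size is assumed): with a logit gap of 40 between "2" and "3" after "1+1=", the chance of sampling the wrong token at temperature 1 is about e^-40.

        import math

        # P(wrong) = e**-40 / (1 + e**-40), roughly e**-40 for a two-token softmax
        print(math.exp(-40))  # ~4.2e-18: effectively never sampled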