
296 points | todsacerdoti | 1 comment
smeeth ◴[] No.44368465[source]
The main limitation of tokenization is actually logical operations, including arithmetic. IIRC most of the poor performance of LLMs on math problems can be attributed to some very strange things that happen when you do math with tokens.

I'd like to see a math/logic bench appear for tokenization schemes that captures this. BPB/perplexity is fine, but it's not everything.
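A toy sketch of why arithmetic gets awkward under subword tokenization. The three-entry vocabulary below is invented for illustration (it is not any real tokenizer's vocabulary): with greedy longest-match segmentation, the same digit can land inside different tokens depending on its neighbors, so digit-aligned operations like carrying don't line up across tokens.

```python
# Toy greedy longest-match tokenizer over a hypothetical vocabulary.
# Schematic only: real BPE vocabularies are learned and much larger.
VOCAB = {"1", "2", "3", "4", "12", "34", "123"}

def tokenize(s: str) -> list[str]:
    """Greedy longest-match segmentation, left to right."""
    out = []
    i = 0
    while i < len(s):
        for j in range(len(s), i, -1):  # try the longest candidate first
            if s[i:j] in VOCAB:
                out.append(s[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {s[i]!r}")
    return out

print(tokenize("1234"))  # ['123', '4'] -- the 4 is its own token here...
print(tokenize("34"))    # ['34']       -- ...but fused with the 3 here
```

So "1234" and "34" share a trailing digit on paper, but the model never sees that alignment: one sequence ends in the token `4`, the other in the token `34`.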

replies(6): >>44368862 #>>44369438 #>>44371781 #>>44373480 #>>44374125 #>>44375446 #
calibas ◴[] No.44368862[source]
It's a non-deterministic language model, shouldn't we expect mediocre performance in math? It seems like the wrong tool for the job...
replies(4): >>44368958 #>>44368999 #>>44369121 #>>44372463 #
rictic ◴[] No.44369121[source]
Models are deterministic, they're a mathematical function from sequences of tokens to probability distributions over the next token.

A system then samples from that distribution, typically with randomness, and some inference optimizations introduce randomness of their own, but it's important to understand that the models themselves are not random.
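A minimal sketch of that separation, with made-up logits: the softmax distribution is a pure function of its input, and randomness enters only in the sampler bolted on afterwards.

```python
import math
import random

def softmax(logits):
    # The "model" part: a deterministic map from an input to a distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical next-token logits; the same input always yields the same probs.
logits = [2.0, 1.0, 0.5]
probs = softmax(logits)

# Greedy decoding: deterministic, always picks the argmax.
greedy = max(range(len(probs)), key=probs.__getitem__)

# Stochastic sampling: this is where the randomness lives, not in the model.
rng = random.Random(0)
sampled = rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Run the function twice on the same logits and you get bit-identical distributions; only the `rng.choices` call varies between runs (and even that is reproducible with a fixed seed).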

replies(2): >>44369860 #>>44370679 #
mgraczyk ◴[] No.44369860[source]
This is only ideally true. From the perspective of the user of a large closed LLM, it isn't quite right in practice because of floating-point non-associativity, A/B experiments, unversioned model changes, etc.

It's best to assume that the relationship between input and output of an LLM is not deterministic, similar to something like using a Google search API.
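The non-associativity point in one line, using ordinary IEEE-754 doubles: mathematically equal sums can round differently depending on grouping.

```python
# Floating-point addition is not associative: the same mathematical sum
# can produce different bits depending on evaluation order.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)  # 0.6000000000000001
print(a + (b + c))  # 0.6
```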

replies(1): >>44370022 #
ijk ◴[] No.44370022[source]
And even on open LLMs, GPU instability can cause non-determinism. For performance reasons, determinism is seldom guaranteed in LLMs in general.
replies(1): >>44376582 #
rar00 ◴[] No.44376582[source]
Yep, even with greedy sampling and a fixed system state, numerical instability is sufficient to make output sequences diverge when processing the exact same input.
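A contrived sketch of how that divergence starts. The terms below are hand-picked to force the effect (they are not from any real model): summing the same values in two orders, as parallel GPU reductions may do, yields two different logits, which flips a greedy argmax; every later decoding step then conditions on the diverging token.

```python
# The same four terms, reduced in two different orders.
terms = [1e16, 1.0, 1.0, -1e16]

logit_a = ((terms[0] + terms[1]) + terms[2]) + terms[3]  # each 1.0 rounds away -> 0.0
logit_b = (terms[0] + (terms[1] + terms[2])) + terms[3]  # 1.0+1.0 survives    -> 2.0

# Suppose this logit competes with a fixed alternative token whose logit is 1.0:
other = 1.0
pick_a = 0 if logit_a > other else 1  # order A picks token 1
pick_b = 0 if logit_b > other else 1  # order B picks token 0

# Greedy decoding has already diverged at this step, and autoregressive
# generation feeds the diverging token back in, compounding the difference.
print(pick_a, pick_b)
```

Real GPU kernels hit much smaller discrepancies than this, but with thousands of near-tied logits over thousands of steps, an eventual argmax flip is hard to rule out.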