695 points crescit_eundo | 20 comments
azeirah ◴[] No.42141993[source]
Maybe I'm really stupid... but perhaps if we want really intelligent models we need to stop tokenizing at all? We're literally limiting what a model can see and how it perceives the world by limiting the structure of the information streams that come into the model from the very beginning.

I know working with raw bits or bytes is slower, but it should be relatively cheap and easy to at least falsify this hypothesis that many huge issues might be due to tokenization problems but... yeah.

Surprised I don't see more research into radically different tokenization.

replies(14): >>42142033 #>>42142384 #>>42143197 #>>42143338 #>>42143381 #>>42144059 #>>42144207 #>>42144582 #>>42144600 #>>42145725 #>>42146419 #>>42146444 #>>42149355 #>>42151016 #
1. aithrowawaycomm ◴[] No.42142384[source]
FWIW I think most of the "tokenization problems" are in fact reasoning problems being falsely blamed on a minor technical thing when the issue is much more profound.

E.g. I still see people claiming that LLMs are bad at basic counting because of tokenization, but the same LLM counts perfectly well if you use chain-of-thought prompting. So it can't be explained by tokenization! The problem is reasoning: the LLM needs a human to tell it that a counting problem can be accurately solved by going step by step. Without this assistance the LLM is likely to simply guess.
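
To make the comparison concrete, here's a minimal sketch of the two prompt styles I mean, assuming the `openai` Python client (>=1.0) and a placeholder model name; it's an illustration, not the exact setup anyone tested:

    # Sketch: direct prompt vs. chain-of-thought prompt for a word-counting task.
    from openai import OpenAI

    client = OpenAI()
    text = "the quick brown fox jumps over the lazy dog"

    direct = f"How many words are in this sentence: '{text}'? Reply with a number only."
    cot = (f"Count the words in this sentence: '{text}'. "
           "Go word by word, keeping a running count, then state the total.")

    for prompt in (direct, cot):
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name, not a claim about what was tested
            messages=[{"role": "user", "content": prompt}],
        )
        print(reply.choices[0].message.content)

    print("ground truth:", len(text.split()))  # 9

The only thing that changes between the two calls is whether the model is told to work step by step.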

replies(6): >>42142733 #>>42142807 #>>42143239 #>>42143800 #>>42144596 #>>42146428 #
2. ipsum2 ◴[] No.42142733[source]
The more obvious alternative is that CoT is making up for the deficiencies in tokenization, which I believe is the case.
replies(1): >>42142913 #
3. Der_Einzige ◴[] No.42142807[source]
I’m the one who will fight you, including with peer-reviewed papers indicating that it is in fact due to tokenization. I’m too tired right now, but I will edit this later, so take this as my bookmark to remind me to respond.
replies(4): >>42142884 #>>42144506 #>>42145678 #>>42147347 #
4. aithrowawaycomm ◴[] No.42142884[source]
I am aware of errors in computations that can be fixed by better tokenization (e.g. long addition works better when digits are tokenized right-to-left rather than left-to-right). But I am talking about counting, and specifically about counting words, not characters. I don’t think tokenization explains why LLMs tend to fail at this without CoT prompting. I really think the answer is computational complexity: counting is simply too hard for transformers unless you use CoT. https://arxiv.org/abs/2310.07923
replies(1): >>42143144 #
5. aithrowawaycomm ◴[] No.42142913[source]
I think the more obvious explanation has to do with computational complexity: counting is an O(n) problem, but transformer LLMs can’t solve O(n) problems unless you use CoT prompting: https://arxiv.org/abs/2310.07923
replies(2): >>42143402 #>>42150368 #
6. cma ◴[] No.42143144{3}[source]
Words vs characters is a similar problem, since a token can be less than one word, multiple words, multiple words plus a partial word, or words with non-word punctuation like a sentence-ending period.
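
You can see this directly; a minimal sketch, assuming the `tiktoken` library and the cl100k_base encoding (an illustrative choice):

    # Show how token boundaries relate to word boundaries.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    text = "Unbelievably, the sentence ends here."
    pieces = [enc.decode([t]) for t in enc.encode(text)]
    print(pieces)
    # The pieces generally don't line up one-to-one with words: some are
    # sub-word chunks, some carry a leading space, and punctuation may
    # attach to a word or split off on its own.
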
7. TZubiri ◴[] No.42143239[source]
> FWIW I think most of the "tokenization problems"

List of actual tokenization limitations:

1. strawberry (counting letters)
2. rhyming and meter
3. whitespace (as displayed in the article)

8. ipsum2 ◴[] No.42143402{3}[source]
What you're saying is an explanation of what I said, but I agree with you ;)
replies(1): >>42148535 #
9. meroes ◴[] No.42143800[source]
At a certain level they are identical problems. My strongest piece of evidence is that I get paid as an RLHF'er to find ANY case of error, including "tokenization" errors. Do you know how many errors an LLM makes on even the simplest grid puzzles, with CoT, with specialized models that don't try to "one-shot" problems, with multiple models, etc.?

My assumption is that these large companies wouldn't pay livable wages to hundreds of thousands of RLHF'ers through dozens of third-party companies if tokenization errors were all it came down to.

replies(1): >>42149054 #
10. Jensson ◴[] No.42144506[source]
We know there are narrow solutions to these problems; the argument was never that a specific narrow task is impossible to solve.

The discussion is about general intelligence: the model fails at a task it is capable of simply because it chooses the wrong strategy, and that is a lack of generalization, not a tokenization problem. Being able to choose the right strategy is core to general intelligence; altering the input data to make it easier for the model to find the right solution to specific questions doesn't make it more general, it just shifts which narrow problems it is good at.

11. csomar ◴[] No.42144596[source]
It can count words in a paragraph though. So I do think it's tokenization.
12. azeirah ◴[] No.42145678[source]
I strongly believe it's not that tokenization isn't the underlying problem; it's that, say, bit-by-bit tokenization is too expensive to run at the scales things are currently being run at (OpenAI, Claude, etc.).
replies(1): >>42150150 #
13. PittleyDunkin ◴[] No.42146428[source]
I feel like we can set our qualifying standards higher than counting.
14. pmarreck ◴[] No.42147347[source]
My intuition says that tokenization is a factor, especially if it splits up individual move descriptions differently from other LLMs'.

If you think about how our brains handle this input, we absolutely do not split a move between the letter and the number, although I would think the presence of both the letter and the number together would trigger the same two tokens.
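
A rough way to check that intuition, assuming the `tiktoken` library and the cl100k_base encoding (illustrative; different models use different vocabularies, which is exactly the point):

    # See how chess-move text gets split into tokens.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    for move_text in ["1. e4 e5 2. Nf3 Nc6", "Nf3", "e2e4"]:
        pieces = [enc.decode([t]) for t in enc.encode(move_text)]
        print(repr(move_text), "->", pieces)
    # Whether a move like "Nf3" survives as one piece or gets split between
    # the letter and the number depends entirely on the vocabulary, so two
    # models can "see" the same game text quite differently.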

15. aithrowawaycomm ◴[] No.42148535{4}[source]
No, it's a rebuttal of what you said: CoT is not making up for a deficiency in tokenization; it's making up for a deficiency in transformers themselves. These complexity results have nothing to do with tokenization, or even LLMs; they are about the complexity class of problems that can be solved by transformers.
replies(1): >>42150513 #
16. 1propionyl ◴[] No.42149054[source]
> hundreds of thousands of RLHF'ers through dozens of third party companies

Out of curiosity, what are these companies? And where do they operate?

I'm always interested in these sorts of "hidden" industries. See also: outsourced Facebook content moderation in Kenya.

replies(1): >>42159108 #
17. int_19h ◴[] No.42150150{3}[source]
It's not just a current thing, either. Tokenization basically lets you have a model with a larger input context than you'd otherwise have for the given resource constraints. So any gains from feeding the characters in directly have to be greater than this advantage. And for CoT especially - which we know produces significant improvements in most tasks - you want large context.
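
Back-of-the-envelope for the context point; a minimal sketch, assuming `tiktoken` with the cl100k_base encoding (illustrative):

    # Compare sequence length in characters vs. tokens for the same text.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    text = ("Tokenization basically lets you have a model with a larger "
            "input context than you'd otherwise have. ") * 50
    print(len(text), "characters vs", len(enc.encode(text)), "tokens")
    # English prose typically runs around 4 characters per token, so a
    # character- or byte-level model pays roughly a 4x sequence-length
    # penalty for the same amount of text.
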
18. MacsHeadroom ◴[] No.42150368{3}[source]
This paper does not support your position any more than it supports the position that the problem is tokenization.

This paper posits that if the authors' intuition were true, then they would find certain empirical results, i.e. "If A then B." They then test and find those empirical results. But this does not imply that their intuition was correct, just as "If A then B" does not imply "If B then A."

If the empirical results were due to tokenization, absolutely nothing about this paper would change.

19. ipsum2 ◴[] No.42150513{5}[source]
There's a really obvious way to test whether the strawberry issue is tokenization: replace each letter with a number, then ask ChatGPT to count the number of 3s.

Count the number of 3s, only output a single number: 6 5 3 2 8 7 1 3 3 9.

ChatGPT: 3.
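
For what it's worth, you can also see why the digit version is a cleaner probe; a small sketch, assuming `tiktoken` with the cl100k_base encoding (illustrative):

    # Compare how space-separated digits and the word "strawberry" tokenize.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    digits = "6 5 3 2 8 7 1 3 3 9"
    print([enc.decode([t]) for t in enc.encode(digits)])
    print([enc.decode([t]) for t in enc.encode("strawberry")])
    print("ground truth:", digits.split().count("3"))  # 3
    # The digit list comes out as roughly one token per digit, while the
    # letters of "strawberry" are hidden inside larger multi-character
    # chunks, so the two prompts probe the model quite differently.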

20. meroes ◴[] No.42159108{3}[source]
Scale AI is a big one; it also owns companies that do this, such as Outlierai.

There are many other AI-trainer job companies, though. A lot of it is gig work, but the pay is better than the vast majority of gig jobs.