(dynomight.substack.com)

696 points crescit_eundo | 1 comments | 14 Nov 24 17:05 UTC

pseudosavant ◴[14 Nov 24 22:11 UTC] No.42141758[source]▶

LLMs aren't really language models so much as they are token models. That is how they can also handle input in audio or visual forms because there is an audio or visual tokenizer. If you can make it a token, the model will try to predict the following ones.

Even though I'm sure chess matches were used in some of the LLM training, I'd bet a model trained just for chess would do far better.

replies(1): >>42142221 #

viraptor ◴[14 Nov 24 23:06 UTC] No.42142221[source]▶

>>42141758 #

> That is how they can also handle input in audio or visual forms because there is an audio or visual tokenizer.

This is incorrect. They get translated into the shared latent space, but they're not tokenized in any way resembling the text part.

replies(1): >>42142525 #

pseudosavant ◴[14 Nov 24 23:44 UTC] No.42142525[source]▶

>>42142221 #

They are almost certainly tokenized in most multimodal LLMs. https://en.wikipedia.org/wiki/Large_language_model#Multimoda...

replies(1): >>42142583 #

1. viraptor ◴[14 Nov 24 23:51 UTC] No.42142583[source]▶

>>42142525 #

Ah, an overloaded "tokenizer" meaning: "split into tokens" vs. "turned into a single embedding matching a token". I've never heard it used that way before, but it kinda makes sense.
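
For anyone following along, here is a rough sketch of the two senses. Everything in it is a toy assumption made up for illustration (the vocabulary, `EMBED_DIM`, `PATCH_SIZE`, the random weights, and the function names); it is not how any particular model does it. Some models also quantize images into discrete codes, which this sketch does not cover.

```python
import random

random.seed(0)
EMBED_DIM = 4  # toy embedding width (assumption)

# Sense 1: text is split into discrete tokens, each an ID into a lookup table.
VOCAB = {"chess": 0, "is": 1, "hard": 2, "<unk>": 3}  # hypothetical vocabulary
EMBEDDING_TABLE = [[random.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]

def tokenize_text(text):
    """Split on whitespace and map each word to a discrete token ID."""
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()]

def embed_tokens(token_ids):
    """Look up one embedding vector per token ID."""
    return [EMBEDDING_TABLE[i] for i in token_ids]

# Sense 2: continuous input is cut into patches and projected straight into
# the same embedding space. No discrete IDs exist at any point.
PATCH_SIZE = 2  # values per patch in a 1-D toy "image" (assumption)
PROJECTION = [[random.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(PATCH_SIZE)]

def embed_patches(pixels):
    """Cut pixels into patches and linearly project each to EMBED_DIM."""
    patches = [pixels[i:i + PATCH_SIZE] for i in range(0, len(pixels), PATCH_SIZE)]
    return [
        [sum(p * w for p, w in zip(patch, column)) for column in zip(*PROJECTION)]
        for patch in patches
    ]

text_vectors = embed_tokens(tokenize_text("chess is hard"))
image_vectors = embed_patches([0.1, 0.9, 0.4, 0.2])
# The model only ever sees one sequence of same-width vectors.
sequence = text_vectors + image_vectors
```

In both cases the model ends up with a sequence of vectors of the same width, which is why each patch embedding can loosely be called a "token" even though nothing was ever split into vocabulary IDs.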

Something weird is happening with LLMs and chess