It might be that our current text tokenization is inefficient compared to how well the image pipeline compresses its input. Language already does a lot of compression, but there might be an even better way to represent it in latent space.
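For a rough sense of how much compression a tokenizer already does, here is a minimal sketch (assuming Python with the tiktoken library and its "cl100k_base" encoding, neither of which is mentioned above; the sample sentence is arbitrary) that compares raw UTF-8 bytes against BPE token counts:

    # Minimal sketch: estimate bytes-per-token as a crude measure of how
    # much a BPE tokenizer compresses ordinary text before the model sees it.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    text = "Language models see text as a sequence of subword tokens."
    raw_bytes = len(text.encode("utf-8"))
    tokens = enc.encode(text)

    # Each token becomes one integer the model consumes, so a higher
    # bytes-per-token ratio means more compression at the tokenizer stage.
    print(f"bytes: {raw_bytes}, tokens: {len(tokens)}")
    print(f"bytes per token: {raw_bytes / len(tokens):.2f}")

Typical English prose comes out to a few bytes per token, which is the compression the comment refers to; whether a learned latent representation could do meaningfully better is the open question.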
replies(3):
If we had a million times the compute? We might have brute-forced our way to AGI by now.