
233 points JnBrymn | 1 comment | source
varispeed ◴[] No.45676118[source]
Text is linear, whereas an image is parallel. What I mean is that when people read, they often don't scan the text from left to right (or in another direction, depending on the language), but rather take it in all at once or non-linearly: they first lock onto keywords, then read the adjacent words to get the meaning, often skipping filler sentences unconsciously.

Sequential reading of text is very inefficient.

replies(4): >>45676232 #>>45676919 #>>45677443 #>>45677649 #
sosodev ◴[] No.45676232[source]
LLMs don't "read" text sequentially, right?
replies(1): >>45676349 #
olliepro ◴[] No.45676349[source]
The causal masking means future tokens don’t affect previous tokens’ embeddings as they evolve through the model, but all tokens are processed in parallel… so, yes and no. See this previous HN post (https://news.ycombinator.com/item?id=45644328) about how bidirectional encoders are similar to diffusion’s non-linear way of generating text. Vision transformers use bidirectional encoding because of the non-causal nature of image pixels.
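A minimal sketch of that distinction, assuming a single PyTorch attention head (my own illustration, not from the linked post): the whole sequence goes through one matrix multiply in parallel, and the causal mask is just a triangular block of -inf that stops each position from attending to anything to its right.

```python
import torch

def causal_attention(q, k, v):
    # q, k, v: (seq_len, d) tensors for one head; every position is computed in parallel
    seq_len, d = q.shape
    scores = q @ k.T / d ** 0.5                                    # (seq_len, seq_len)
    future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))             # hide tokens to the right
    return torch.softmax(scores, dim=-1) @ v                       # row i mixes only tokens 0..i

# A bidirectional encoder (BERT-style text, or a vision transformer over image patches)
# is the same computation with the masked_fill line removed.
```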
replies(1): >>45676819 #
Merik ◴[] No.45676819{3}[source]
Didn’t Anthropic show that the models engage in a form of planning, such that predicting possible future tokens affects the prediction of the next token? https://transformer-circuits.pub/2025/attribution-graphs/bio...
replies(1): >>45677066 #
ACCount37 ◴[] No.45677066{4}[source]
Sure, an LLM can start "preparing" for token N+4 at token N. But that doesn't change the fact that token N can't "see" N+1.

Causality is enforced in LLMs - past tokens can affect future tokens, but not the other way around.
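A quick way to see that property, as a toy sketch reusing the same hypothetical causal-attention helper (my code, not from the cited paper): appending a later token leaves the outputs for all earlier tokens unchanged.

```python
import torch

torch.manual_seed(0)

def causal_attention(x):
    # single-head self-attention with a causal mask over toy embeddings x: (n, d)
    n, d = x.shape
    scores = x @ x.T / d ** 0.5
    future = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
    return torch.softmax(scores.masked_fill(future, float("-inf")), dim=-1) @ x

x = torch.randn(6, 8)                       # toy embeddings for tokens 1..6
with_future    = causal_attention(x)        # all six tokens present
without_future = causal_attention(x[:5])    # token 6 removed

# Outputs for tokens 1..5 are identical either way: a later token
# never changes what an earlier token "saw".
print(torch.allclose(with_future[:5], without_future))   # True
```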