Karpathy on DeepSeek-OCR paper: Are pixels better inputs to LLMs than text? (twitter.com)
233 points | JnBrymn | 1 comment | 21 Oct 25 17:43 UTC
source: https://xcancel.com/karpathy/status/1980397031542989305
1. hiddencost | 23 Oct 25 02:48 UTC | No. 45677655
>>45658928 (OP)
Back before transformers, or even LSTMs, we used to joke that image recognition was so far ahead of language modeling that we should just convert our text to PDF and run the pixels through a CNN.
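The joke describes a concrete pipeline: rasterize the text into a pixel grid, then run a convolution over it. A minimal stdlib-only sketch of that idea, where the "rendering" step is a toy stand-in (mapping code points to grayscale intensities instead of actually rasterizing glyphs to a PDF or PNG), and `conv2d` is a naive first CNN layer:

```python
def text_to_pixels(text, width=8):
    # Toy "rendering": map each character's code point to an intensity
    # in [0, 1] and fold the sequence into rows of `width` pixels.
    # (The joke's real pipeline would rasterize the text first.)
    codes = [ord(c) / 255.0 for c in text]
    codes += [0.0] * (-len(codes) % width)  # pad to a full last row
    return [codes[i:i + width] for i in range(0, len(codes), width)]

def conv2d(img, kernel):
    # Naive "valid" 2D convolution -- the first layer of the joke's CNN.
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(img[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]

img = text_to_pixels("run the pixels through a CNN")   # 28 chars -> 4x8 grid
feat = conv2d(img, [[1, -1], [1, -1]])                 # vertical-edge kernel
print(len(img), len(img[0]), len(feat), len(feat[0]))  # 4 8 3 7
```

A real version of the gag would swap `text_to_pixels` for an actual rasterizer and `conv2d` for a trained network; DeepSeek-OCR's point is that this direction may be more than a joke, since rendered pixels can encode text more compactly than tokens.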