
233 points JnBrymn | 3 comments | | HN request time: 0.001s | source
varispeed ◴[] No.45676118[source]
Text is linear, whereas an image is parallel. I mean that when people read, they often don't scan text from left to right (or in a different direction, depending on the language), but rather take the text in all at once or non-linearly: they first lock onto keywords, then read the adjacent words to get the meaning, often even skipping filler sentences unconsciously.

Sequential reading of text is very inefficient.

replies(4): >>45676232 #>>45676919 #>>45677443 #>>45677649 #
spiralcoaster ◴[] No.45676919[source]
What people do you know that do this? I absolutely read in a linear fashion unless I'm deliberately skimming something to get the gist of it. Who can read the text "all at once"?!
replies(2): >>45677117 #>>45677476 #
1. numpad0 ◴[] No.45677117[source]
I don't know how common it is, but I tend to read novels in a buttered heterogeneous multithreading mode: image, logical, and emotional readings all proceed at their own paces, rather than a singular OCR engine feeding them all with 1D text.

Is that crazy? I'm not buying that it is.

replies(2): >>45677217 #>>45677762 #
2. bigbluedots ◴[] No.45677217[source]
Don't know — probably? I'm a linear reader.
3. alwa ◴[] No.45677762[source]
That description feels relatable to me. Maybe buffered more than buttered, in my case ;)

It seems to me that would be a tick in the "pro" column for this idea of using pixels (or contours, à la JPEG) as the models' fundamental stimulus to train against, as opposed to textual tokens. Isn't there a comparison to be drawn between the "threads" you describe here and the multi-headed attention mechanisms (or whatever it is) that LLMs use to weigh associations at various distances between tokens?
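For readers who want to see the mechanism this comment gestures at: in multi-head attention, each head computes its own weighting over the entire sequence simultaneously, so different heads can attend to associations at different distances in parallel — loosely analogous to the independent "threads" described above. A minimal numpy sketch (all dimensions and weight names here are illustrative, not taken from any particular model):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, n_heads):
    """Minimal multi-head self-attention over a sequence x of shape (seq, d_model)."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    # Project into queries, keys, values, then split the channels into heads.
    q = (x @ wq).reshape(seq, n_heads, d_head).transpose(1, 0, 2)  # (heads, seq, d_head)
    k = (x @ wk).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    v = (x @ wv).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    # Every head sees the whole sequence at once; each gets its own
    # (seq, seq) attention pattern, computed in parallel with the others.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    attn = softmax(scores, axis=-1)
    out = attn @ v  # (heads, seq, d_head)
    # Concatenate the heads back together and apply the output projection.
    out = out.transpose(1, 0, 2).reshape(seq, d_model)
    return out @ wo, attn

# Illustrative toy sizes: 6 tokens, 16-dim embeddings, 4 heads.
rng = np.random.default_rng(0)
seq, d_model, n_heads = 6, 16, 4
x = rng.normal(size=(seq, d_model))
wq, wk, wv, wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
y, attn = multi_head_attention(x, wq, wk, wv, wo, n_heads)
```

The point of the analogy: nothing forces the four heads to agree — one head's attention pattern can concentrate on nearby tokens while another attends to distant ones, which is the "parallel readings over the same text" intuition in a concrete form.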