
293 points lapnect | 2 comments
Eisenstein ◴[] No.42154707[source]
All it does is send the image to Llama 3.2 Vision and ask it to read the text.

Note that this is just as open to hallucination as any other LLM output: it is not reading pixels and matching them against character shapes, it is describing the picture, drawing on the images and captions it was trained on to decide what the text probably says. It may completely make up words, especially if it can't read them.
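
For the curious, here's roughly the shape of that call. A minimal sketch, assuming the tool talks to a local Ollama server; the endpoint, model tag, and prompt below are my guesses, not the project's actual code:

    # Rough sketch: base64-encode the image and POST it to a local Ollama
    # server running Llama 3.2 Vision, asking the model to transcribe
    # whatever text it sees. The endpoint and "llama3.2-vision" model tag
    # are assumptions, not taken from the project.
    import base64
    import requests

    def transcribe(image_path: str) -> str:
        with open(image_path, "rb") as f:
            image_b64 = base64.b64encode(f.read()).decode("ascii")
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={
                "model": "llama3.2-vision",
                "prompt": "Transcribe all text visible in this image.",
                "images": [image_b64],
                "stream": False,
            },
            timeout=120,
        )
        resp.raise_for_status()
        # The reply is free-form text: the model may transcribe faithfully,
        # or it may guess at words it can't actually read.
        return resp.json()["response"]

    print(transcribe("receipt.png"))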

replies(1): >>42154755 #
M4v3R ◴[] No.42154755[source]
This is also true of any other OCR system; we just never called those errors “hallucinations” in this context.
replies(4): >>42154787 #>>42154980 #>>42155011 #>>42155143 #
noduerme ◴[] No.42155143[source]
No, it's not even close to OCR systems, which are based on analyzing points in a grid for each character stroke and comparing them with known characters. Just for one thing, OCR systems are deterministic. Deterministic. Look it up.
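
To make the point concrete, here's a toy sketch of that kind of deterministic matching (a made-up 3x3 template matcher, nothing like a real engine's internals): same glyph in, same character out, every time.

    # Toy deterministic matcher: compare a binarized 3x3 glyph to known
    # templates by Hamming distance and pick the closest (ties broken by a
    # fixed ordering). No randomness anywhere.
    TEMPLATES = {
        "I": ["010", "010", "010"],
        "L": ["100", "100", "111"],
        "T": ["111", "010", "010"],
    }

    def hamming(a, b):
        return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))

    def classify(glyph):
        return min(sorted(TEMPLATES), key=lambda ch: hamming(glyph, TEMPLATES[ch]))

    # A noisy "T" classifies as "T" on every run, not just "usually".
    print(classify(["111", "010", "011"]))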
replies(2): >>42155209 #>>42155470 #
1. visarga ◴[] No.42155209[source]
OCR systems use vision models too, and as such they can make mistakes. They don't sample, but they do produce a probability distribution over words, just like LLMs do.
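
To illustrate the distinction (a toy sketch with made-up probabilities, not any particular engine's decoder): an OCR decoder typically takes the argmax of that distribution, which is repeatable, while an LLM usually samples from it.

    # Made-up per-character scores for a smudged glyph. Taking the argmax is
    # deterministic; sampling can give different answers on different runs
    # even though the underlying distribution is identical.
    import random

    probs = {"O": 0.55, "0": 0.30, "Q": 0.10, "D": 0.05}

    def greedy(dist):
        # Deterministic: always the most probable symbol.
        return max(dist, key=dist.get)

    def sample(dist, temperature=1.0):
        # Stochastic: draws a symbol in proportion to its (tempered) probability.
        weights = [p ** (1.0 / temperature) for p in dist.values()]
        return random.choices(list(dist), weights=weights, k=1)[0]

    print(greedy(probs))                      # "O", every time
    print([sample(probs) for _ in range(5)])  # e.g. ['O', '0', 'O', 'Q', 'O']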
replies(1): >>42155496 #
2. ◴[] No.42155496[source]