
293 points by lapnect | 1 comment
Eisenstein No.42154707
All it does is send the image to Llama 3.2 Vision and ask it to read the text.

Note that this is just as open to hallucination as any other LLM output: rather than reading the pixels to find text characters, it is describing the picture, drawing on the images and captions it was trained on to decide what the text says. It may make up words entirely, especially when it can't read them.
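
A minimal sketch of that workflow, assuming the Ollama Python client and a locally pulled llama3.2-vision model (the prompt, function name, and model tag are illustrative assumptions, not the tool's actual code):

    # Sketch of OCR-by-vision-model, assuming the Ollama Python client and
    # a local llama3.2-vision model; not the tool's actual implementation.
    import ollama

    def read_text(image_path: str) -> str:
        response = ollama.chat(
            model="llama3.2-vision",  # assumed model tag
            messages=[{
                "role": "user",
                "content": "Transcribe all text visible in this image, verbatim.",
                "images": [image_path],  # client attaches the image file to the request
            }],
        )
        # The result is the model's free-form answer, not character-level recognition.
        return response["message"]["content"]

    print(read_text("menu.jpg"))

Because the output is simply whatever text the model generates, regions it can't actually read tend to come back as plausible invented words rather than garbled characters.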

replies(1): >>42154755 #
M4v3R No.42154755
This is also true for any other OCR system; we just never called these errors “hallucinations” in that context.
replies(4): >>42154787 #>>42154980 #>>42155011 #>>42155143 #
1. geysersam No.42154980
I gave this tool a picture of a restaurant menu and it made up several additional entries that didn't exist in the picture... What other OCR system would do that?