
293 points lapnect | 2 comments
Eisenstein ◴[] No.42154707[source]
All it does is send the image to Llama 3.2 Vision and ask it to read the text.

Note that this is just as open to hallucination as any other LLM output: it is not reading pixels and matching them against character shapes, it is describing the picture, drawing on the images and captions it was trained on to decide what the text probably says. It may completely make up words, especially if it can't read them.
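
For the curious, here's roughly the shape of that call. A minimal sketch, assuming the tool talks to a local Ollama server; the endpoint, model tag, and prompt below are my guesses, not the project's actual code:

    # Rough sketch: base64-encode the image and POST it to a local Ollama
    # server running Llama 3.2 Vision, asking the model to transcribe
    # whatever text it sees. The endpoint and "llama3.2-vision" model tag
    # are assumptions, not taken from the project.
    import base64
    import requests

    def transcribe(image_path: str) -> str:
        with open(image_path, "rb") as f:
            image_b64 = base64.b64encode(f.read()).decode("ascii")
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={
                "model": "llama3.2-vision",
                "prompt": "Transcribe all text visible in this image.",
                "images": [image_b64],
                "stream": False,
            },
            timeout=120,
        )
        resp.raise_for_status()
        # The reply is free-form text: the model may transcribe faithfully,
        # or it may guess at words it can't actually read.
        return resp.json()["response"]

    print(transcribe("receipt.png"))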

replies(1): >>42154755 #
M4v3R ◴[] No.42154755[source]
This is also true of any other OCR system; we just never called those errors “hallucinations” in this context.
replies(4): >>42154787 #>>42154980 #>>42155011 #>>42155143 #
noduerme ◴[] No.42155143[source]
No, it's not even close to OCR systems, which are based on analyzing points in a grid for each character stroke and comparing them with known characters. Just for one thing, OCR systems are deterministic. Deterministic. Look it up.
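
To make the point concrete, here's a toy sketch of that kind of deterministic matching (a made-up 3x3 template matcher, nothing like a real engine's internals): same glyph in, same character out, every time.

    # Toy deterministic matcher: compare a binarized 3x3 glyph to known
    # templates by Hamming distance and pick the closest (ties broken by a
    # fixed ordering). No randomness anywhere.
    TEMPLATES = {
        "I": ["010", "010", "010"],
        "L": ["100", "100", "111"],
        "T": ["111", "010", "010"],
    }

    def hamming(a, b):
        return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))

    def classify(glyph):
        return min(sorted(TEMPLATES), key=lambda ch: hamming(glyph, TEMPLATES[ch]))

    # A noisy "T" classifies as "T" on every run, not just "usually".
    print(classify(["111", "010", "011"]))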
replies(2): >>42155209 #>>42155470 #
1. visarga ◴[] No.42155209[source]
OCR systems use vision models too, and as such they can make mistakes. They don't sample, but they do produce a probability distribution over words, just like LLMs do.
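
To illustrate the distinction (a toy sketch with made-up probabilities, not any particular engine's decoder): an OCR decoder typically takes the argmax of that distribution, which is repeatable, while an LLM usually samples from it.

    # Made-up per-character scores for a smudged glyph. Taking the argmax is
    # deterministic; sampling can give different answers on different runs
    # even though the underlying distribution is identical.
    import random

    probs = {"O": 0.55, "0": 0.30, "Q": 0.10, "D": 0.05}

    def greedy(dist):
        # Deterministic: always the most probable symbol.
        return max(dist, key=dist.get)

    def sample(dist, temperature=1.0):
        # Stochastic: draws a symbol in proportion to its (tempered) probability.
        weights = [p ** (1.0 / temperature) for p in dist.values()]
        return random.choices(list(dist), weights=weights, k=1)[0]

    print(greedy(probs))                      # "O", every time
    print([sample(probs) for _ in range(5)])  # e.g. ['O', '0', 'O', 'Q', 'O']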
replies(1): >>42155496 #
2. ◴[] No.42155496[source]