293 points by lapnect | 1 comment
Eisenstein ◴[] No.42154707[source]
All it does is send the image to Llama 3.2 Vision and ask it to read the text.

Note that this is just as open to hallucination as any other LLM output: it is not reading the pixels looking for text characters, it is describing the picture, drawing on the images and captions it was trained on to decide what the text says. It may completely make up words, especially if it can't read them.
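
A minimal sketch of that flow, assuming the model is served locally by Ollama; the endpoint, model tag, and prompt wording here are assumptions for illustration, not the project's actual code:

    import base64
    import requests

    def ocr_with_llama_vision(image_path: str) -> str:
        # Ollama's /api/generate endpoint accepts base64-encoded images for vision models.
        with open(image_path, "rb") as f:
            image_b64 = base64.b64encode(f.read()).decode("utf-8")

        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={
                "model": "llama3.2-vision",
                "prompt": "Transcribe all text visible in this image.",
                "images": [image_b64],
                "stream": False,
            },
            timeout=300,
        )
        resp.raise_for_status()
        # The reply is free-form generated text, not per-character matches,
        # which is why it can produce plausible-looking words that aren't in the image.
        return resp.json()["response"]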

replies(1): >>42154755 #
M4v3R ◴[] No.42154755[source]
This is also true of any other OCR system; we just never called these errors “hallucinations” in that context.
replies(4): >>42154787 #>>42154980 #>>42155011 #>>42155143 #
1. llm_trw ◴[] No.42154787[source]
1. llm_trw ◴[] No.42154787[source]
It really isn't, since those systems are character-based.
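
For contrast, a sketch of what character-based output looks like, assuming Tesseract via pytesseract (neither is named in this thread): the engine classifies glyphs and reports a confidence per recognized unit, so a misread surfaces as a low-confidence result rather than a fluently invented word.

    import pytesseract
    from PIL import Image
    from pytesseract import Output

    def ocr_with_confidences(image_path: str) -> list[tuple[str, float]]:
        # image_to_data returns recognized words alongside per-word confidence scores (0-100).
        data = pytesseract.image_to_data(Image.open(image_path), output_type=Output.DICT)
        return [
            (word, float(conf))
            for word, conf in zip(data["text"], data["conf"])
            if word.strip() and float(conf) >= 0  # conf of -1 marks layout-only blocks
        ]

A failure mode here is a garbled or low-confidence token, which can be flagged for review; it does not get silently replaced with a different word that merely looks plausible.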