
293 points by lapnect | 1 comment
mg No.42155156
I gave it a sentence that I had created by placing 500 circles via a genetic algorithm, then drawing them with an actual physical circle:

https://www.instagram.com/marekgibney/p/BiFNyYBhvGr/

Interestingly, it sees the circles just fine, but not the sentence. It replied with this:

    The image contains no text or other elements
    that can be represented in Markdown. It is a
    visual composition of circles and does not
    convey any information that can be translated
    into Markdown format.
replies(5): >>42155181, >>42155186, >>42155206, >>42155424, >>42156784
DandyDev No.42155186
I can't read this either.

Edit: at a distance it's easier to read

replies(1): >>42155287
thih9 No.42155287
If you squint, it’s easier too. I wonder if lowering the resolution of the image would make the text visible to OCR.
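A minimal sketch of why this might work: lowering the resolution averages neighboring pixels, which is roughly what squinting does — fine circle detail is smeared away, leaving the coarse letter strokes. This pure-Python box-filter downsample avoids assuming any particular imaging library (a real pipeline would more likely use something like Pillow's `Image.resize`):

```python
def downsample(pixels, factor):
    """Downscale a 2D grayscale image (list of rows) by averaging
    factor x factor blocks -- a crude box filter. This is the kind
    of smoothing that could turn a cloud of circles back into
    legible strokes for an OCR model."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for y in range(0, h - h % factor, factor):
        row = []
        for x in range(0, w - w % factor, factor):
            block = [pixels[y + dy][x + dx]
                     for dy in range(factor)
                     for dx in range(factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out

# A 4x4 image of alternating dots collapses to a uniform gray at 2x:
img = [[0, 255, 0, 255],
       [255, 0, 255, 0],
       [0, 255, 0, 255],
       [255, 0, 255, 0]]
small = downsample(img, 2)  # each 2x2 block averages to 127
```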
replies(1): >>42156818
pbhjpbhj No.42156818
I wonder if you could make a composite, like bracketed images, and so give the model multiple goes whose results it could amalgamate. You could do an exposure bracket, a focus/blur bracket, maybe a stretch/compression, or an adjustment of font height as a proportion of the image.

Feed all of the alternatives to the model, tell it they each have the same textual content?
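The amalgamation step could be sketched as a per-character majority vote over the readings from each image variant. Everything here is hypothetical — `readings` stands in for whatever the OCR model returns per variant, and real outputs would need alignment (e.g. by edit distance) before voting:

```python
from collections import Counter

def amalgamate(readings):
    """Combine OCR outputs from several image variants (exposure,
    blur, scale brackets) by per-character majority vote. Assumes
    the readings are already aligned; truncates to the shortest."""
    n = min(len(r) for r in readings)
    return "".join(
        Counter(r[i] for r in readings).most_common(1)[0][0]
        for i in range(n)
    )

# Three noisy readings of the same text; the vote recovers it.
readings = ["helzo", "hexlo", "hello"]
best = amalgamate(readings)
```

The idea is the same as exposure bracketing in photography: each variant fails differently, so their agreement is more reliable than any single pass.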