
293 points by lapnect | 9 comments
1. mg (No.42155156)
I gave it a sentence, which I created by placing 500 circles via a genetic algorithm so that they form the words, and then drawing them with an actual physical circle:

https://www.instagram.com/marekgibney/p/BiFNyYBhvGr/

Interestingly, it sees the circles just fine, but not the sentence. It replied with this:

    The image contains no text or other elements
    that can be represented in Markdown. It is a
    visual composition of circles and does not
    convey any information that can be translated
    into Markdown format.
replies(5): >>42155181, >>42155186, >>42155206, >>42155424, >>42156784
2. echoangle (No.42155181)
I can't read anything but the "stop" either, without seeing the solution first.
3. DandyDev (No.42155186)
I can't read this either.

Edit: at a distance it's easier to read

replies(1): >>42155287
4. wasyl (No.42155206)
Why is that interesting? The image doesn't look like anything; you have to skew it (by viewing it at an angle) to make out any letters, and even then only barely.
5. thih9 (No.42155287)
If you squint it's easier too. I wonder if lowering the resolution of the image would make the text visible to OCR.
replies(1): >>42156818
6. Vetch (No.42155424)
Based on the fact that squinting works, I applied a Gaussian blur to the image. Here's the response I got:

    Markdown:

    The provided image is a blurred text that reads
    "STOP THINKING IN CIRCLES." There are no other
    visible elements such as headers, footers,
    subtexts, images, or tables.

    Markdown Content:

    STOP THINKING IN CIRCLES

Since the response is not deterministic, I also tried the unprocessed image several times, but it never worked. Every low-pass filter effect I applied, however, worked with a high success rate.

https://imgur.com/q7Zd7fa
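
In code, that kind of preprocessing amounts to something like this sketch with Pillow (the filename and radius are placeholders rather than the exact values used here; the radius needs tuning to roughly the size of the circles):

    # Low-pass the image so the circle texture smears into letter strokes
    # before handing it to the vision model.
    from PIL import Image, ImageFilter

    img = Image.open("circles.png").convert("L")   # placeholder filename
    blurred = img.filter(ImageFilter.GaussianBlur(radius=8))
    blurred.save("circles_blurred.png")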

replies(1): >>42155596
7. mg (No.42155596)
I guess blurring it is similar to reducing the resolution or to looking at the image from further away.

It's interesting that the neural net figures out the circles but not the words, because the circles aren't so easily apparent from looking closely at the image either; they could just as well be whirly lines.

8. ggerules (No.42156784)
Was the original LLM ever trained on material like this?

Pretty cool use of a genetic algorithm! I'd love to see the code, or at least the reward function.
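
For a sense of what that reward (fitness) function might look like, here is a rough sketch (purely a guess, not the author's actual code): score a set of circle outlines by how much they overlap a rendered mask of the target text, minus a penalty for ink that lands outside the letters.

    # Hypothetical fitness function for a genome of (x, y, r) circles.
    # text_mask is a boolean array that is True wherever the target letters are.
    import numpy as np
    from PIL import Image, ImageDraw

    def render_circles(genome, size):
        img = Image.new("L", size, 0)
        draw = ImageDraw.Draw(img)
        for x, y, r in genome:
            draw.ellipse([x - r, y - r, x + r, y + r], outline=255, width=2)
        return np.asarray(img) > 0

    def fitness(genome, text_mask):
        outline = render_circles(genome, text_mask.shape[::-1])
        hits = np.logical_and(outline, text_mask).sum()
        misses = np.logical_and(outline, ~text_mask).sum()
        return hits - 2 * misses   # penalty weight is arbitrary

A GA would then mutate circle positions and radii and keep the highest-scoring genomes.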

9. pbhjpbhj (No.42156818)
I wonder if you could make a composite, like bracketed images, and so give the model multiple goes from which it could amalgamate results. You could do an exposure bracket, a focus/blur series, maybe a stretch/compression, or an adjustment of font height as a proportion of the image.

Feed all of the alternatives to the model and tell it they each contain the same textual content?
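
Sketched with Pillow, the bracketing might look like this (filenames and parameter values are made up, and the actual multi-image prompt is left out since it depends on the API):

    # Build a "bracket" of processed variants of the same image; all of them
    # would go into one multi-image prompt with a note that they show the
    # same text.
    from PIL import Image, ImageEnhance, ImageFilter

    img = Image.open("circles.png").convert("L")    # placeholder filename
    variants = [
        ImageEnhance.Contrast(img).enhance(2.0),                       # "exposure" bracket
        img.filter(ImageFilter.GaussianBlur(radius=6)),                # focus/blur
        img.resize((img.width // 2, img.height), Image.LANCZOS),       # stretch/compression
        img.resize((img.width // 4, img.height // 4), Image.LANCZOS),  # lower resolution
    ]
    for i, v in enumerate(variants):
        v.save(f"variant_{i}.png")

The model's answers for the variants could then be amalgamated, e.g. by majority vote over the extracted text.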