
293 points lapnect | 1 comments | source
mg ◴[] No.42155156[source]
I gave it an image of a sentence that I formed by placing 500 circles via a genetic algorithm, and then drew with an actual physical circle:

https://www.instagram.com/marekgibney/p/BiFNyYBhvGr/

Interestingly, it sees the circles just fine, but not the sentence. It replied with this:

    The image contains no text or other elements
    that can be represented in Markdown. It is a
    visual composition of circles and does not
    convey any information that can be translated
    into Markdown format.
replies(5): >>42155181 #>>42155186 #>>42155206 #>>42155424 #>>42156784 #
DandyDev ◴[] No.42155186[source]
I can't read this either.

Edit: at a distance it's easier to read

replies(1): >>42155287 #
thih9 ◴[] No.42155287[source]
If you squint it’s easier too. I wonder if lowering the resolution of the image would make the text visible to OCR.
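
For what it's worth, "lowering the resolution" could be sketched roughly like this. It is a plain box-filter downscale on a grayscale image held as a list of rows (values 0-255), with no imaging library; the `downscale` name and the list-of-rows layout are just assumptions for illustration. Averaging each block is a crude stand-in for squinting: thin circle outlines smear into gray blobs. Whether an OCR model then reads the result is untested.

```python
def downscale(pixels, factor):
    """Box-filter downscale of a grayscale image.

    `pixels` is a list of equal-length rows of ints in 0..255.
    Each factor x factor block is replaced by its integer mean.
    Rows/columns that do not fill a whole block are dropped.
    """
    height, width = len(pixels), len(pixels[0])
    out = []
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [
                pixels[y + dy][x + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


if __name__ == "__main__":
    # A tiny checkerboard collapses to a single mid-gray pixel.
    print(downscale([[0, 255], [255, 0]], 2))
```

In practice you would probably reach for an image library's resize instead; this only shows the idea.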
replies(1): >>42156818 #
1. pbhjpbhj ◴[] No.42156818[source]
I wonder if you could do a composite, like bracketed photos, and so give the model multiple goes whose results it could then amalgamate. You could do an exposure bracket, a focus/blur bracket, maybe a stretch/compression, or an adjustment of font height as a proportion of the image.

Feed all of the alternatives to the model and tell it they all have the same textual content?
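
A rough sketch of the bracketing idea, assuming a grayscale image as a list of rows (0-255) and no imaging library. All the function names here are made up for illustration. Note that the last step is a swapped-in alternative: instead of handing every variant to the model at once, it runs OCR per variant and takes a naive majority vote over the returned strings.

```python
from collections import Counter


def adjust_exposure(pixels, gain):
    """Scale brightness by `gain`, clamping at 255."""
    return [[min(255, int(p * gain)) for p in row] for row in pixels]


def box_blur(pixels):
    """3x3 mean blur; windows are clipped at the image edges."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                pixels[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(sum(window) // len(window))
        out.append(row)
    return out


def stretch_horizontally(pixels, factor):
    """Nearest-neighbour horizontal stretch by an integer factor."""
    return [[p for p in row for _ in range(factor)] for row in pixels]


def bracket(pixels):
    """Return the original plus exposure, blur and stretch variants."""
    return [
        pixels,
        adjust_exposure(pixels, 0.5),
        adjust_exposure(pixels, 1.5),
        box_blur(pixels),
        stretch_horizontally(pixels, 2),
    ]


def most_common_reading(readings):
    """Pick the most frequent non-empty OCR string, or '' if none."""
    counts = Counter(r.strip() for r in readings if r.strip())
    return counts.most_common(1)[0][0] if counts else ""
```

You would call your OCR of choice on each image from `bracket(...)` and pass the resulting strings to `most_common_reading`. Whole-string voting is crude; a word-level or aligned character-level vote would likely do better, but that is beyond this sketch.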