Here's a bit of a quirk: I uploaded a webcomic as an example. All of the dialogue was in ALL CAPS, but the output was inconsistently rendered in either sentence case or title case from panel to panel.
I also tried a real example of a problem I'd like to use OCR for: I've got some old slides that need digitising, and most of them are labelled. Uploading one of these produced the following output:
The image appears to be a photograph of a slide or film frame, possibly from an old camera or projector. The slide is yellowed with age and has a rectangular cutout in the center, which is filled with a dark gray or black material. The cutout is surrounded by a thin border, and there is some text written on the slide in black ink.
The text reads "Once Upon a Time" and is written in a cursive font. It is located at the bottom of the slide, below the cutout. There is also a small number "1069" written in the same font and color, but it is not clear what this number refers to.
Overall, the image suggests that the slide is an old photograph or film frame that has been preserved for many years. The yellowing of the slide and the cursive writing suggest that it may be from the early 20th century or earlier.
So aside from the unnecessarily repetitious description of the slide (and the "yellowing" is actually just the white balance being off, though I can forgive that), the actual written text (not cursive) was "Once Uniquitous." and the number was 106g. It's very clearly a 'g' and not a '9'.
What I find interesting is that this might demonstrate a bias in the model: it focused so heavily on the slide being an antique that it hallucinated a completely clichéd title. It also missed the forest for the trees: the "black square" was simply the slide being front-lit so the text could be read, which meant the transparency itself wasn't visible.
Additionally, the API itself seems to have file size or resolution limits that are not documented.
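Since the limits are undocumented, one workaround is to downscale images client-side before upload. A minimal sketch, where the specific caps (`max_side`, `max_pixels`) are guesses rather than documented values:

```python
import math

def fit_within(width, height, max_side=1120, max_pixels=1_000_000):
    """Return (new_width, new_height) fitting the assumed caps,
    preserving aspect ratio.

    NOTE: max_side and max_pixels are placeholder guesses --
    the real API limits are undocumented.
    """
    scale = min(
        1.0,                                   # never upscale
        max_side / max(width, height),         # cap the longest edge
        math.sqrt(max_pixels / (width * height)),  # cap total pixel count
    )
    return max(1, int(width * scale)), max(1, int(height * scale))

# e.g. a 4000x3000 scan gets reduced, a small image passes through unchanged
print(fit_within(4000, 3000))  # (1120, 840)
print(fit_within(800, 600))    # (800, 600)
```

Resizing to the computed dimensions (e.g. with an imaging library) before upload at least makes failures reproducible, since you control the exact payload the API receives.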