"Vision is an especially large weakness."
But you can have GPT write code to reliably convert the image grid into a textual representation, right? And code to convert back to image and auto-verify.
 replies(2): 
But you can have GPT write code to reliably convert the image grid into a textual representation, right? And code to convert back to image and auto-verify.