549 points thecr0w | 1 comment
Wowfunhappy ◴[] No.46183598[source]
Claude is not very good at using screenshots. The model may technically be multi-modal, but its strength is clearly in reading text. I'm not surprised it failed here.
replies(3): >>46184084 #>>46184296 #>>46186300 #
dcanelhas ◴[] No.46184296[source]
Even with text, parsing content in 2D seems to be a challenge for every LLM I have interacted with. Try getting a chatbot to make an ascii-art circle with a specific radius and you'll see what I mean.
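For comparison, here is a minimal sketch of the same task done with explicit coordinates, which shows the kind of 2D bookkeeping involved. The function name `ascii_circle` and the 0.5 distance threshold are illustrative choices of mine, not anything a particular chatbot uses.

```python
def ascii_circle(radius: int) -> str:
    """Render a circle outline of the given radius as ASCII art."""
    rows = []
    for y in range(-radius, radius + 1):
        row = ""
        for x in range(-radius, radius + 1):
            # Mark cells whose distance from the centre is within
            # half a cell of the requested radius.
            dist = (x * x + y * y) ** 0.5
            row += "#" if abs(dist - radius) < 0.5 else " "
        rows.append(row.rstrip())  # drop trailing spaces on each row
    return "\n".join(rows)


if __name__ == "__main__":
    print(ascii_circle(8))
```

Terminal characters are usually taller than they are wide, so the output will likely look like a vertical ellipse unless each cell is printed twice horizontally.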
replies(1): >>46185268 #
1. Wowfunhappy ◴[] No.46185268[source]
I don't really consider ASCII art to be text. It requires a completely different type of reasoning. A blind person can understand text if it's read out loud. A blind person really can't understand ASCII art if it's read out loud.