549 points thecr0w | 1 comments
buchwald No.46186807
Claude is surprisingly bad at visual understanding. I tried something similar to OP: I wanted Claude to visually iterate on Storybook components. What worked best was outsourcing the visual check to Playwright in vision mode (as opposed to the default a11y tree) and using Codex for understanding. But overall, the idea of a visual inspection loop went nowhere. I blogged about it here: https://solbach.xyz/ai-agent-accessibility-browser-use/
replies(1): >>46187766 #
1. MagMueller No.46187766
Interesting read. Agree that GUI is super hard for agents. Did you see "skills" from browser-use? We directly interact with network requests now.