
125 points robin_reala | 2 comments
simonw ◴[] No.46203241[source]
Something I'm desperately keen to see is AI-assisted accessibility testing.

I'm not convinced at all by most of the heuristic-driven ARIA scanning tools. I don't want to know if my app appears to have the right ARIA attributes set - I want to know if my features work for screenreader users.

What I really want is for a Claude Code style agent to be able to drive my application in an automated fashion via a screenreader and record audio for me of successful or failed attempts to achieve goals.

Think Playwright browser tests but for popular screenreaders instead.

Every now and then I check to see if this is a solved problem yet.

I think we are close. https://www.guidepup.dev/ looks extremely promising - though I think it only supports VoiceOver on macOS and NVDA on Windows, which is a shame since asynchronous coding agent tools like Codex CLI and Claude Code for web only run on Linux.
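A rough sketch of what that could look like today, with Guidepup driving VoiceOver while Playwright serves the page. Method names follow Guidepup's docs as I recall them, and the URL and the 20-item walk are placeholders, so treat this as approximate:

```typescript
// Sketch only: Playwright loads the page, Guidepup drives VoiceOver over it.
// Method names are approximate; the localhost URL and the 20-item walk are
// placeholders, not a real test plan.
import { voiceOver } from "@guidepup/guidepup";
import { chromium } from "playwright";

async function walkPageWithVoiceOver(url: string): Promise<string[]> {
  const browser = await chromium.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto(url);

  const spoken: string[] = [];
  await voiceOver.start();           // take control of VoiceOver
  try {
    await voiceOver.interact();      // step into the web content
    for (let i = 0; i < 20; i++) {   // walk the first 20 items the way a user would
      await voiceOver.next();
      spoken.push(await voiceOver.lastSpokenPhrase());
    }
  } finally {
    await voiceOver.stop();
    await browser.close();
  }
  return spoken;                     // a transcript an agent (or a human) can review
}

walkPageWithVoiceOver("http://localhost:3000/checkout")
  .then((transcript) => console.log(transcript.join("\n")));
```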

What I haven't seen yet is someone closing the loop on ensuring agentic tools like Claude Code can successfully drive these mechanisms.

replies(12): >>46203277 #>>46203374 #>>46203420 #>>46203447 #>>46203583 #>>46203605 #>>46203642 #>>46204338 #>>46204455 #>>46206651 #>>46206832 #>>46208023 #
api ◴[] No.46204338[source]
What about just AI-assisted accessibility? Like stop requiring apps to do anything at all. The AI visually parses the app UI for the user, explains it, and interacts.

Accessibility is a nice-to-have at best for the vast majority of software. This would open up a lot more software to blind users than is currently accessible to them.
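A very hand-wavy sketch of that loop, screenshot the UI and have a vision model explain it, where the model name, prompt, and the idea of piping the answer to TTS are all illustrative assumptions:

```typescript
// Hand-wavy sketch: screenshot the UI, ask a hosted vision model to explain it.
// Model name, prompt, and the TTS hand-off are illustrative assumptions.
import { chromium } from "playwright";

async function describeScreen(url: string): Promise<string> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url);
  const png = (await page.screenshot()).toString("base64");
  await browser.close();

  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o",
      messages: [{
        role: "user",
        content: [
          { type: "text", text: "Describe this screen for a blind user: what is on it, and what can be activated?" },
          { type: "image_url", image_url: { url: `data:image/png;base64,${png}` } },
        ],
      }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content; // hand this off to the platform's TTS
}
```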

replies(1): >>46204357 #
1. simonw ◴[] No.46204357[source]
That's expensive, slow (listen to a screenreader user some time to see how quickly they operate) and likely only works online.

I'm also not going to shirk my responsibilities as a developer based on a hope that the assistive tech will improve.

replies(1): >>46204439 #
2. api ◴[] No.46204439[source]
It’s expensive for now, slow for now, online for now, but it’s pretty clear that this is the future. If I were blind I’d want things to go this way, since it would just unlock so much more. Very few sites or pieces of software have good accessibility. Open source and indie stuff often has none.

A custom local model trained only for this task seems like a possibility, and could be way smaller than a general-purpose model being prompted to do the same job. I’m thinking screen reader and UI assist only. Could probably be something like a 7B quantized model. Maybe smaller.
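For the local version, a small quantized multimodal model behind Ollama could stand in for the hosted call above. "llava:7b" and the prompt here are just placeholders for whatever a purpose-trained model would look like:

```typescript
// Sketch: the same describe-the-screenshot call, but against a quantized
// multimodal model served locally by Ollama. Model name and prompt are placeholders.
async function describeScreenLocally(pngBase64: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llava:7b",
      prompt: "List the controls on this screen and what each one does.",
      images: [pngBase64], // Ollama accepts base64-encoded images for multimodal models
      stream: false,
    }),
  });
  const data = await res.json();
  return data.response;
}
```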