
125 points robin_reala | 6 comments
simonw No.46203241
Something I'm desperately keen to see is AI-assisted accessibility testing.

I'm not convinced at all by most of the heuristic-driven ARIA scanning tools. I don't want to know if my app appears to have the right ARIA attributes set - I want to know if my features work for screenreader users.

What I really want is for a Claude Code style agent to be able to drive my application in an automated fashion via a screenreader and record audio for me of successful or failed attempts to achieve goals.

Think Playwright browser tests but for popular screenreaders instead.

Every now and then I check to see if this is a solved problem yet.

I think we are close. https://www.guidepup.dev/ looks extremely promising - though I think it only supports VoiceOver on macOS or NVDA on Windows, which is a shame since asynchronous coding agent tools like Codex CLI and Claude Code for web only run on Linux.
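
Its driver API is pleasingly small. This is roughly the shape of its README example (VoiceOver on macOS; there's an equivalent nvda import for Windows - treat the details as a sketch rather than gospel):

    import { voiceOver } from "@guidepup/guidepup";

    (async () => {
      await voiceOver.start();  // launch the real screenreader
      await voiceOver.next();   // move to the next item on screen
      // Inspect what was actually announced to the user.
      console.log(await voiceOver.lastSpokenPhrase());
      await voiceOver.stop();
    })();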

What I haven't seen yet is someone closing the loop on ensuring agentic tools like Claude Code can successfully drive these mechanisms.

replies(12): >>46203277 #>>46203374 #>>46203420 #>>46203447 #>>46203583 #>>46203605 #>>46203642 #>>46204338 #>>46204455 #>>46206651 #>>46206832 #>>46208023 #
devinprater No.46203583
There are thousands of blind people on the net. Can't you hire one of them to test for you? Please?
replies(6): >>46203654 #>>46203668 #>>46204073 #>>46204737 #>>46205153 #>>46205158 #
m12k No.46203668
If you don't want this to break eventually, you need it tested every time your CI/CD test suite runs. Manual testing just doesn't cut it.
replies(2): >>46203955 #>>46204939 #
1. cenamus No.46203955
AI in your CI pipeline won't help either, then, if it randomly gives different answers.
replies(2): >>46204019 #>>46204108 #
2. simonw No.46204019
An AI-generated automated testing script in your pipeline will do great, though.
replies(1): >>46204169 #
3. zamadatix No.46204108
So does hiring a person, or using tests that rely on entropy because exhaustive testing is infeasible. If you can wrangle the randomness (each approach has its own way of doing that), you end up with very useful tests in all three scenarios, but only automated tests scale to running on every commit. You probably still want the non-automated tests per release or so if you can, depending on what you're doing, but you don't necessarily want only invariant tests either.
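
For instance, an entropy-reliant test becomes CI-friendly the moment you pin and log its seed. A minimal sketch (the PRNG and the TEST_SEED env var are just illustrative):

    // Pin the seed so the test is reproducible, but log it so a
    // failing run can be replayed with TEST_SEED=<seed>.
    function mulberry32(seed: number): () => number {
      return () => {
        seed = (seed + 0x6d2b79f5) | 0;
        let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
        t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
      };
    }

    const seed = Number(process.env.TEST_SEED ?? Date.now());
    console.log(`TEST_SEED=${seed}`);
    const rand = mulberry32(seed);
    // ...feed `rand` into whatever random inputs the test generates.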
4. debugnik No.46204169
And then we're back to your own:

> I'm not convinced at all by most of the heuristic-driven ARIA scanning tools.

replies(1): >>46204262 #
5. simonw No.46204262
That's entirely different.

ARIA scanning tools throw an error if they see an element that's missing an attribute, without even attempting to invoke a real screenreader.

I'm arguing for automated testing scripts that use tools like Guidepup to launch a real screenreader and assert, for example, that the new content added by fetch() is read out to the user after a form submission completes.

I want LLMs and coding agents to help me write those scripts, so I can run them in CI along with the rest of my automated tests.
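
Something like this shape, using Guidepup's Playwright fixture (a sketch only: the page, selectors and expected phrase are invented, and the exact API details may differ from what I've written here):

    import { voiceOverTest as test } from "@guidepup/playwright";
    import { expect } from "@playwright/test";

    test("successful submission is announced", async ({ page, voiceOver }) => {
      await page.goto("https://example.com/signup", { waitUntil: "load" });
      await voiceOver.navigateToWebContent();

      await page.fill("#email", "user@example.com");
      await page.click("button[type=submit]");

      // Step the screenreader forward and assert on what it actually spoke.
      await voiceOver.next();
      const spoken = await voiceOver.spokenPhraseLog();
      expect(spoken.join(" ")).toContain("Thanks, you're signed up");
    });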

replies(1): >>46205738 #
6. debugnik No.46205738
That's very different from what I thought you were arguing for in your top comment, though: a computer-use agent proving the app is usable through a screen reader alone (and hopefully caching a replayable trace so it doesn't have to be prompted on every run; a rough sketch of that caching is below).

Guidepup already exists; if people cared, they'd use it for tests with or without LLMs. Thanks for showing me this tool, BTW! I agree that testing against real readers is better than relying on a third party's heuristics.
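
The trace caching wouldn't need to be fancy. Hypothetical shape (none of these names come from any real tool): let the agent explore once, persist its action log, and replay it verbatim on later CI runs, so the expensive non-deterministic part only happens on a cache miss.

    import { existsSync, readFileSync, writeFileSync } from "node:fs";

    type Action = { kind: "press" | "assertSpoken"; value: string };

    async function runWithTrace(
      tracePath: string,
      explore: () => Promise<Action[]>,           // LLM-driven, slow, non-deterministic
      replay: (trace: Action[]) => Promise<void>  // deterministic, CI-friendly
    ): Promise<void> {
      const trace: Action[] = existsSync(tracePath)
        ? JSON.parse(readFileSync(tracePath, "utf8"))
        : await explore();
      writeFileSync(tracePath, JSON.stringify(trace, null, 2));
      await replay(trace);
    }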