
Delete tests

(andre.arko.net)
125 points mooreds | 3 comments
jampa ◴[] No.45071974[source]
I work on an app where bugs are unacceptable because the company's reputation depends on it. We've been having a lot of success with E2E tests, but getting there was NOT easy. Some tips:

- False failures (a test failing when nothing is actually broken) will make your devs hate the tests. People want to get things done and will start ignoring the tests if you unnecessarily break their workflow. In CI, you should always retry on failure so that flaky tests don't block people.

- E2E tests can start failing suddenly. To avoid breaking people's workflow, we run a "megabenchmark" every day at 1 AM in which each test runs multiple times, even if it passes, so that we can measure flakiness. If a test fails in the benchmark, we remove it from the CI so it doesn't break other developers' workflows. The next day, we either fix the test or the bug.

- The Claude Code SDK has been a blessing for E2E. Before, we couldn't run all the E2E tests in a PR's CI because of how long they take. Now we send the branch to the Claude Code SDK to determine which E2E tests should run.

- Also, MCPs and Claude Code now write most of my E2E tests. I wrote a detailed Claude.md to let it run autonomously (writing, validating, and repeating) while I do something else. It gets it done in 3 to 4 shots. For the price of a cup of coffee, it saves me 30-60 minutes per test.
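
The retry and nightly benchmark tips above could be sketched roughly like this. This is a minimal illustration, not our actual setup: each test is modeled as a plain callable returning `True` on pass, and the names `run_with_retries`, `nightly_benchmark`, and `quarantine` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FlakinessReport:
    name: str
    runs: int
    failures: int

    @property
    def failure_rate(self) -> float:
        return self.failures / self.runs if self.runs else 0.0


def run_with_retries(test: Callable[[], bool], max_attempts: int = 3) -> bool:
    """CI-style run: the test passes if any attempt passes."""
    for _ in range(max_attempts):
        if test():
            return True
    return False


def nightly_benchmark(
    tests: Dict[str, Callable[[], bool]], runs: int = 5
) -> List[FlakinessReport]:
    """Run every test `runs` times, even after a pass, and count failures."""
    reports = []
    for name, test in tests.items():
        failures = sum(1 for _ in range(runs) if not test())
        reports.append(FlakinessReport(name, runs, failures))
    return reports


def quarantine(reports: List[FlakinessReport]) -> List[str]:
    """Names of tests to pull from CI until the test or the bug is fixed."""
    return [r.name for r in reports if r.failures > 0]
```

The CI path uses `run_with_retries` so a single flaky run does not block a PR, while the benchmark deliberately does not retry, since the point there is to count every failure.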

replies(1): >>45074792 #
1. _caw ◴[] No.45074792[source]
Would love to hear more about using Claude to determine which E2E tests to run. What context are you giving it?

Is it like, "this looks like a billing feature, let me run any tests that seem relevant"?

replies(1): >>45079927 #
2. jampa ◴[] No.45079927[source]
I feed the `git diff` of the branch (excluding large files like package-lock) and a list of the E2E files. Claude reads and compares the E2E tests against the modified content.

The good thing about Claude Code is that it uses tool calls to explore the files to check which E2E tests can validate the changes.
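
A rough sketch of the input and output ends of that flow, under stated assumptions: the base branch name `main`, the prompt wording, the JSON reply format, and the helper names are all hypothetical, and `ask_model` is a stand-in for whatever actually calls Claude (the SDK call itself is not shown). It also leaves out the tool-call exploration, which happens on Claude's side.

```python
import json
import subprocess
from typing import Callable, List, Sequence


def branch_diff(
    base: str = "main", exclude: Sequence[str] = ("package-lock.json",)
) -> str:
    """Diff of the current branch against `base`, skipping large files."""
    cmd = ["git", "diff", f"{base}...HEAD", "--", "."]
    # Git "exclude" pathspecs drop the listed paths from the diff.
    cmd += [f":(exclude){path}" for path in exclude]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def build_prompt(diff: str, e2e_files: Sequence[str]) -> str:
    """Combine the diff and the list of E2E files into one prompt."""
    file_list = "\n".join(f"- {path}" for path in e2e_files)
    return (
        "Below is the diff for a branch and a list of E2E test files.\n"
        "Reply with a JSON array of the test files that can validate the changes.\n\n"
        f"E2E test files:\n{file_list}\n\nDiff:\n{diff}"
    )


def parse_selection(reply: str, e2e_files: Sequence[str]) -> List[str]:
    """Keep only known test files; run everything if the reply is unusable."""
    try:
        chosen = json.loads(reply)
    except json.JSONDecodeError:
        return list(e2e_files)
    if not isinstance(chosen, list):
        return list(e2e_files)
    known = set(e2e_files)
    return [path for path in chosen if isinstance(path, str) and path in known]


def select_tests(
    ask_model: Callable[[str], str], diff: str, e2e_files: Sequence[str]
) -> List[str]:
    """`ask_model` is a placeholder for the real model call."""
    return parse_selection(ask_model(build_prompt(diff, e2e_files)), e2e_files)
```

Falling back to the full suite on a malformed reply is a design choice in this sketch: a bad answer costs CI time rather than skipping a test that mattered.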

replies(1): >>45080937 #
3. anon7000 ◴[] No.45080937[source]
How well does this work for massive codebases and thousands of e2e tests?