I think the record uptime for a customer before they shut down one process to upgrade the software (leaving a dozen other such processes running, which took over the work in turn as each was upgraded) was on the order of six years. This was a set of 24-hour broadcast channels.
"How were you making sure that your system actually works?"
Good design and good software engineering go a long way.
When you know that you cannot test by simply doing everything the customer will do, you have to think about what tests you can run that will indicate how the system will operate under a load orders of magnitude greater than anything you can generate yourself, with hardware you've never even seen. You have to think about how to write software that is likely to be high quality even when you can't test it the way you'd like to.
For example, one can design the architecture in such a way that adding more load, more devices, only linearly increases the demands on resources, and then infer from testing what the loads will be on actual customer sites. Any non-linearity in that regard was identified, if not at the design stage, then in the unit testing thereof.
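To give a flavour of what that check looks like in practice (everything here is a made-up illustration, not our actual tooling): measure resource use at a handful of device counts you can manage in the lab, confirm the per-device cost stays roughly flat, and only then extrapolate to a site you will never be able to reproduce.

```python
# Illustrative sketch only: check that resource use grows (at worst) linearly
# with device count in the lab before extrapolating to customer scale.
lab_measurements = {  # device count -> peak memory in MB, gathered in-house
    10: 220,
    20: 410,
    40: 800,
    80: 1590,
}

def incremental_cost(measurements):
    """MB added per extra device between runs; a rising trend means super-linear growth."""
    points = sorted(measurements.items())
    return [(m1 - m0) / (d1 - d0) for (d0, m0), (d1, m1) in zip(points, points[1:])]

rates = incremental_cost(lab_measurements)
print("incremental MB per device:", rates)

# If those increments stay roughly flat, extrapolate to a site we cannot test:
site_devices = 2000  # hypothetical customer installation
print("estimated peak memory: ~%.0f MB" % (max(rates) * site_devices))
```

If the per-device increments creep upwards, that's the non-linearity to chase down long before it ever reaches a customer.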
One can design the code in such a way that the internal mechanisms of how devices work are suitably abstracted away, leaving, as best one can manage, common interfaces, and then rather than having to test with the exact arrangements of hardware customers will have, test with devices that, to the extent possible, simulate the interactions our software will see. In this regard, it turns out that many devices that purport to meet standard protocols actually meet "variations about the theme" of those protocols. But this too can be mitigated and handled to a degree by careful design and thought in the software engineering. The learning from doing this with one set of devices and protocols carries over to significantly different devices and protocols; every subsequent fresh design for the next iteration or generation of hardware is better and more resilient. A software engineering organisation that learns and retains knowledge and experience can go a long way.
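As a minimal sketch of what that abstraction might look like (the interface, device names and protocol strings are all invented for illustration): the rest of the system codes against a common interface, and a simulator, including a deliberately quirky one, stands in for hardware the lab will never own.

```python
# Illustrative sketch: device internals hidden behind a common interface, so a
# simulator can stand in for hardware the lab will never have.
from abc import ABC, abstractmethod
import random

class VideoDevice(ABC):
    """The common interface the rest of the system codes against."""

    @abstractmethod
    def send_command(self, command: str) -> str:
        ...

class SimulatedRouter(VideoDevice):
    """A well-behaved simulator standing in for a real routing switcher."""

    def send_command(self, command: str) -> str:
        return "OK " + command

class QuirkySimulatedRouter(SimulatedRouter):
    """Simulates a 'variation about the theme': sometimes replies in its own format."""

    def send_command(self, command: str) -> str:
        if random.random() < 0.1:
            return "ack " + command.lower()  # non-standard but recognisable reply
        return super().send_command(command)

def route_source(device: VideoDevice, source: int, dest: int) -> bool:
    # Application code never knows whether this is real hardware or a simulator;
    # it relies only on the interface and parses replies tolerantly.
    reply = device.send_command(f"ROUTE {source} {dest}").upper()
    return reply.startswith("OK") or reply.startswith("ACK")
```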
One can recognise that running live on customer sites is itself an opportunity. Some customers would never say a thing, for years on end. Some would want to be involved and would regularly talk about things they'd seen, unexpected things that happened, loads and events and so on; one can ensure that all that information is gathered by the sales people, the support reps, anyone and everyone who talks to the customers, and passed back effectively so that the results of that testing are applied. For doomsday scenarios such as crashes, resource exhaustion, pathological behaviours and so on, good logging, live measurements, dump catching and the like can at least feed back so that this situation (which we would never be able to truly test ourselves) is not wasted, gets fixed, and has its lessons applied forwards into design and development. Harsh for the customer who finds an issue, but great for the hundred customers who will never hit it because it was tested by that unlucky customer. We'd be fools not to gather as much information as we could from poor customer experiences.
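By way of illustration only (the handler and file name are hypothetical, not our real diagnostics), the "dump catching" side can be as simple as a top-level guard that makes sure an unexpected crash on a customer site leaves behind something worth learning from:

```python
# Illustrative sketch: make sure a crash at a customer site leaves behind
# enough diagnostics to feed back into design and development.
import logging
import sys
import traceback

logging.basicConfig(
    filename="site_diagnostics.log",  # hypothetical path; a real system would rotate and ship this
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

def crash_handler(exc_type, exc_value, exc_tb):
    # One unlucky customer's failure becomes a recorded lesson for everyone
    # else, instead of a wasted opportunity.
    logging.critical(
        "Unhandled exception:\n%s",
        "".join(traceback.format_exception(exc_type, exc_value, exc_tb)),
    )
    sys.__excepthook__(exc_type, exc_value, exc_tb)

sys.excepthook = crash_handler
```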
One can get hold of cheap, twenty-year-old devices that in theory speak the same protocol, and go to town on them (some customers will actually be using that exact device and its contemporaries; some customers will have brand-new hardware that costs a tenth of my employer's market cap). From that, get an idea of how the software performs. Get another cheap device from eBay that is a decade old, and test it; see where it fails, but don't just fix those failures. From them, and similar repeats of the process, learn at a more fundamental level how devices differ and develop more general solutions that will either be resilient to some new piece of hardware that hasn't even been made yet, or at least will not go wrong in such a way that the whole system is taken out; instead the poorly supported brand-new hardware is clearly identified by the software and reported on.
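A rough sketch of the "report it, don't let it take the system out" idea (device names and status strings are invented; any object with a send_command method, such as the simulator sketched earlier, would do):

```python
# Illustrative sketch: isolate and report a misbehaving or unrecognised device
# rather than letting it take the whole system down.
import logging

def poll_device(device, name: str) -> dict:
    """Poll one device; any failure is contained, logged, and surfaced as a status."""
    try:
        return {"device": name, "status": "ok", "reply": device.send_command("STATUS")}
    except Exception:
        # The unsupported or faulty device is flagged for operators and support,
        # while every other device carries on being polled as normal.
        logging.exception("Device %s did not respond as expected", name)
        return {"device": name, "status": "unsupported or faulty"}

def poll_all(devices: dict) -> list:
    return [poll_device(dev, name) for name, dev in devices.items()]
```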
There's more. There's so much more, but once you have no choice but to come up with cheap, fast testing that nonetheless gives a good indication of how the system will work when someone spends tens of millions on the hardware, software engineers can really come up with some smart, reliable ideas. It can also be really fun and satisfying to work on this.
"You just YOLO'd it over to customers and prayed?"
Absolutely not. It was all tested, repeatedly, over and over, and over the course of about fifteen years it became remarkably resilient, adaptable, resource-light, and so on. All the good things one would hope for. In a pinch, a small system could be run from someone's laptop; at the top end, banks and banks of servers with their fans banshee-wailing 24 hours a day, with dozens of the principal processes (i.e. the main executable that runs) all running, all talking to each other across countries and time zones, handling their own redundancy against individual processes turning off. Again, when you begin knowing that the software has to deliver on such a range of systems, where one customer is two college kids in a basement and another is valued in the tens of billions (although doing a lot more, of course, than just what our software let them do), design and good software engineering go a very long way.
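For flavour, a toy sketch of that kind of peer redundancy (names, timeout and all are invented): each principal process records heartbeats from its peers and adopts the work of any peer that has gone quiet.

```python
# Toy sketch of peer redundancy: each principal process watches its peers'
# heartbeats and adopts the work of any peer that has gone quiet.
import time

HEARTBEAT_TIMEOUT = 10.0  # seconds; illustrative value only

class PeerMonitor:
    def __init__(self):
        self.last_seen = {}  # peer name -> time we last heard from it

    def record_heartbeat(self, peer: str):
        self.last_seen[peer] = time.monotonic()

    def stale_peers(self):
        now = time.monotonic()
        return [p for p, seen in self.last_seen.items() if now - seen > HEARTBEAT_TIMEOUT]

def supervise(monitor: PeerMonitor, take_over):
    # In the real thing this would run alongside the process's normal work;
    # here it is just the decision: a silent peer's channels get adopted.
    for peer in monitor.stale_peers():
        take_over(peer)
```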