For example, in both cases, the tests work best if I test the subject under test as a black box (i.e. interact only with its public interface) but use my knowledge of its internals to identify the weaknesses that will most require testing. In both cases, I want to structure the code so that the subject under test is as isolated as possible - i.e. no complex interactions with global state, no mocking of unrelated modules, and no complex mechanism to reset anything after the test is done. In both cases, I want the test to run fast, ideally instantaneously, so I get immediate results.
The biggest difference is that it's usually harder to write good integration tests because they're interacting with external systems that are generally slower and stateful, so I've got to put extra work into getting the tests themselves to be fast and stateless. But when that works, there's really not much difference at all between a test that tests a single function, and a test that tests a service class with a database dependency.
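As a sketch of what I mean (my own minimal example, not anyone's production code; `UserService`, `save` and `get` are made-up names): a service class with a real database dependency can still be tested through its public interface, fast and statelessly, by giving each test a fresh in-memory SQLite database.

```python
import sqlite3
import unittest


class UserService:
    """Toy service class with a database dependency (hypothetical example)."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def save(self, name: str) -> int:
        cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self.conn.commit()
        return cur.lastrowid

    def get(self, user_id: int) -> str | None:
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


class UserServiceTest(unittest.TestCase):
    def setUp(self):
        # Fresh in-memory database per test: fast, stateless, nothing to reset afterwards.
        self.conn = sqlite3.connect(":memory:")
        self.service = UserService(self.conn)

    def tearDown(self):
        self.conn.close()

    def test_round_trip(self):
        # Black box: only the public interface is exercised.
        user_id = self.service.save("alice")
        self.assertEqual(self.service.get(user_id), "alice")

    def test_missing_user(self):
        self.assertIsNone(self.service.get(999))
```

Structurally that looks no different from a test of a plain function; the extra work all went into making the database dependency cheap and disposable.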
So ultimately we write tests at a lower level to deal with the combinatorial explosion of possible inputs at the edge.
You should push your tests as far to the edge as possible but no further. If a test at the edge duplicates a test in the middle, delete the test in the middle. But if a test at the edge can't possibly account for everything, you're going to need a test in the middle.
If it matters, why can't you check? Will your product/app/system not run into these possible input combinations eventually?
> So ultimately we write tests at a lower level to deal with the combinatorial explosion of possible inputs at the edge.
Don't you have to write the combinatorial explosion of inputs for the unit tests, too, to test "every possible combination"? If not, and you're only testing a subset, then why not test the whole flow while you're at it?
What you do is write more primitive components and either unit test them, prove them to be correct or make them small enough to be correct by inspection. An integration test is just testing that the interfaces do indeed fit together, it won't normally be close to testing all possible code paths internally.
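A toy illustration of that split (the names are entirely hypothetical, just to show the shape of it): the primitives get the thorough unit tests, and a single integration test only checks that they compose.

```python
import unittest


# Hypothetical primitive components, small enough to unit test thoroughly
# (or to be correct by inspection).
def parse_amount(text: str) -> int:
    """Parse a decimal price string into integer cents."""
    dollars, _, cents = text.partition(".")
    return int(dollars) * 100 + int(cents.ljust(2, "0"))


def apply_discount(cents: int, percent: int) -> int:
    """Apply a percentage discount, rounding down to whole cents."""
    return cents * (100 - percent) // 100


class PrimitiveUnitTests(unittest.TestCase):
    # The unit tests carry the weight of the tricky internal cases.
    def test_parse_amount_pads_cents(self):
        self.assertEqual(parse_amount("3.5"), 350)

    def test_parse_amount_without_decimal_point(self):
        self.assertEqual(parse_amount("3"), 300)

    def test_apply_discount_rounds_down(self):
        self.assertEqual(apply_discount(101, 50), 50)


class IntegrationTest(unittest.TestCase):
    # One integration test: checks that the interfaces fit together,
    # not every internal code path.
    def test_discounted_total(self):
        self.assertEqual(apply_discount(parse_amount("10.00"), 10), 900)
```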
I think of it like building any other large machine with many inputs. You can't possibly test a car under every conceivable condition. Imagine if someone was like "but wait, did you even test going round a corner at 60mph in the wet with the radio on?!"
But you can with unit tests?
> Can you test the Python parser on all possible Python programs?
A parser is one of the few cases where unit tests work. Very few people write parsers.
See also my sibling reply here: https://news.ycombinator.com/item?id=45078047
> What you do is write more primitive components and either unit test them, prove them to be correct or make them small enough to be correct by inspection. An integration test is just testing that the interfaces do indeed fit together, it won't normally be close to testing all possible code paths internally.
Ah yes. Somehow the behaviour of each unit is "correct", but you're "just testing that the interfaces fit together" with a handful of integration tests. Funny how that becomes a PagerDuty alert at 3 in the morning, because "correct behaviour" in one unit was never tested together with "correct behaviour" in another unit.
But when you actually write an integration test over real (or simulated) inputs, suddenly 99%+ of your unit tests become redundant, because exercising your app/system as intended covers most of the code paths you could possibly hit.
The failure mode I see much more often is in the other direction: tests that test too many units together and need to be pushed down to a lower level to be more useful. For example, I recently wrote some code that generated intellisense suggestions for a DSL that our users use. Originally, the tests covered a large swathe of that functionality, and involved triggering e.g. lots of keydown events to check what happened when different keys were pressed. These were useful tests for checking that the suggestions box worked as expected, but they made it very difficult to test edge cases in how the suggestions were generated, because the code needed to set all that up was so involved.
In the end what I did was lower the tests so I had a bunch of tests for the suggestions generation function (which was essentially `(input: str, cursor: int) -> Completion[]` and so super easy to test), and a bunch of tests for the suggestions box (which was now decoupled from the suggestions logic, and so also easier to test). I kept some higher-level integration tests, but only very few of them. The result is faster, but also much easier to maintain, with tests that are easier to write and code that's easier to refactor.
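To give a flavour of what the lowered tests look like (the names and keyword list here are made up; the real function and DSL are obviously different), the suggestion generator becomes a pure function you can hit with edge cases directly, with no keydown events or suggestion box involved:

```python
from dataclasses import dataclass


@dataclass
class Completion:
    text: str


# Hypothetical stand-in for the lowered suggestion generator: a pure
# (input, cursor) -> [Completion] mapping with no UI involved.
KEYWORDS = ["select", "sort", "sum"]


def suggest(input: str, cursor: int) -> list[Completion]:
    # Only the word the cursor is sitting in matters for suggestions.
    prefix = input[:cursor].rsplit(" ", 1)[-1].lower()
    if not prefix:
        return []
    return [Completion(k) for k in KEYWORDS if k.startswith(prefix)]


# Edge cases become trivial pytest-style tests: no DOM, no keydown events.
def test_prefix_matches_all_candidates():
    assert [c.text for c in suggest("s", 1)] == ["select", "sort", "sum"]


def test_cursor_position_ignores_text_after_it():
    assert [c.text for c in suggest("se everything", 2)] == ["select"]


def test_empty_prefix_gives_no_suggestions():
    assert suggest("select ", 7) == []
```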