For example, in both cases, the tests work best if I test the subject under test as a black box (i.e. interact only with its public interface) but use my knowledge of its internals to identify the weaknesses that will most require testing. In both cases, I want to structure the code so that the subject under test is as isolated as possible - i.e. no complex interactions with global state, no mocking of unrelated modules, and no complex mechanism to reset anything after the test is done. In both cases, I want the test to run fast, ideally instantaneously, so I get immediate results.
The biggest difference is that it's usually harder to write good integration tests because they're interacting with external systems that are generally slower and stateful, so I've got to put extra work into getting the tests themselves to be fast and stateless. But when that works, there's really not much difference at all between a test that tests a single function, and a test that tests a service class with a database dependency.
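As a sketch of what I mean (my own minimal example, not anyone's production code; `UserService`, `save` and `get` are made-up names): a service class with a real database dependency can still be tested through its public interface, fast and statelessly, by giving each test a fresh in-memory SQLite database.

```python
import sqlite3
import unittest


class UserService:
    """Toy service class with a database dependency (hypothetical example)."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def save(self, name: str) -> int:
        cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self.conn.commit()
        return cur.lastrowid

    def get(self, user_id: int) -> str | None:
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


class UserServiceTest(unittest.TestCase):
    def setUp(self):
        # Fresh in-memory database per test: fast, stateless, nothing to reset afterwards.
        self.conn = sqlite3.connect(":memory:")
        self.service = UserService(self.conn)

    def tearDown(self):
        self.conn.close()

    def test_round_trip(self):
        # Black box: only the public interface is exercised.
        user_id = self.service.save("alice")
        self.assertEqual(self.service.get(user_id), "alice")

    def test_missing_user(self):
        self.assertIsNone(self.service.get(999))
```

Structurally that looks no different from a test of a plain function; the extra work all went into making the database dependency cheap and disposable.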
So ultimately we write tests at a lower level to deal with the combinatorial explosion of possible inputs at the edge.
You should push your tests as far to the edge as possible but no further. If a test at the edge duplicates a test in the middle, delete the test in the middle. But if a test at the edge can't possibly account for everything, you're going to need a test in the middle.
If it matters, why can't you check? Will your product/app/system not run into these possible input combinations eventually?
> So ultimately we write tests at a lower level to deal with the combinatorial explosion of possible inputs at the edge.
Don't you have to write the combinatorial explosion of inputs for the unit tests, too, to test "every possible combination"? If not, and you're only testing a subset, then why not test the whole flow while you're at it?
What you do is write more primitive components and either unit test them, prove them to be correct or make them small enough to be correct by inspection. An integration test is just testing that the interfaces do indeed fit together, it won't normally be close to testing all possible code paths internally.
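A toy illustration of that split (the names are entirely hypothetical, just to show the shape of it): the primitives get the thorough unit tests, and a single integration test only checks that they compose.

```python
import unittest


# Hypothetical primitive components, small enough to unit test thoroughly
# (or to be correct by inspection).
def parse_amount(text: str) -> int:
    """Parse a decimal price string into integer cents."""
    dollars, _, cents = text.partition(".")
    return int(dollars) * 100 + int(cents.ljust(2, "0"))


def apply_discount(cents: int, percent: int) -> int:
    """Apply a percentage discount, rounding down to whole cents."""
    return cents * (100 - percent) // 100


class PrimitiveUnitTests(unittest.TestCase):
    # The unit tests carry the weight of the tricky internal cases.
    def test_parse_amount_pads_cents(self):
        self.assertEqual(parse_amount("3.5"), 350)

    def test_parse_amount_without_decimal_point(self):
        self.assertEqual(parse_amount("3"), 300)

    def test_apply_discount_rounds_down(self):
        self.assertEqual(apply_discount(101, 50), 50)


class IntegrationTest(unittest.TestCase):
    # One integration test: checks that the interfaces fit together,
    # not every internal code path.
    def test_discounted_total(self):
        self.assertEqual(apply_discount(parse_amount("10.00"), 10), 900)
```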
I think of it like building any other large machine with many inputs. You can't possibly test a car under every conceivable condition. Imagine if someone was like "but wait, did you even test going round a corner at 60mph in the wet with the radio on?!"
But you can with unit tests?
> Can you test the Python parser on all possible Python programs?
A parser is one of the few cases where unit tests work. Very few people write parsers.
See also my sibling reply here: https://news.ycombinator.com/item?id=45078047
> What you do is write more primitive components and either unit test them, prove them to be correct or make them small enough to be correct by inspection. An integration test is just testing that the interfaces do indeed fit together, it won't normally be close to testing all possible code paths internally.
Ah yes. Somehow the behaviour of each unit is "correct", but you're "just testing that the interfaces fit together" with a handful of integration tests. Funny how that becomes a PagerDuty alert at 3 in the morning, because "correct behaviour" in one unit was never tested together with "correct behaviour" in another unit.
But when you actually write an integration test over real (or simulated) inputs, suddenly 99%+ of your unit tests become redundant, because exercising your app/system as intended covers most of the code paths you could possibly hit.
The failure mode I see much more often is in the other direction: tests that test too many units together and need to be pushed down to a lower level to be more useful. For example, I recently wrote some code that generated intellisense suggestions for a DSL that our users use. Originally, the tests covered a large swathe of that functionality, and involved triggering e.g. lots of keydown events to check what happened when different keys were pressed. These were useful tests for checking that the suggestions box worked as expected, but they made it very difficult to test edge cases in how the suggestions were generated, because the code needed to set all that up was so involved.
In the end what I did was lower the tests so I had a bunch of tests for the suggestions generation function (which was essentially `(input: str, cursor: int) -> Completion[]` and so super easy to test), and a bunch of tests for the suggestions box (which was now decoupled from the suggestions logic, and so also easier to test). I kept some higher-level integration tests, but only very few of them. The result is faster, but also much easier to maintain, with tests that are easier to write and code that's easier to refactor.
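To give a flavour of what the lowered tests look like (the names and keyword list here are made up; the real function and DSL are obviously different), the suggestion generator becomes a pure function you can hit with edge cases directly, with no keydown events or suggestion box involved:

```python
from dataclasses import dataclass


@dataclass
class Completion:
    text: str


# Hypothetical stand-in for the lowered suggestion generator: a pure
# (input, cursor) -> [Completion] mapping with no UI involved.
KEYWORDS = ["select", "sort", "sum"]


def suggest(input: str, cursor: int) -> list[Completion]:
    # Only the word the cursor is sitting in matters for suggestions.
    prefix = input[:cursor].rsplit(" ", 1)[-1].lower()
    if not prefix:
        return []
    return [Completion(k) for k in KEYWORDS if k.startswith(prefix)]


# Edge cases become trivial pytest-style tests: no DOM, no keydown events.
def test_prefix_matches_all_candidates():
    assert [c.text for c in suggest("s", 1)] == ["select", "sort", "sum"]


def test_cursor_position_ignores_text_after_it():
    assert [c.text for c in suggest("se everything", 2)] == ["select"]


def test_empty_prefix_gives_no_suggestions():
    assert suggest("select ", 7) == []
```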