For example, in both cases, the tests work best if I test the subject under test as a black box (i.e. interact only with its public interface) but use my knowledge of its internals to identify the weaknesses that will most require testing. In both cases, I want to structure the code so that the subject under test is as isolated as possible - i.e. no complex interactions with global state, no mocking of unrelated modules, and no complex mechanism to reset anything after the test is done. In both cases, I want the test to run fast, ideally instantaneously, so I get immediate results.
The biggest difference is that it's usually harder to write good integration tests because they're interacting with external systems that are generally slower and stateful, so I've got to put extra work into getting the tests themselves to be fast and stateless. But when that works, there's really not much difference at all between a test that tests a single function, and a test that tests a service class with a database dependency.
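To make the "fast and stateless" part concrete, one common trick (not something spelled out above) is to run each database-backed test inside a transaction and roll it back afterwards. A minimal Jest-style sketch, where the `db` handle and its `begin`/`rollback`/`query` methods are hypothetical stand-ins rather than any specific library's API:

```typescript
// Hypothetical database handle; begin/rollback/query are stand-ins for
// whatever your driver or ORM actually exposes.
declare const db: {
  begin(): Promise<void>;
  rollback(): Promise<void>;
  query(sql: string, params?: unknown[]): Promise<unknown[]>;
};

beforeEach(() => db.begin());    // every test starts from a known state
afterEach(() => db.rollback());  // nothing a test writes survives it

it("stores and retrieves an order", async () => {
  await db.query("insert into orders (id, total) values ($1, $2)", ["o1", 42]);
  const rows = await db.query("select total from orders where id = $1", ["o1"]);
  expect(rows).toHaveLength(1);
});
```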
I would write integration/system tests (different, but similar, imo) to test that the black-box integrations with the system work as expected. Generally closer to the "user story" end of things.
I would write unit tests for smaller, targeted things, like making sure the sort method works in various cases, etc. Individual methods, especially ones that don't interact with data outside what is passed into them (functional methods), are good for unit testing.
I've found that well-written integration tests help me catch workflow-level issues (e.g. something changed in a dependency that might be mocked in unit tests).
So while I think good integration tests are the best way to make sure things should ship, I see a lot of value in good unit tests for day-to-day velocity, particularly in code that's being maintained or updated instead of new code.
This is what unit testing was originally described as. Which confirms my belief that unit testing and integration testing have always been the very same thing.
> Individual methods, especially ones that don't interact with data outside what is passed into them (functional methods), are good for unit testing.
Perhaps unit testing has come to mean this, but these kinds of tests are rarely ever worth writing, so it is questionable whether the practice even needs a name. Sometimes it can be helpful to isolate a function like that for the sake of pinning down complex logic or edge cases, but it is likely you'll want to delete this kind of test once you're done. This is where testing brittleness is born.
- Unit test = my code works
- Functional test = my design works
- Integration test = my code is using your 3rd party stuff correctly (databases, etc)
- Factory Acceptance Test = my system works
- Site Acceptance Test = your code sucks, this totally isn't what I asked for!?!
Then there are more "concern-oriented" groupings, like "regression tests", which could fall into any number of the above.
That being said, there's a pretty wide set of opinions on the topic, and that doesn't really seem to change over time.
> these kinds of tests are rarely ever worth writing
I strongly disagree. I find it very helpful to write unit tests for specific implementations of things (like a specific sort, to make sure it works correctly with the various edge cases). Do they get discarded if you completely change the implementation? Sure. But that doesn't detract from the fact that they help make sure the current implementation works the way I say it does.
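For what it's worth, the kind of implementation-specific edge-case tests being described might look something like this. A minimal Jest-style sketch; `insertionSort` here is a hypothetical stand-in, not code from the thread:

```typescript
// Hypothetical implementation under test.
function insertionSort(xs: number[]): number[] {
  const out = [...xs];
  for (let i = 1; i < out.length; i++) {
    const v = out[i];
    let j = i - 1;
    while (j >= 0 && out[j] > v) {
      out[j + 1] = out[j];
      j--;
    }
    out[j + 1] = v;
  }
  return out;
}

describe("insertionSort edge cases", () => {
  it("handles an empty array", () => expect(insertionSort([])).toEqual([]));
  it("handles a single element", () => expect(insertionSort([7])).toEqual([7]));
  it("keeps duplicates", () => expect(insertionSort([3, 1, 3, 2])).toEqual([1, 2, 3, 3]));
  it("leaves a sorted array unchanged", () => expect(insertionSort([1, 2, 3])).toEqual([1, 2, 3]));
});
```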
Sorting mightn't be the greatest example as sorting could quite reasonably be the entire program (i.e. a library).
But if you needed some kind of custom sort function to serve features within a greater application, you are already going to know that your sort function works correctly by virtue of the greater application working correctly. Testing the sort function in isolation is ultimately pointless.
As before, there may be some benefit in writing code to run that sort function in isolation during development to help pinpoint what edge cases need to be considered, but there isn't any real value in keeping that around after development is done. The edge cases you discovered need to be moved up in the abstraction to the greater program anyway.
So ultimately we write tests at a lower level to deal with the combinatorial explosion of possible inputs at the edge.
You should push your tests as far to the edge as possible but no further. If a test at the edge duplicates a test in the middle, delete the test in the middle. But if a test at the edge can't possibly account for everything, you're going to need a test in the middle.
For me, heavy tests implies end-to-end tests, because at that point you're interacting with the whole system including potentially a browser, and that's just going to be slow whichever way you look at it. But just accessing a database, or parsing and sending http requests doesn't have to be particularly slow, at least not compared to the speed at which I develop. I'd expect to be able to run hundreds of those sorts of tests in less than a second, which is fast enough for me.
My unit tests test things that must not work
If it matters, why can't you check? Will your product/app/system not run into these possible input sets eventually?
> So ultimately we write tests at a lower level to deal with the combinatorial explosion of possible inputs at the edge.
Don't you have to write the combinatorial explosion of inputs for the unit tests, too, to test "every possible combination"? If not, and you're only testing a subset, then why not test the whole flow while you're at it?
Now, you could create hundreds of different integration tests for each branch of the computation..., most of which will assert the same final output state, but achieved through different transitions
Or you can make some integration tests which make sure the logic itself is being called, and then only unit test the specific criteria in isolation.
What you're talking about is likely founded in either frontend testing (component tests vs unit tests) or backends which generally have pretty trivial logic complexity. In these cases, just doing an integration test gets it done for the most part, but as soon as you've got multiple stakeholders giving you separate requirements and the consumed inputs get bigger and multiply ... testing via integration tests becomes essentially impossible to do in practice.
I only recently started looking into Quickcheck style libraries in the typescript world, and fast-check is fantastic. Like super high quality. Great support for shrinking in all sorts of cases, very well typed, etc.
Hooking fast-check up to a real database/redis instance has been incredible for finding bugs. Pair it up with some regular ol' case-by-case integration tests for some seriously robust typescript!
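For anyone unfamiliar with the fast-check style, a minimal property test looks roughly like this; `dedupeSort` is a hypothetical stand-in, and the real payoff described above comes from driving stateful things like a database with generated inputs:

```typescript
import fc from "fast-check";

// Hypothetical function under test: de-duplicate and sort numbers ascending.
function dedupeSort(xs: number[]): number[] {
  return Array.from(new Set(xs)).sort((a, b) => a - b);
}

it("dedupeSort returns a sorted, duplicate-free version of its input", () => {
  fc.assert(
    fc.property(fc.array(fc.integer()), (xs) => {
      const out = dedupeSort(xs);
      const sorted = out.every((v, i) => i === 0 || out[i - 1] <= v);
      const noDupes = new Set(out).size === out.length;
      const complete = xs.every((x) => out.includes(x));
      return sorted && noDupes && complete;
    })
  );
});
```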
What you do is write more primitive components and either unit test them, prove them to be correct or make them small enough to be correct by inspection. An integration test is just testing that the interfaces do indeed fit together, it won't normally be close to testing all possible code paths internally.
I think of it like building any other large machine with many inputs. You can't possibly test a car under every conceivable condition. Imagine if someone was like "but wait, did you even test going round a corner at 60mph in the wet with the radio on?!"
Maybe FooSystem will be redesigned to take different inputs, maybe the upstream will change to provide different outputs, maybe responsibility will shift around due to changes in the number of dependencies and it makes sense to vertically integrate some prep to upstream to share it.
Unit tests in these circumstances - and they're the majority of unit tests, IME - can act as a drag on the quality of the system. It's better to test things like this at the component level instead of the unit level.
That said, I think it takes a real knack to figure out the right sort of tests, and it sometimes takes me a couple of attempts to get it right. In that case, being willing to delete or completely rewrite tests that just aren't being useful is important!
I find the problem with trying to move the tests up a level of abstraction is that eventually the code you're writing is probably going to change, and the tests that were useful for development the first time round will probably continue to be useful the second time round as well. So keeping them in place, even if they're really implementation-specific, is useful for as long as that implementation exists. (Of course, if the implementation changes for one with different edge cases, then you should probably get rid of the tests that were only useful for the old implementation.)
Importantly, this only works if the boundaries of the unit are fairly well-defined. If you're implementing a whole new sort algorithm, that's probably the case. But if I was just writing a function that compares two operands, that could be passed to a built-in sort function, I might look to see if there's a better level of abstraction to test at, because I can imagine the use of that compare function being something that changes a lot during refactorings.
Ideally your units/integrations will never change. If they do change, that means the users of your code will face breakage and that's not good citizenry. Life is messy and sometimes you have little choice, but such changes should be as rare as possible.
What is actually likely to change is the little helper functions you create to support the units, like said bespoke sort function. This is where testing can quickly make code fragile and is ultimately unnecessary. If the sort function is more useful than just a helper then you will move it out into its own library and, like before, the sort function will become the entire program and thus the full integration.
If you are concerned that the ORM won't behave as it claims to, you can write tests targeted at it directly. You can then run the same tests against your mock implementation to show that it conforms to the same contract.
But an ORM of any decent quality will already be well tested and shouldn't do unexpected things, so perhaps the worry is for naught?
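The "run the same tests against your mock implementation" idea can be expressed as a shared contract suite. A rough Jest-style sketch, where the repository interface and both implementations are hypothetical (the ORM-backed one is left commented out since it depends on your actual setup):

```typescript
interface User { id: string; name: string; }

interface UserRepository {
  save(user: User): Promise<void>;
  findById(id: string): Promise<User | undefined>;
}

// In-memory 'mock' implementation that must honour the same contract.
class InMemoryUserRepository implements UserRepository {
  private users = new Map<string, User>();
  async save(user: User) { this.users.set(user.id, { ...user }); }
  async findById(id: string) { return this.users.get(id); }
}

// The contract: any conforming repository must pass these tests.
function userRepositoryContract(makeRepo: () => UserRepository) {
  it("returns a saved user by id", async () => {
    const repo = makeRepo();
    await repo.save({ id: "1", name: "Ada" });
    expect(await repo.findById("1")).toEqual({ id: "1", name: "Ada" });
  });

  it("returns undefined for an unknown id", async () => {
    const repo = makeRepo();
    expect(await repo.findById("missing")).toBeUndefined();
  });
}

describe("InMemoryUserRepository", () => userRepositoryContract(() => new InMemoryUserRepository()));
// describe("OrmUserRepository", () => userRepositoryContract(() => new OrmUserRepository(db)));
```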
I think this is what you're saying about moving useful units out into their own library. I agree, and I think it sounds like we'd draw the testing boundaries in similar places, but I don't think it's necessary to move these sorts of units into separate libraries for them to be isolated modules that can be usefully tested.
The sort function is one of the edge cases where how I'd test it would probably depend a lot on the context, but in theory a generic sort function has a very standard interface that I wouldn't expect to change much, if at all. So I'd be quite happy treating it as a unit in its own right and writing a bunch of tests for it. But if it's something really implementation-specific that depends on the exact structure of the thing it's sorting, then it's probably better tested in context. But I'm quite willing to write tests for little helper functions that I'm sure will be quite stable.
The whole of the interface is the unit, as Beck originally defined it. As it is the integration point. Hence why there is no difference between them.
> And most of the units you're writing are probably internal-facing
No. As before, it is a mistake to test internal functions. They are just an implementation detail. I understand that some have taken unit test to mean this, but I posit that as it is foolish to do it, there is no need to talk about it, allowing unit test to refer to its original and much more sensible definition. It only serves to confuse people into writing useless, brittle tests.
> So I'd be quite happy treating it as a unit in its own right
Right, and, likewise, you'd put it in its own package in its own right so that it is available to all sort cases you have. Thus, it is really its own program — and thus would have its own tests.
Sure, yeah, I think we're saying the same thing. A unit is a chunk of code that can act as its own program or library - it has an interface that will remain fairly fixed, and an implementation that could change over time. (Or, a unit is the interface that contains this chunk of code - I don't think the difference between these two definitions is so important here.) You could pull it out into its own library, or you can keep it as a module/file/class/function in a larger piece of software, but it is a self-contained unit.
I think the important thing that I was trying to get across earlier, though, is that this unit can contain other units. At the most maximal scale, the entire application is a single unit made up of multiple sub-units. This is why I think a definition of unit/integration test that is based on whether a unit integrates other units doesn't really make much sense, because it doesn't actually change how you test the code. You still want quick, isolated tests, you still want to test the interface and not the internals (although you should be guided by the internals), and you still want to avoid mocking. So distinguishing between unit tests and integration tests in this way isn't particularly useful.
So `BankAccount` as a class is probably a useful unit boundary: once you've designed the class, you're probably not going to change the interface much, except for possibly adding new methods occasionally. You have a stable boundary there, where in theory you could completely rewrite the internals of the class but the external boundary will stay the same.
`FooSystemFrobnicatorPreparer` sounds much more like an internal detail of some other system, I agree, and its interface could easily be rewritten or the class removed entirely if we decide to prepare our frobnication in a different way. But in that case, maybe the `foo.system.frobnicator` package is the unit we want to test as a whole, rather than one specific internal class inside that package.
I think a lot of good test and system design is finding these natural fault lines where it's possible to create a relatively stable interface that can hide internal implementation details.
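As a concrete illustration of treating `BankAccount` as the unit boundary, the tests talk only to its public methods and never to its internals; the class below is a hypothetical stand-in:

```typescript
class BankAccount {
  private balance = 0;
  deposit(amount: number) {
    if (amount <= 0) throw new Error("deposit must be positive");
    this.balance += amount;
  }
  withdraw(amount: number) {
    if (amount > this.balance) throw new Error("insufficient funds");
    this.balance -= amount;
  }
  getBalance() { return this.balance; }
}

describe("BankAccount", () => {
  it("tracks deposits and withdrawals", () => {
    const acct = new BankAccount();
    acct.deposit(100);
    acct.withdraw(30);
    expect(acct.getBalance()).toBe(70);
  });

  it("rejects overdrafts", () => {
    const acct = new BankAccount();
    expect(() => acct.withdraw(1)).toThrow("insufficient funds");
  });
});
```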
But you can with unit tests?
> Can you test the Python parser on all possible Python programs?
A parser is one of the few cases where unit tests work. Very few people write parsers.
See also my sibling reply here: https://news.ycombinator.com/item?id=45078047
> What you do is write more primitive components and either unit test them, prove them to be correct or make them small enough to be correct by inspection. An integration test is just testing that the interfaces do indeed fit together, it won't normally be close to testing all possible code paths internally.
Ah yes. Somehow "behaviour of unit tests is correct" but "just testing interfaces in just a few integration tests". Funny how that becomes a PagerDuty alert at 3 in the morning because "correct behaviour" in one unit wasn't tested together with "correct behaviour" in another unit.
But when you actually write an actual integration test over actual (or simulated) inputs, suddenly 99%+ of your unit tests become redundant because actually using your app/system as intended covers most of the code paths you could possibly use.
Assuming by mock you mean an alternate implementation (e.g. an in-memory database repository) that relieves dependence on a service that is outside of immediate control, nah. There is no reason to avoid that. That's just an implementation detail and, as before, your tests shouldn't be bothered by implementation details. And since you can run your 'mock' against the same test suite as the 'real thing', you know that it fulfills the same contract as the 'real thing'. Mocks in that sense are also useful outside of testing.
If you mean something more like what is more commonly known as a stub, still no. This is essential for injecting failure states. You don't want to have to actually crash your hard drive to test your code under a hard-drive-crash condition. Failure-case tests are the most important tests you will write, so you will definitely be using these in all but the simplest programs.
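A small sketch of that kind of failure-injecting stub (Jest-style; the `Storage` interface and `saveReport` function are hypothetical):

```typescript
interface Storage {
  write(path: string, data: string): Promise<void>;
}

// Stub that always simulates a full disk.
class FailingStorage implements Storage {
  async write(): Promise<void> {
    throw new Error("ENOSPC: no space left on device");
  }
}

// Hypothetical code under test: should surface a clear error, not crash.
async function saveReport(storage: Storage, report: string): Promise<string> {
  try {
    await storage.write("/reports/latest.txt", report);
    return "saved";
  } catch (err) {
    return `failed: ${(err as Error).message}`;
  }
}

it("reports a storage failure instead of throwing", async () => {
  const result = await saveReport(new FailingStorage(), "quarterly numbers");
  expect(result).toContain("failed: ENOSPC");
});
```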
The failure mode I see much more often is in the other direction: tests that are testing too many units together and need to be lowered down to be more useful. For example, I recently wrote some code that generated intellisense suggestions for a DSL that our users use. Originally, the tests covered a large swathe of that functionality, and involved triggering e.g. lots of keydown events to check what happened when different keys were pressed. These were useful tests for checking that the suggestions box worked as expected, but they made it very difficult to test edge cases in how the suggestions were generated because the code needed to set that stuff up was so involved.
In the end, what I did was lower the tests so I had a bunch of tests for the suggestions-generation function (which was essentially `(input: str, cursor: int) -> Completion[]` and so super easy to test), and a bunch of tests for the suggestions box (which was now decoupled from the suggestions logic, and so also easier to test). I kept some higher-level integration tests, but only very few of them. The result is faster, but also much easier to maintain, with tests that are easier to write and code that's easier to refactor.
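A rough sketch of what those lowered, pure-function tests might look like; `Completion` and `getCompletions` are hypothetical stand-ins for the real `(input, cursor) -> Completion[]` function described above:

```typescript
interface Completion { label: string; }

// Toy implementation: suggest keywords matching the word before the cursor.
function getCompletions(input: string, cursor: number): Completion[] {
  const keywords = ["select", "sort", "sum"];
  const prefix = input.slice(0, cursor).split(/\s+/).pop() ?? "";
  return keywords
    .filter((k) => prefix.length > 0 && k.startsWith(prefix))
    .map((label) => ({ label }));
}

describe("getCompletions", () => {
  it("suggests keywords matching the current prefix", () => {
    expect(getCompletions("so", 2).map((c) => c.label)).toEqual(["sort"]);
  });

  it("suggests nothing when there is no prefix before the cursor", () => {
    expect(getCompletions("sort ", 5)).toEqual([]);
  });
});
```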
It is entirely possible for a sort function to be just one component of the functionality of the larger code base. Sort in specific is something I've written unit tests for.
> As before, there may be some benefit in writing code to run that sort function in isolation during development to help pinpoint what edge cases need to be considered, but there isn't any real value in keeping that around after development is done.
Those edge cases (and normal cases) continue to exist after the code is written. And if you find a new edge case later and need to change the code, then having the previous unit tests in place gives a certain amount of confidence that your changes (for the new case) aren't breaking anything. Generally, the only time I _remove_ unit tests is if I'm changing to a new implementation; when the method being tested no longer exists.