Integration tests are closer to what you want to know, but they're also more work. If I want to make sure that my state machine returns an error when it receives a message for which no state transition is defined, I could spin up a process and set up log collection and orchestrate with python and... or I could write a unit test that instantiates a state machine, gives it a message, and checks the result.
My point is that we need both. Write a unit test to ensure that your component behaves to its spec, especially with respect to edge cases. Write an integration test to make sure that the feature of which your component is a part behaves as expected.
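For concreteness, that unit test can be a dozen lines. A sketch, where the StateMachine class and its message names are hypothetical and only the shape of the test matters:

```typescript
// A sketch of the unit test described above. The StateMachine class and its
// message names are hypothetical; only the shape of the test matters.
import { test } from "node:test";
import assert from "node:assert";

type Result = { ok: true; state: string } | { ok: false; error: string };

class StateMachine {
  constructor(
    private state: string,
    private transitions: Record<string, Record<string, string>>,
  ) {}

  handle(message: string): Result {
    const next = this.transitions[this.state]?.[message];
    if (next === undefined) {
      return { ok: false, error: `no transition for "${message}" in state "${this.state}"` };
    }
    this.state = next;
    return { ok: true, state: next };
  }
}

test("returns an error for a message with no defined transition", () => {
  const sm = new StateMachine("idle", { idle: { start: "running" } });
  assert.strictEqual(sm.handle("stop").ok, false);
});
```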
Unit tests are great for DX, but only integration tests and above matter business-wise.
Passing params in instead of making external calls inside your business logic functions can help. DI can help if that's too impractical or unwieldy for whatever reason in the domain.
It's hard to do right the first time - sometimes it's fuzzy what's an internal detail vs what's an external contract - but you need to get there ASAP.
https://ashishb.net/programming/bad-and-good-ways-to-write-a...
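For what it's worth, a minimal sketch of the "pass params in instead of making external calls" idea; the order shape and the endpoint are made up for illustration:

```typescript
// A sketch of "pass the data in" vs "fetch inside the business logic".
// The order shape and the endpoint are made up for illustration.

// Hard to unit test: I/O happens inside the logic.
async function monthlyTotalFetching(customerId: string): Promise<number> {
  const res = await fetch(`https://api.example.com/orders?customer=${customerId}`);
  const orders: { amount: number }[] = await res.json();
  return orders.reduce((sum, o) => sum + o.amount, 0);
}

// Trivially unit-testable: the caller does the I/O and passes data in.
function monthlyTotal(orders: { amount: number }[]): number {
  return orders.reduce((sum, o) => sum + o.amount, 0);
}
```

The thin caller that actually performs the fetch is then covered by an integration test, while the logic gets cheap unit tests.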
I can write a module with integration tests at the module level and unit tests on its functions.
I can now write an application that uses my module. From the perspective of my application, my module's integration tests look like unit tests.
My module might, for example, implicitly depend on the test suite of CPython, the C compiler, the QA at the chip fab. But I don't need to run those tests any more.
In your case you hope the in-memory database matches the production one enough that you can write fast isolated unit tests on your application logic. You can trust this works because something else unit-tested the in-memory database, and integration tested the db client against the various db backends.
For example, in both cases, the tests work best if I test the subject under test as a black box (i.e. interact only with its public interface) but use my knowledge of its internals to identify the weaknesses that will most require testing. In both cases, I want to structure the code so that the subject under test is as isolated as possible - i.e. no complex interactions with global state, no mocking of unrelated modules, and no complex mechanism to reset anything after the test is done. In both cases, I want the test to run fast, ideally instantaneously, so I get immediate results.
The biggest difference is that it's usually harder to write good integration tests because they're interacting with external systems that are generally slower and stateful, so I've got to put extra work into getting the tests themselves to be fast and stateless. But when that works, there's really not much difference at all between a test that tests a single function, and a test that tests a service class with a database dependency.
Just found this long 2011 post that goes into some detail on the background and the reasons for introducing that ("The Testing Grouplet"?): https://mike-bland.com/2011/11/01/small-medium-large.html
But I am not sure even after reading all that if the SML terminology was still used in 2011 or if they had moved on already? Can't really find any newer sources that mention it.
Sometimes you just don’t need unit tests and it’s okay to admit it and work accordingly.
I would write integration/system (different, but similar, imo) to test that the black box integrations with the system work as expected. Generally closer to the "user story" end of things.
I would write unit tests for smaller, targeted things. Like making sure the sort method works in various cases, etc. Individual methods, especially ones that don't interact with data outside what is passed into them (functional methods), are good for unit testing.
I've found that well-written integration tests help me catch workflow-level issues (e.g. something changed in a dependency that might be mocked in unit tests).
So while I think good integration tests are the best way to make sure things should ship, I see a lot of value in good unit tests for day-to-day velocity, particularly in code that's being maintained or updated instead of new code.
This is what unit testing was originally described as. Which confirms my belief that unit testing and integration testing has always been the very same thing.
> Individual methods, especially ones that don't interact with data outside what is passed into them (functional methods), are good for unit testing.
Perhaps unit testing has come to mean this, but these kinds of tests are rarely ever worth writing, so it is questionable whether it even needs a name. Sometimes it can be helpful to isolate a function like that for the sake of pinning down complex logic or edge cases, but it is likely you'll want to delete this kind of test once you're done. This is where testing brittleness is born.
Integration tests are, in a way, worst of both worlds: they are more complicated than unit tests, they require involved setup, and yet they can’t really guarantee that things work in production.
End-to-end tests, meanwhile, do show whether things work or not. If something fails with an error, error reporting should be good enough in the first place to show you what exactly is wrong. If something failed without an error but you know it failed, make it fail with an error first by writing another test case. If there was an error but error reporting somehow doesn’t capture it, you have a bigger problem than tests.
At the end of the day, you want certainty that you deliver working software. If it’s too difficult to identify the failure, improve your error reporting system. Giving up that certainty because your error reporting is not good enough seems like a bad tradeoff.
Incidentally, grug-friendly e2e tests absolutely exist: just take your software, exactly as it’s normally built, and run a script that uses it like it would be used in production. This gives you a good enough guarantee that it works. If there is no script, just do it yourself, go through a checklist, write a script later. It doesn’t get more grug than that.
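Here's roughly what such a script can look like, assuming (hypothetically) the app is already running locally and exposes /health and /login endpoints; adjust to whatever your software actually looks like:

```typescript
// A sketch of a "grug" smoke script. Endpoints and credentials are hypothetical.
import assert from "node:assert";

async function smoke(): Promise<void> {
  const health = await fetch("http://localhost:3000/health");
  assert.strictEqual(health.status, 200, "service should be up");

  const login = await fetch("http://localhost:3000/login", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ user: "smoke-test", password: "smoke-test" }), // test credentials
  });
  assert.strictEqual(login.status, 200, "login should succeed");

  console.log("smoke test passed");
}

smoke().catch((err) => {
  console.error(err);
  process.exit(1);
});
```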
- Unit test = my code works
- Functional test = my design works
- Integration test = my code is using your 3rd party stuff correctly (databases, etc)
- Factory Acceptance Test = my system works
- Site Acceptance Test = your code sucks, this totally isn't what I asked for!?!
Then there's more "concern oriented" groupings, like "regression tests", which could fall into any number of the above.
That being said, there's a pretty wide set of opinions on the topic, and that doesn't really seem to change over time.
> these kinds of tests are rarely ever worth writing
I strongly disagree. I find it very helpful to write unit tests for specific implementations of things (like a specific sort, to make sure it works correctly with the various edge cases). Do they get discarded if you completely change the implementation? Sure. But that doesn't detract from the fact that they help make sure the current implementation works the way I say it does.
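To make that concrete, the kind of edge-case tests I mean, sketched against a hypothetical mySort (ascending numeric sort):

```typescript
// Edge-case unit tests for a hypothetical mySort implementation, imported from
// the module under test.
import { test } from "node:test";
import assert from "node:assert";
import { mySort } from "./my-sort"; // hypothetical module under test

test("empty array", () => assert.deepStrictEqual(mySort([]), []));
test("single element", () => assert.deepStrictEqual(mySort([1]), [1]));
test("already sorted", () => assert.deepStrictEqual(mySort([1, 2, 3]), [1, 2, 3]));
test("reverse order", () => assert.deepStrictEqual(mySort([3, 2, 1]), [1, 2, 3]));
test("duplicates", () => assert.deepStrictEqual(mySort([2, 1, 2]), [1, 2, 2]));
```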
Unit and integration tests test different layers of the system, and one isn't inherently better or more useful than the other. They complement each other to cover behavior that is impossible to test otherwise. You can't test low-level functionality in integration tests, just as you can't test high-level functionality in unit tests.
There's nothing dogmatic about that statement. If you disagree with it, that's your prerogative, but it's also my opinion that it is a mistake. It is a harmful mentality that makes code bases risky to change, and regressions more likely. So feel free to adopt it in your personal projects if you wish, but don't be surprised if you get push back on it when working in a team. Unless your teammates think the same, in which case, good luck to you all.
Sure, if you're only writing a small script, you might not need tests at all. But as soon as that program evolves into a system that interacts with other systems, you need to test each component in isolation, as well as how it interacts with other systems.
So this idea that unit tests are not useful is coming from a place of laziness. Some developers see it as a chore that slows them down, instead of seeing it as insurance that makes their life easier in the long run, while also ensuring the system works as intended at all layers.
You can have robust testing by combining the two. You can check that the whole thing runs end to end once, and then test all the little features / variations using integration tests.
That's what we do for XWiki.
https://dev.xwiki.org/xwiki/bin/view/Community/Testing/#HTes...
Sorting mightn't be the greatest example as sorting could quite reasonably be the entire program (i.e. a library).
But if you needed some kind of custom sort function to serve features within a greater application, you are already going to know that your sort function works correctly by virtue of the greater application working correctly. Testing the sort function in isolation is ultimately pointless.
As before, there may be some benefit in writing code to run that sort function in isolation during development to help pinpoint what edge cases need to be considered, but there isn't any real value in keeping that around after development is done. The edge cases you discovered need to be moved up in the abstraction to the greater program anyway.
> it's also easy to manually test while developing during QA and UAT.
As I said in the original comment, e2e tests can definitely be manual. Invoke your CLI, curl your API, click around in GUI. That said, comprehensively testing it that way quickly becomes infeasible as your software grows.
The converse is not true, however. It's perfectly possible for individual components to "work" well, but to not do the right thing from a high level perspective. Say, one component provides a good fast quicksort function, but the other component requires a stable sort to work properly - each is OK in isolation, but you need an integration test to figure out the mistake.
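Here's a sketch of that mismatch; all names are invented, and running it throws on purpose, because that's exactly the mistake only a combined test catches:

```typescript
// Each piece looks fine alone; only exercising them together exposes the bug.
import assert from "node:assert";

type Task = { priority: number; id: string };

// Component A: orders by priority, but is not stable for equal priorities
// (a selection sort is used here just to make the instability concrete).
function sortByPriority(tasks: Task[]): Task[] {
  const a = [...tasks];
  for (let i = 0; i < a.length; i++) {
    let min = i;
    for (let j = i + 1; j < a.length; j++) {
      if (a[j].priority < a[min].priority) min = j;
    }
    [a[i], a[min]] = [a[min], a[i]]; // swapping can reorder equal-priority items
  }
  return a;
}

// Component B: silently assumes equal-priority tasks keep submission order.
function renderQueue(tasks: Task[]): string[] {
  return sortByPriority(tasks).map((t) => t.id);
}

const submitted = [
  { priority: 1, id: "first" },
  { priority: 1, id: "second" },
  { priority: 0, id: "urgent" },
];

// Fails: "second" and "first" come out swapped, even though A's own unit tests
// (which only check the priority ordering) would pass.
assert.deepStrictEqual(renderQueue(submitted), ["urgent", "first", "second"]);
```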
Unit tests are typically good scaffolding. They allow you to test bits of your infrastructure as you're building it but before it's ready for integration into the larger project. But they give you relatively little assurance at the project level, and are not worth it unless you're pretty sure you're building the right thing in the first place.
So ultimately we write tests at a lower level to deal with the combinatorial explosion of possible inputs at the edge.
You should push your tests as far to the edge as possible but no further. If a test at the edge duplicates a test in the middle, delete the test in the middle. But if a test at the edge can't possibly account for everything, you're going to need a test in the middle.
For me, heavy tests implies end-to-end tests, because at that point you're interacting with the whole system including potentially a browser, and that's just going to be slow whichever way you look at it. But just accessing a database, or parsing and sending http requests doesn't have to be particularly slow, at least not compared to the speed at which I develop. I'd expect to be able to run hundreds of those sorts of tests in less than a second, which is fast enough for me.
That's because it is both confusing and inconsistent. In my experience, every company uses slightly different names for different types of tests. Unit tests are generally fairly well understood as testing the single unit (a method/function) but after that things get murky fast.
For example, integration tests as reflected by the confused conversation in this thread already has wildly different definitions depending on who you ask.
For example, someone might interpret them as "unit integration tests" where it reflects a test that tests a class, builder, etc. Basically something where a few units are combined. But, in some companies I have seen these being called "component tests".
Then there is the term "functional tests", which in some companies means the same as "manual tests done by QA" but for others simply means automated front-end tests. But in yet other companies those automated tests are called end-to-end tests.
What's interesting to me when viewing these online discussions is the complete lack of awareness people display about this.
You will see people very confidently say that "test X should by done in such and such way" in response to someone where it is very clear they are actually talking about different types of tests.
No, that is not guaranteed.
Integration and E2E tests can only cover certain code paths, because they depend on the input and output from other systems (frontend, databases, etc.). This I/O might be crafted in ways that never trigger a failure scenario or expose a bug within the lower-level components. This doesn't mean that the issue doesn't exist—it just means that you're not seeing it.
Furthermore, because integration and E2E tests are, by their nature, often more expensive to set up and run, there will be fewer of them, which means they will not have full coverage of the underlying components. Another issue is that often these tests, particularly E2E and acceptance tests, are written only with a happy path in mind, and ignore the myriad of inputs that might trigger a failure in the real world.
Another problem with your argument is that it ignores that tests have different audiences. E2E and acceptance tests are written for the end user; integration tests are written for system integrators and operators; and unit tests are written for users of the API, which includes the author and other programmers. If you disregard one set of tests, you are disregarding that audience.
To a programmer and maintainer of the software, E2E and acceptance tests have little value. They might not use the software at all. What they do care about is that the function, method, object, module, or package does what it says on the tin; that it returns the correct output when given a specific input; that it's performant, efficient, well documented, and so on. These users matter because they are the ones who will maintain the software in the long run.
So thinking that unit tests are useless because they're a chore to maintain is a very shortsighted mentality. Instead, it's more beneficial to see them as guardrails that make your future work easier, by giving you the confidence that you're not inadvertently breaking an API contract whenever you make a change, even when all higher-level tests remain green across the board.
In the ideal world maybe. But it's very hard to test the edge cases of a sorting algorithm with integration tests. In general my experience is that algorithms and some complex but pure functions are worth writing unit tests for. CRUD app boilerplate is not.
My unit tests test things that must not work
Integration tests will cover every use case unit tests are supposed to cover if you actually test the system behavior.
If it matters, why can't you check? Will your product/app/system not run into these possible sets eventually?
> So ultimately we write tests at a lower level to deal with the combinatorial explosion of possible inputs at the edge.
Don't you have to write the combinatorial explosion of inputs for the unit tests, too, to test "every possible combination"? If not, and you're only testing a subset, then why not test the whole flow while you're at it?
You mean just like unit tests where every useful interaction between units is mocked out of existence?
> Furthermore, the fact that, by their nature, integration and E2E tests are often more expensive to setup and run, there will be fewer of them
And that's the main issue: people pretend that only unit tests matter, and as a result all other forms of testing are an afterthought. Every test harness and library is geared towards unit testing, and unit testing only.
It would be nice to fully test system behaviour, but to do so would have bankrupted the company long before even coming close.
So you have unit tests testing things in isolation? Did you not test how they work together? Did you never run your system to see it actually works and behaves as expected? You just YOLO'd it over to customers and prayed?
Now, you could create hundreds of different integration tests for each branch of the computation..., most of which will assert the same final output state, but achieved through different transitions
Or you can make some integration tests which make sure the logic itself is being called, and then only unit test the specific criteria in isolation.
What you're talking about is likely founded in either frontend testing (component tests vs unit tests) or backends which generally have pretty trivial logic complexity. In these cases, just doing an integration test gets it done for the most part, but as soon as you've got multiple stakeholders giving you separate requirements and the consumed inputs get bigger and multiply, testing via integration tests becomes essentially impossible to do in practice.
I feel like I don't write enough tests, and when I do they're usually integration tests, but some things - algorithms, complex but pure functions, data structures - absolutely deserve their unit tests that can't be reasonably replaced by integration/e2e tests.
If you’re, for example, writing a web application, and you have an endpoint which parses some data from the request and then responds with the result of that computation, why the hell would you test the fine-grained behaviour of your parser by emulating HTTP requests against your server?
Testing the parsing function in isolation is orders of magnitude cheaper.
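A sketch of what that looks like; parseDateRange and its query format are invented for illustration:

```typescript
// Testing the parsing function directly instead of through HTTP.
import { test } from "node:test";
import assert from "node:assert";

function parseDateRange(q: string): { from: string; to: string } {
  const m = /^from=(\d{4}-\d{2}-\d{2})&to=(\d{4}-\d{2}-\d{2})$/.exec(q);
  if (!m) throw new Error(`invalid range: ${q}`);
  return { from: m[1], to: m[2] };
}

test("valid range", () => {
  assert.deepStrictEqual(parseDateRange("from=2024-01-01&to=2024-01-31"), {
    from: "2024-01-01",
    to: "2024-01-31",
  });
});

test("rejects garbage", () => {
  assert.throws(() => parseDateRange("from=yesterday&to=tomorrow"));
});
```

One or two tests through the HTTP layer can still confirm the endpoint is wired to the parser; the many edge cases live in the isolated tests, where they run in microseconds.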
Sure, that is a risk. But not all unit tests require mocking or stubbing. There may be plenty of pure functions that are worth testing.
Writing good tests requires care and effort, like any other code, regardless of the test type.
> And that's the main issue: people pretend that only unit tests matter, and as a result all other forms of testing are an afterthought.
Huh? Who is saying this?
The argument is coming from the other side with the claim that unit tests don't matter. Everyone arguing against this is saying that, no, all tests matter. (Let's not devolve into politics... :))
The idea of the test pyramid has nothing to do with one type of test being more important than another. It's simply a matter of practicality and utility. Higher-level tests can cover much more code than lower-level ones. In projects that keep track of code coverage, it's not unheard of for a few E2E and integration tests to cover a large percentage of the code base, e.g. >50% of lines or statements. This doesn't mean that these tests are more valuable. It simply means that they have a larger reach by their nature.
These tests also require more boilerplate to setup, external system dependencies, they take more time to run, and so on. It is often impractical to rely on them during development, since they slow down the write-test loop. Instead, running the full unit test suite and a select couple of integration and E2E tests can serve as a quick sanity check, while the entire test suite runs in CI.
Conversely, achieving >50% of line or statement coverage with unit tests alone also doesn't mean that the software works as it should when it interacts with other systems, or the end user.
So, again, all test types are important and useful in their own way, and help ensure that the software doesn't regress.
"How were you making sure that your system actually works?"
Good design and good software engineering goes a long way.
When you know that you cannot test by simply doing everything the customer will do, you have to think about what tests you can do that will indicate how the system will operate under a load that's orders of magnitude greater than what you can do yourself, with hardware you've never even seen. You have to think about how to write software that is likely to be high quality even if you can't test it how you'd like to.
For example, one can design the architecture in such a way that adding more load, more devices, will only linearly increase the demands on resources, and then from testing infer what the loads will be on actual customer sites. Any non-linearity in that regard was identified, if not at the design stage, then in the unit-testing thereof.
One can design the code in such a way that the internal mechanisms of how devices work are suitably abstracted away, leaving as best one can manage common interfaces, and then rather than have to test with the exact arrangements of hardware customers will have, test with devices that, to the extent possible, simulate the interactions our software will see. In this regard, it turns out that many devices that purport to meet standard protocols actually meet "variations about the theme" of protocols. But this too can be mitigated and handled to a degree by careful design and thought in the software engineering. The learning from doing this with one set of devices and protocols spreads to significantly different devices and protocols; every subsequent fresh design for the next iteration or generation of hardware is better and more resilient. A software engineering organisation that learns and retains knowledge and experience can go a long way.
One can recognise that running live on customers sites is itself an opportunity. Some customers would never say a thing, for years on end. Some would want to be involved and would regularly talk about things they'd seen, unexpected things that happened, loads and events and so on; one can ensure that all that information is gathered by the sales people, the support reps, anyone and everyone who talks to the customers, and passed back effectively to have the results of that testing applied. For doomsday scenarios, such as crashes, resource exhaustion, pathological behaviours and so on; good logging and live measurements and dump catching etc can at least feed back so that this situation (which we would never be able to truly test ourselves) is not wasted, and gets fixed, and the lessons of it applied forwards into design and development. Harsh for the customer who finds an issue, but great for the hundred customers who will never hit it because it was tested by that unlucky customer. We'd be fools not to gather as much information as we could from poor customer experiences.
One can get hold of cheap, twenty year old devices that in theory match the same protocol, and go to town on them (some customers will actually be using that exact device and contemporaries - some customers will have brand new hardware that costs a tenth of my employer's market cap). From that, get an idea of how the software performs. Get another cheap device from ebay that is a decade old, and test it; see where it fails, but don't just fix those failures. From them, and similar repeats of the process, learn at a more fundamental level how devices differ and develop more general solutions that will either then be resilient to some new piece of hardware that hasn't even been made yet, or at least will not go wrong in such a way that the whole system is taken out and the poorly-supported brand new hardware is clearly seen by the software and reported on.
There's more. There's so much more, but once you have no choice but to come up with cheap, fast testing that nonetheless give a good indication of how the system will work when someone spends tens of millions on the hardware, software engineers can really come up with some smart, reliable ideas. It can also be really fun and satisfying to work on this.
"You just YOLO'd it over to customers and prayed?"
Absolutely not. It was all tested, repeatedly, over and over, and over the course of about fifteen years became remarkably resilient, adaptable, resource light, and so on. All the good things one would hope for. In a pinch, a small system could be run from someone's laptop; at the top end, banks and banks of servers with their fans banshee wailing 24 hours a day, with dozens of the principal processes (i.e. the main executable that runs) all running, all talking to each other across countries and time zones, handling their own redundancy against individual processes turning off. Again, when you begin knowing that the software has to deliver on such a range of systems, where one customer is two college kids in a basement and one customer is valued in the tens of billions (although doing a lot more, of course, than just what our software let them do), design and good software engineering goes a very long way.
Unit tests don't matter when you have other types of testing like functional or integration testing that will tell you whether your code has the intended behavior and effect when run.
In the above statement, unit tests are also considered code.
That's where the redundancy comes from.
I have 100% seen bugs that cancel each other out; code that's just plain wrong at the lower level, coming together by chance to work at the higher level such that one or more integration tests pass. When one piece of that lower level code then gets fixed, either deliberately or because of a library update or hardware improvement or some other change that should have nothing to do with the functionality, and the top level integration tests starts failing, it can be so painful to figure it out.
I've also seen bugs that cancel each other out to make one integration test pass, but don't cancel each other out such that other integration tests fail. That can be a mindmelt; surely if THIS test works, then ALL THIS low level code must be correct, but simultaneously if THAT test fails, then ALL THIS low level code is NOT correct. At which point, people start wishing they had lower level tests.
I only recently started looking into Quickcheck style libraries in the typescript world, and fast-check is fantastic. Like super high quality. Great support for shrinking in all sorts of cases, very well typed, etc.
Hooking fast-check up to a real database/redis instance has been incredible for finding bugs. Pair it up with some regular ol case by case integration tests for some seriously robust typescript!
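To give a flavour (no database here, just the shape of a property; dedupe is a made-up function under test):

```typescript
// A property test for a made-up dedupe() helper. The same pattern extends to
// model-based tests against a real backing store.
import * as fc from "fast-check";

function dedupe<T>(xs: T[]): T[] {
  return [...new Set(xs)];
}

fc.assert(
  fc.property(fc.array(fc.integer()), (xs) => {
    const out = dedupe(xs);
    // every input element survives, and nothing appears twice
    return xs.every((x) => out.includes(x)) && new Set(out).size === out.length;
  }),
);
```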
The bias most developers have towards integration tests reflects the fact that even though we're often interviewed on it, it's quite rare that most developers actually have to write complex algorithms.
It's one of the ironies of the profession.
In fact, when I first saw Kent Beck's definition I did a double take because it covered what I would have called hermetic end to end tests.
The industry badly needs new words because it's barely possible to have a coherent conversation within the confines of the current terminology.
With other types of bug programmers want to fix it. With flakiness they either want to rerun the test until it passes or tear it down and write an entirely different type of test - as if it is in fact not a bug, but some immutable fact of life.
Integrated tests are more about ensuring what matters to Product. A car that refuses to start is worthless for most cases. But the engine light and a window that can’t open is not usually a dealbreaker.
Unit tests can help pinpoint an issue or ensure that a spec is implemented. But that’s mostly relevant to the developer world. So for a proper DX, add unit tests to help pinpoint bugs faster, especially with code that doesn’t change as much and where knowledge can be lost.
What you do is write more primitive components and either unit test them, prove them to be correct or make them small enough to be correct by inspection. An integration test is just testing that the interfaces do indeed fit together, it won't normally be close to testing all possible code paths internally.
I think of it like building any other large machine with many inputs. You can't possibly test a car under every conceivable condition. Imagine if someone was like "but wait, did you even test going round a corner at 60mph in the wet with the radio on?!"
I'd say that "works", "works correctly", and "covers all edge cases" are different scenarios in my mind. Looking at an exaggerated example, if I build a tax calculator or something that crunches numbers, I'd have more confidence with a few unit tests matching the output of the main method that does the calculation part than with a whole end-to-end test suite. It seems wasteful to run end to end (login, click buttons, check that a UI element appears, etc) to cover the logical output of the one part that does the serious business logic. A simple e2e suite could be useful to check for regressions, as a smoke test, but it still needs to be kept less specific, otherwise it will break on minor UX changes, which makes it a pain to maintain.
Sure, you wouldn’t have all possible datasets and scenarios, but you can easily have a few, so that e2e test fails if results don’t make sense.
Of course, unit tests for your business logic make sense in this case. Ideally, you would express tax calculation rules as a declarative dataset and take care of that one function that applies these rules to data; if the rules are wrong, that is now a concern for the legal subject matter experts, not a bug in the app that you would need to bother writing unit tests for.
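A minimal sketch of that declarative-rules idea; the brackets and rates are invented, not real tax law, and the point is that only the one applying function needs unit tests:

```typescript
// Declarative rules plus one function that applies them.
type Bracket = { upTo: number; rate: number };

const BRACKETS: Bracket[] = [
  { upTo: 10_000, rate: 0.0 },
  { upTo: 40_000, rate: 0.2 },
  { upTo: Infinity, rate: 0.4 },
];

function applyBrackets(income: number, brackets: Bracket[] = BRACKETS): number {
  let tax = 0;
  let lower = 0;
  for (const { upTo, rate } of brackets) {
    if (income <= lower) break;
    tax += (Math.min(income, upTo) - lower) * rate;
    lower = upTo;
  }
  return tax;
}

// Unit tests pin the edge cases: zero income, bracket boundaries, large values.
// One e2e test then only needs to confirm the app wires this function in.
console.assert(applyBrackets(0) === 0);
console.assert(applyBrackets(50_000) === 10_000);
```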
However, your unit test passing is just not a signal you can use for “ship it”. It is a development aid (hence the requirement for them to be fast). Meanwhile, an e2e test is that signal. It is not meant to be fast, but then when it comes to a release things can wait a few minutes.
Instead - instead, there really is an instead here - you can test at a higher level, which is less brittle to refactoring, has less complex setup, doesn't involve mocks that may not behave like the real thing, but still runs quickly due to fakes stubbing out expensive dependencies.
Maybe FooSystem will be redesigned to take different inputs, maybe the upstream will change to provide different outputs, maybe responsibility will shift around due to changes in the number of dependencies and it makes sense to vertically integrate some prep to upstream to share it.
Unit tests in these circumstances - and they're the majority of unit tests, IME - can act as a drag on the quality of the system. It's better to test things like this at a component level instead of units.
That said, I think it takes a real knack to figure out the right sort of tests, and it sometimes takes me a couple of attempts to get it right. In that case, being willing to delete or completely rewrite tests that just aren't being useful is important!
I find the problem with trying to move the tests up a level of abstraction is that eventually the code you're writing is probably going to change, and the tests that were useful for development the first time round will probably continue to be useful the second time round as well. So keeping them in place, even if they're really implementation-specific, is useful for as long as that implementation exists. (Of course, if the implementation changes for one with different edge cases, then you should probably get rid of the tests that were only useful for the old implementation.)
Importantly, this only works if the boundaries of the unit are fairly well-defined. If you're implementing a whole new sort algorithm, that's probably the case. But if I was just writing a function that compares two operands, that could be passed to a built-in sort function, I might look to see if there's a better level of abstraction to test at, because I can imagine the use of that compare function being something that changes a lot during refactorings.
Ideally your units/integrations will never change. If they do change, that means the users of your code will face breakage and that's not good citizenry. Life is messy and sometimes you have little choice, but such changes should be as rare as possible.
What is actually likely to change is the little helper functions you create to support the units, like said bespoke sort function. This is where testing can quickly make code fragile and is ultimately unnecessary. If the sort function is more useful than just a helper then you will move it out into its own library and, like before, the sort function will become the entire program and thus the full integration.
If you are concerned that the ORM won't behave as it claims to, you can write tests targeted at it directly. You can then run the same tests against your mock implementation to show that it conforms to the same contract.
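A sketch of running one contract suite against multiple implementations; the UserRepo interface, the in-memory class, and the commented-out ORM-backed variant are all hypothetical:

```typescript
import { test } from "node:test";
import assert from "node:assert";

interface UserRepo {
  save(id: string, name: string): Promise<void>;
  find(id: string): Promise<string | undefined>;
}

class InMemoryUserRepo implements UserRepo {
  private users = new Map<string, string>();
  async save(id: string, name: string): Promise<void> { this.users.set(id, name); }
  async find(id: string): Promise<string | undefined> { return this.users.get(id); }
}

// The contract, written once, run against any implementation.
function userRepoContract(label: string, makeRepo: () => UserRepo): void {
  test(`${label}: finds what was saved`, async () => {
    const repo = makeRepo();
    await repo.save("42", "Ada");
    assert.strictEqual(await repo.find("42"), "Ada");
  });
  test(`${label}: unknown ids come back undefined`, async () => {
    assert.strictEqual(await makeRepo().find("nope"), undefined);
  });
}

userRepoContract("in-memory", () => new InMemoryUserRepo());
// userRepoContract("real ORM", () => new OrmUserRepo(testConnection)); // e.g. run in CI
```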
But an ORM of any decent quality will already be well tested and shouldn't do unexpected things, so perhaps the worry is for naught?
I think this is what you're saying about moving useful units out into their own library. I agree, and I think it sounds like we'd draw the testing boundaries in similar places, but I don't think it's necessary to move these sorts of units into separate libraries for them to be isolated modules that can be usefully tested.
The sort function is one of the edge cases where how I'd test it would probably depend a lot on the context, but in theory a generic sort function has a very standard interface that I wouldn't expect to change much, if at all. So I'd be quite happy treating it as a unit in its own right and writing a bunch of tests for it. But if it's something really implementation-specific that depends on the exact structure of the thing it's sorting, then it's probably better tested in context. But I'm quite willing to write tests for little helper functions that I'm sure will be quite stable.
The whole of the interface is the unit, as Beck originally defined it. As it is the integration point. Hence why there is no difference between them.
> And most of the units you're writing are probably internal-facing
No. As before, it is a mistake to test internal functions. They are just an implementation detail. I understand that some have taken unit test to mean this, but I posit that as it is foolish to do it, there is no need to talk about it, allowing unit test to refer to its original and much more sensible definition. It only serves to confuse people into writing useless, brittle tests.
> So I'd be quite happy treating it as a unit in its own right
Right, and, likewise, you'd put it in its own package in its own right so that it is available to all sort cases you have. Thus, it is really its own program — and thus would have its own tests.
Sure, yeah, I think we're saying the same thing. A unit is a chunk of code that can act as its own program or library - it has an interface that will remain fairly fixed, and an implementation that could change over time. (Or, a unit is the interface that contains this chunk of code - I don't think the difference between these two definitions is so important here.) You could pull it out into its own library, or you can keep it as a module/file/class/function in a larger piece of software, but it is a self-contained unit.
I think the important thing that I was trying to get across earlier, though, is that this unit can contain other units. At the most maximal scale, the entire application is a single unit made up of multiple sub-units. This is why I think a definition of unit/integration test that is based on whether a unit integrates other units doesn't really make much sense, because it doesn't actually change how you test the code. You still want quick, isolated tests, you still want to test the interface and not the internals (although you should be guided by the internals), and you still want to avoid mocking. So distinguishing between unit tests and integration tests in this way isn't particularly useful.
So `BankAccount` as a class is probably a useful unit boundary: once you've designed the class, you're probably not going to change the interface much, except for possibly adding new methods occasionally. You have a stable boundary there, where in theory you could completely rewrite the internals of the class but the external boundary will stay the same.
`FooSystemFrobnicatorPreparer` sounds much more like an internal detail of some other system, I agree, and its interface could easily be rewritten or the class removed entirely if we decide to prepare our frobnication in a different way. But in that case, maybe the `foo.system.frobnicator` package is the unit we want to test as a whole, rather than one specific internal class inside that package.
I think a lot of good test and system design is finding these natural fault lines where it's possible to create a relatively stable interface that can hide internal implementation details.
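For example, a toy sketch of BankAccount tested only through its public interface (the class here is illustrative, not a real design):

```typescript
import { test } from "node:test";
import assert from "node:assert";

class BankAccount {
  private transactions: number[] = []; // internal detail, free to change

  deposit(amount: number): void {
    if (amount <= 0) throw new Error("deposit must be positive");
    this.transactions.push(amount);
  }

  withdraw(amount: number): void {
    if (amount > this.balance()) throw new Error("insufficient funds");
    this.transactions.push(-amount);
  }

  balance(): number {
    return this.transactions.reduce((a, b) => a + b, 0);
  }
}

// The test only touches the public interface, so the internals can be rewritten
// (say, to a running total) without touching the test.
test("withdrawing more than the balance is rejected", () => {
  const acct = new BankAccount();
  acct.deposit(100);
  assert.throws(() => acct.withdraw(150));
  assert.strictEqual(acct.balance(), 100);
});
```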
I don't think people realize the stats based on studies:
Unit testing catches about 30% of bugs
Visual code inspection catches about 70%
End to end testing also catches about 70%
All are important, but emphasis should be on the more effective methods.
> We’ve had decades of thought leadership around testing
I really disagree that the industry has had thought leadership here. I think we've had people pushing automated unit testing very hard when end-to-end is more effective. I don't think the position was based on facts, but more on a few people's opinion.
Not all integrations require mocking or stubbing either. Yet somehow your argument against integration tests is that they somehow won't trigger failure scenarios.
> The argument is coming from the other side with the claim that unit tests don't matter.
My argument is that the absolute vast majority of unit tests are redundant and not required.
> The idea of the test pyramid has nothing to do with one type of test being more important than another. It's simply a matter of practicality and utility.
You're sort of implying that all tests are of equal importance, but that is not the case. Unit tests are the worst of all tests, and provide very little value in comparison to most other tests, and especially in comparison to how many unit tests you have to write.
> it's not unheard of for a few E2E and integration tests to cover a large percentage of the code base, e.g. >50% of lines or statements. This doesn't mean that these tests are more valuable.
So, a single E2E test covers a scenario that touches >50% of the code. This is somehow "not valuable" despite the fact that you'd often need up to an order of magnitude more unit tests covering the same code paths for that same scenario (and without any guarantees that the units tested actually work correctly with each other).
What you've shown, instead, is that E2E tests are significantly more valuable than unit tests.
However, true, E2E tests are often difficult to set up and run. That's why there's a middle ground: integration tests. You mock/stub out any external calls (file systems, API calls, databases), but you test your entire system using only exposed APIs/interfaces/capabilities.
> These tests also require more boilerplate to setup, external system dependencies, they take more time to run, and so on.
And the only reason for that is this: "people pretend that only unit tests matter, and as a result all other forms of testing are an afterthought." It shouldn't be difficult to test your system/app by using it the way your users will, but it always is. It shouldn't be difficult to mock/stub external access, but it always is.
That's why instead of writing a single integration test that tests a scenario across multiple units at once (at the same time testing that all units actually work with each other), you end up writing dozens of useless unit tests that test every single unit in isolation, and you often don't even know if they are glued together correctly until you get a weird error at 3 AM.
> It was all tested, repeatedly, over and over, and over the course of about fifteen year
So, you do test how your system actually works, and not just with isolated unit tests.
> Again, when you begin knowing that the software has to deliver on such a range of systems, where one customer is two college kids in a basement and one customer is valued in the tens of billions (although doing a lot more, of course, than just what our software let them do), design and good software engineering goes a very long way.
Indeed. And that good engineering would include a simple wisdom "unit tests are useless without integration and E2E tests, otherwise you wouldn't be able to run your software anywhere because units just wouldn't fit together".
And once you have proper integration tests, 99%+ of unit tests become redundant.
But you can with unit tests?
> Can you test the Python parser on all possible Python programs?
A parser is one of the few cases where unit tests work. Very few people write parsers.
See also my sibling reply here: https://news.ycombinator.com/item?id=45078047
> What you do is write more primitive components and either unit test them, prove them to be correct or make them small enough to be correct by inspection. An integration test is just testing that the interfaces do indeed fit together, it won't normally be close to testing all possible code paths internally.
Ah yes. Somehow "behaviour of unit tests is correct" but "just testing interfaces in just a few integration tests". Funny how that becomes a PagerDuty alert at 3 in the morning because "correct behaviour" in one unit wasn't tested together with "correct behaviour" in another unit.
But when you actually write an actual integration test over actual (or simulated) inputs, suddenly 99%+ of your unit tests become redundant because actually using your app/system as intended covers most of the code paths you could possibly use.
Assuming by mock you mean an alternate implementation (e.g. an in-memory database repository) that relieves dependence on a service that is outside of immediate control, nah. There is no reason to avoid that. That's just an implementation detail and, as before, your tests shouldn't be bothered by implementation details. And since you can run your 'mock' against the same test suite as the 'real thing', you know that it fulfills the same contract as the 'real thing'. Mocks in that sense are also useful outside of testing.
If you mean something more like what is more commonly known as a stub, still no. This is essential for injecting failure states. You don't want to have to actually crash your hard drive to test your code under a hard-drive-crash condition. Failure cases are the most important tests you will write, so you will definitely be using these in all but the simplest programs.
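A sketch of that kind of failure-injecting stub; the BlobStore interface and the saveReport function are made up, and no real disk has to die to cover the error path:

```typescript
import { test } from "node:test";
import assert from "node:assert";

interface BlobStore {
  write(key: string, data: string): Promise<void>;
}

class FailingBlobStore implements BlobStore {
  async write(_key: string, _data: string): Promise<void> {
    throw new Error("EIO: i/o error"); // simulate the dying disk
  }
}

// Code under test: should surface a clean status, not crash.
async function saveReport(store: BlobStore, report: string): Promise<string> {
  try {
    await store.write("report.txt", report);
    return "saved";
  } catch {
    return "storage-unavailable";
  }
}

test("reports storage failure instead of crashing", async () => {
  assert.strictEqual(await saveReport(new FailingBlobStore(), "q3 numbers"), "storage-unavailable");
});
```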
Integration tests test that your system works. Testing how a system works covers the absolute vast majority of functionality you'd test with unit tests because you will hit the same code paths, and test the same behaviours you'd do with unit tests, and not in isolation.
This is a joke, but it's not: https://i.sstatic.net/yHGn1.gif
Yes, you can exercise the same code paths with integrated tests as you might with unit tests. There are multiple approaches to driving integrated tests, from the relatively inexpensive approach of emulating a HTTP env, to something more expensive and brittle like Selenium. You could also just test everything with manual QA. Literally pay some humans to click through your application following a defined path and asserting outcomes. Every time you make a change.
Obviously all of these have different costs. And obviously, testing a pure function with unit tests (whether example based or property based) is going to be cheaper than testing the behaviour of that same function while incidentally testing how it integrates with its collaborators.
Unit tests work well for well-defined, contained units and library-like code.
E.g. you have code that calculates royalties based on criteria. You can and should test code like that with unit tests (better still, with property-based testing if possible)
Such code is in a tiny minority.
What you really want to do, is test that your system behaves as advertised. E.g. that if your API is called with param=1 it returns { a: "hello" }, and when with param=-1, it returns HTTP 401 or something.
The best way to do that is, of course E2E tests, but those are often quite difficult to set up (you need databases, external services, file systems etc.)
So you go for the middle ground: integration tests. Mock/stub unavailable external services. Test your full code flow. With one test you're likely to hit code paths that would require multiple unit tests to test, for a single scenario. You'll quickly find that easily 99%+ of your unit tests are absolutely redundant.
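Roughly the shape I have in mind, with only the external service stubbed; the handler, the param semantics and the greeting service are hypothetical, and a real project would drive this through the framework's test client instead:

```typescript
import { test } from "node:test";
import assert from "node:assert";

type GreetingService = { fetchGreeting(param: number): Promise<string> };

// The handler as wired in production, minus the network edges.
function makeHandler(svc: GreetingService) {
  return async (param: number): Promise<{ status: number; body?: { a: string } }> => {
    if (param < 0) return { status: 401 };
    return { status: 200, body: { a: await svc.fetchGreeting(param) } };
  };
}

// Only the external dependency is stubbed.
const stubService: GreetingService = { fetchGreeting: async () => "hello" };

test("param=1 returns the greeting", async () => {
  assert.deepStrictEqual(await makeHandler(stubService)(1), {
    status: 200,
    body: { a: "hello" },
  });
});

test("param=-1 is rejected with 401", async () => {
  assert.deepStrictEqual(await makeHandler(stubService)(-1), { status: 401 });
});
```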
---
Offtop/rant/sidetrack.
This is especially egregious in "by the book" Java code. You'd have your controller that hits a service that collects data from facades that each hit external services via some modules. Each of those are tested in unit tests mocking the living daylight out of everything.
So for param=1 you'd have a unit test for controller (service mocked), service (facades mocked), each of the facades (if there are more than one, external services modules mocked), each of the external service modules (actual external services mocked).
Replace that with a single integration test where just the external service is mocked, and boom, you've covered all of those tests, and can trivially expand it to test external service being unavailable, timing out, returning invalid data etc.
How does my example not test real world behaviour? I mean, I didn’t even provide any code here so what exactly are you imagining?
Why wouldn’t you test parsers in isolation?
Why is it a blunder? Well, you just slowed down your edit-compile-run cycle by about 10x, and debugging when things go wrong (and it's when, not if) by 100-1000 times depending on the complexity of your environment.
Perhaps the answer is "AI will fix it", but we aren't there yet.
The failure mode I see much more often is in the other direction: tests that are testing too many units together and need to be lowered down to be more useful. For example, I recently wrote some code that generated intellisense suggestions for a DSL that our users use. Originally, the tests covered a large swathe of that functionality, and involved triggering e.g. lots of keydown events to check what happened when different keys were pressed. These were useful tests for checking that the suggestions box worked as expected, but they made it very difficult to test edge cases in how the suggestions were generated because the code needed to set that stuff up was so involved.
In the end what I did was lower the tests so I had a bunch of tests for the suggestions generation function (which was essentially `(input: str, cursor: int) -> Completion[]` and so super easy to test), and a bunch of tests for the suggestions box (which was now decoupled from the suggestions logic, and so also easier to test). I kept some higher level integration tests, but only very few of them. The result is faster, but also much easier to maintain, with tests that are easier to write and code that's easier to refactor.
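A sketch of what those lowered tests look like; the suggest function and keyword list here are simplified stand-ins for the real DSL logic:

```typescript
// The suggestion logic is a pure function, so no keydown events or DOM needed.
import { test } from "node:test";
import assert from "node:assert";

type Completion = { label: string };

function suggest(input: string, cursor: number, keywords: string[]): Completion[] {
  const prefix = input.slice(0, cursor).split(/\s+/).pop() ?? "";
  if (prefix === "") return [];
  return keywords.filter((k) => k.startsWith(prefix)).map((label) => ({ label }));
}

const KEYWORDS = ["select", "sort", "sum"];

test("suggests keywords matching the word at the cursor", () => {
  assert.deepStrictEqual(suggest("so", 2, KEYWORDS), [{ label: "sort" }]);
});

test("no suggestions when the cursor is on whitespace", () => {
  assert.deepStrictEqual(suggest("select ", 7, KEYWORDS), []);
});
```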
I think most people are largely on the same page, but idk. Anyway, what I mean is that as long as your unit tests are very narrowly scoped and testing specific chunks of logic, they really shouldn't break very often. I feel like my current project has done a decent job of this. Our unit tests rarely break, but to be fair, our integration tests fail too often.
I'm a UI developer, I've seen a lot of "unit" tests in PRs that go too far beyond testing specific logic and end up being brittle and not very useful.
> Ideally any changes are manually tested before releasing, and a bug in one part of the app that's being worked on is not likely to affect a different section, e.g. not necessary to retest the password reset flow if you're working on the home dashboard
That is one can of worms. First, during normal development work it is very common to modify some part that affects multiple parts of the app. In fact, it is inhuman to know exactly what it affects in a big app (ergo, testing). Second, while manual testing is a kind of e2e testing, it is not feasible in a bigger application.
> usually flaky end-to-end tests
Then make them not flaky. It’s amazing what can happen if something stops being treated as an afterthought!
We have developer sandboxes for this purpose so you don't have to guess the structure of the data you will receive from the actual server.
Also if all else was equal between multiple types of test, there wouldn't be need for comparison ala cheaper.
No one actually evaluates whether unit tests are needed.
Unit tests at least in my experience, are needed sparingly - in specific places that encompass slightly complicated well contained logic.
This is the kind of dogmatism I want people to understand. I’m not saying unit tests are useless, but they have a very narrow use: units that encompass slightly complicated logic. Most of us write classes that just have a few for loops, if conditions, metrics and a few transformations. The overhead of writing unit tests, mocking all external services, and continuously maintaining them when every small code change causes them to break (false positives) is pretty high.
https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
The point is, there is typically more than one path through the logic of a parser. The cheapest way to test these paths is in isolation. If there are five paths you care about, you could write six integrated tests — one for each of the five paths you care about, and one to verify that your parser is correctly integrated with your system, or of course you could write five isolated tests (which are cheaper to write and cheaper to execute) and one integrated test.
So, five cheap tests and one more expensive test, or six more expensive tests.
> Also if all else was equal between multiple types of test, there wouldn't be need for comparison ala cheaper.
…What? I'm sorry, this is near enough unintelligible.
You already said so in your first argument: Unit tests are cheaper and better (than integrated tests, I presume).
Am simply following your behavior pattern here.
> The point is, there is typically more than one path through the logic of a parser. The cheapest way to test these paths is in isolation. If there are five paths you care about, you could write six integrated tests — one for each of the five paths you care about, and one to verify that your parser is correctly integrated with your system, or of course you could write five isolated tests (which are cheaper to write and cheaper to execute) and one integrated test.
This is nonsense. A standard parser takes one input and does processing of this data to give an expected output. An integration test checks the parser does this one objective correctly. You have boiled down the 5 unit tests that don't test for anything *real into 1 integrated test that objectively gives better test data.
*code is not real until it does some business logic!
> Also if all else was equal between multiple types of test, there wouldn't be need for comparison ala cheaper.
> …What? I'm sorry, this is near enough unintelligible.
Maybe try to froth less when reading my comments, your brain might have some capacity left to understand comparative adjectives.
This appears to be the root of your confusion.
Here's a good example of a parser: https://entropicthoughts.com/parser-combinators-parsing-for-...
There are at least 16 paths through this function.
Not one.
16.
You're describing code. At what point does code become "worthy" of a unit test? How do you communicate this to your team members? This type of ambiguity introduces friction and endless discussions in code reviews, to the point that abiding by the convention that all code should be unit tested whenever possible is a saner long-term strategy. This doesn't have to be a strict rule, but it makes sense as a general convention. Besides, these days with LLMs, writing and maintaining unit tests doesn't have to be a chore anymore. It's one thing the tech is actually reasonably good at.
What I think we fundamentally disagree about is the value of unit tests. That small function with a few for loops and if conditions still has users, which at the end of the day might be only yourself. You can't be sure that it's working as intended without calling it. You can do this either manually; automatically by the adjacent code that calls it, whether that's within an integration/E2E test or in production; or with automated unit tests. Out of those options, automated unit tests are the ones that provide the highest degree of confidence, since you have direct control over its inputs and visibility of its outputs. Everything else has varying degrees of uncertainty, which carries a chance of exposing an issue to end users.
Now, you might be fine with that uncertainty, especially if you're working on a solo project. But this doesn't mean that there's no value in having extensive coverage from unit tests. It just means that you're willing to accept a certain level of uncertainty, willing to tradeoff confidence for convenience of not having to write and maintain code that you personally don't find valuable, and willing to accept the risk of exposing issues to end users.
Integration tests don't need to be slow on modern hardware, are easier to debug than end-to-end tests if they are kept at the right level of abstraction, and catch more real-world bugs than overly-specialized unit tests w/ complicated mocks, etc.
"How do you "design a good system" without testing it?"
We must be coming at this through such wildly different contexts. To me, it is simply obvious and normal that it's possible to create a good design for something, and that good design can exist before any tests have ever been created or executed.
To me, that you ask that question suggests that we have such different contexts that we might as well be speaking different languages. I would be horrified that people would churn out a rubbish design and just let tests handle all the crap and force it into a good design; but I do gather that's normal procedure in some industries.
How to see if someone is arguing in bad faith? Well, they pretend that reductio ad absurdum is a valid argument
> Obviously all of these have different costs. And obviously, testing a pure function with unit tests (whether example based or property based) is going to be cheaper than testing the behaviour of that same function while incidentally testing how it integrates with its collaborators.
Let's see. A single scenario in an integration test:
- tests multiple code paths removing the need for multiple unit tests along that code path
- tests externally observable behaviour of the app/api/system is according to spec/docs
- tests that all units (that would otherwise be tested in isolation from each other) actually work together
This is obviously cheaper. The programmer (the expensive part) has to write less code, and the system doesn't suddenly break because someone didn't wire units together (the insanely expensive part, because everything was mocked in tests; unironically a true story that hammered the final nail in the coffin of unit tests for me).
By the way, here's what Kent Beck has to say about unit tests: https://stackoverflow.com/a/153565
--- start quote ---
I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence
--- end quote ---
https://news.ycombinator.com/item?id=45081378
> By the way, here's what Kent Beck has to say about unit tests
As I pointed out to you earlier, I've been doing TDD for a long time. I'm already plenty familiar with Kent Beck's writing.
---
I'm not convinced that you actually know what you're talking about. You've contradicted yourself a number of times when responding to me and to others. You construct straw men to argue against (who said everything needs to mocked in unit tests?). You've said "very few people write parsers", which is utter nonsense — parsing, whether you realise it or not, is one of the most common things you'll do as a working programmer. You've insisted that unit tests don't actually test that something works. You've created this false dichotomy where one has to choose between either isolated tests or integrated tests.
All I can say is good luck to you mate.
It is entirely possible for a sort function to be just one component of the functionality of the larger code base. Sort in specific is something I've written unit tests for.
> As before, there may be some benefit in writing code to run that sort function in isolation during development to help pinpoint what edge cases need to be considered, but there isn't any real value in keeping that around after development is done.
Those edge cases (and normal cases) continue to exist after the code is written. And if you find a new edge case later and need to change the code, then having the previous unit tests in place gives a certain amount of confidence that your changes (for the new case) aren't breaking anything. Generally, the only time I _remove_ unit tests is if I'm changing to a new implementation; when the method being tested no longer exists.