Our goal is to have every feature tested before it is released.
While right now we’re pretty far from following the SQLite approach, we expect every developer to make sure that the feature they are working on is covered with at least basic tests.
Some options we use are unit tests, module tests, integration tests, and fuzzing.
Whenever possible, new functionality should be isolated well enough to be tested independently. If testing the module seems complex, the author of the new functionality is also expected to prepare the test infrastructure required to make unit testing possible. In most cases, unit testing is the preferred way to test things, as unit tests are usually faster than other kinds of tests and, at the same time, are less likely to be flaky.
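For instance, a unit test for a well-isolated piece of logic can live right next to the code it covers; the `parse_limit` helper below is hypothetical and used only for illustration:

```rust
/// Hypothetical helper used only to illustrate an isolated unit test.
fn parse_limit(input: &str) -> Option<u32> {
    input.trim().parse().ok().filter(|n| *n > 0)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_valid_limit() {
        assert_eq!(parse_limit(" 42 "), Some(42));
    }

    #[test]
    fn rejects_zero_and_garbage() {
        assert_eq!(parse_limit("0"), None);
        assert_eq!(parse_limit("not a number"), None);
    }
}
```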
Sometimes testing a single function of a module is either unreasonably hard (especially in Rust, given its limited mocking capabilities) or doesn't provide enough coverage to show that the combined functionality works correctly.
In such cases, module tests can be considered. Module tests often imply some kind of setup procedure, where you initialize both the module itself and its dependencies.
The test itself usually represents a set of inputs and expected outputs for the module as a whole.
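A minimal sketch of such a setup, assuming a hypothetical `Cache` module that depends on a `Storage` backend (all names are illustrative):

```rust
use std::collections::HashMap;

trait Storage {
    fn get(&self, key: &str) -> Option<String>;
}

/// In-memory stand-in for the real dependency, initialized during setup.
struct InMemoryStorage(HashMap<String, String>);

impl Storage for InMemoryStorage {
    fn get(&self, key: &str) -> Option<String> {
        self.0.get(key).cloned()
    }
}

/// The module under test: it is exercised as a whole, not function by function.
struct Cache<S: Storage> {
    backend: S,
}

impl<S: Storage> Cache<S> {
    fn lookup(&self, key: &str) -> String {
        self.backend.get(key).unwrap_or_else(|| "<missing>".to_string())
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // Setup procedure: initialize the module together with its dependencies.
    fn setup() -> Cache<InMemoryStorage> {
        let mut data = HashMap::new();
        data.insert("user:1".to_string(), "alice".to_string());
        Cache { backend: InMemoryStorage(data) }
    }

    #[test]
    fn returns_stored_and_missing_values() {
        let cache = setup();
        assert_eq!(cache.lookup("user:1"), "alice");
        assert_eq!(cache.lookup("user:2"), "<missing>");
    }
}
```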
Integration tests often assume that the whole system is running and that we run checks against it as a black box.
The upside of integration tests is that they can test the whole system, which may be convenient when your feature spans multiple modules.
The downsides are that integration tests are often a) much slower than unit or module tests, b) flaky, since it’s hard to account for all the possible states the system can enter, and c) shallow, since you’re interacting with the system using a very high-level API.
While it may be convenient to add an integration test for your new feature to an existing test suite, you are generally expected to first try writing a unit or module test if possible.
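In Rust projects, black-box integration tests conventionally live in the `tests/` directory and drive the system only through its high-level interface. The sketch below assumes a hypothetical `mydb` binary with a `--query` flag in the same package; treat every name here as an assumption:

```rust
// tests/query_smoke.rs
use std::process::Command;

#[test]
fn simple_query_succeeds() {
    // CARGO_BIN_EXE_<name> is set by Cargo for integration tests and points
    // at the compiled binary; `mydb` and `--query` are hypothetical.
    let output = Command::new(env!("CARGO_BIN_EXE_mydb"))
        .args(["--query", "SELECT 1"])
        .output()
        .expect("failed to run the binary");

    assert!(output.status.success());
    let stdout = String::from_utf8_lossy(&output.stdout);
    assert!(stdout.contains('1'));
}
```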
Fuzzing may help to check whether your system is robust against random and unexpected input. We use it, for example, to check the resilience of our VM implementation.
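One common way to set this up in Rust is cargo-fuzz, where a fuzz target is only a few lines; `my_vm::execute` below is a stand-in for whatever entry point you want to exercise:

```rust
// fuzz/fuzz_targets/vm_input.rs
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    // Errors are expected on garbage input; panics and crashes are what the
    // fuzzer is hunting for. `my_vm::execute` is a hypothetical entry point.
    let _ = my_vm::execute(data);
});
```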
As an additional testing measure, we have a staging environment for our backend, and every merged change is deployed there immediately. This environment receives an artificially generated load that resembles organic traffic, and it has the same set of metrics and alerts as any real environment. If a PR breaks the staging environment, it is normally rolled back until the problem is fixed.
Unfortunately, sometimes writing tests is not possible. This is not a normal situation, but if you ever find yourself in it, make sure to test your code at least manually. Manual testing is not a substitute for automated tests, but it's still better than no testing at all. After the feature is shipped, advocate for allocating resources to add automated testing for it.