Combinatorial tests help cover many combinations of input data, but their out-of-the-box support in testing frameworks can easily bloat your continuous integration build times. Not necessarily...
Introduction
I love unit tests. They are awesome, quick, reliable, isolated, easy to read and easy to write. And if your unit tests are not, you still have room to improve. But this article is not about unit testing and its goodness; it is about combinatorial unit testing and how to cover more with less effort.
Background
Within this article, I assume you are familiar with lambdas and anonymous types in C#, and that you are comfortable with enumerables and enumerators. I also assume you have some experience in unit testing. I refer to NUnit as a testing framework only to describe some technical aspects of unit testing; it was also used for some internal assertions, which are easily replaceable by almost any other assertion library such as Shouldly or FluentAssertions.
Combinatorial Tests are Important
So what are combinatorial tests? Combinatorial tests are tests that provide test cases for all possible combinations of the individual data items provided for the parameters of a test. In other words, they verify the outcome regardless of the combination of the given data. These tests are frequently used to ensure that there is no correlation between the provided arguments and that behavior is consistent. They are quite useful, e.g., to verify that there is no special logic around strings (null, empty, white space, human-readable text, text with trailing spaces, etc.) or to verify a serialization roundtrip, where your Data Transfer Object should be perfectly serializable and deserializable, back and forth. This matters especially when you use a custom serialization engine that requires helper attributes (e.g., protobuf-net with ProtoMember attributes), where it is so easy to miss something.
Combinatorial Tests are Pain
Now imagine the situation where you would like to test this kind of constructor:
public SomeConstructor(string stringArg, long longArg, double doubleArg)
- For the string argument, I would test at least: null, empty string, white space, non-white space;
- For the long argument, I would test at least: long.MinValue, long.MaxValue, -1L, 0L, 1L;
- For the double argument, I would test at least: double.MinValue, double.MaxValue, -1.0d, 0.0d, 1.0d.
This gives me (4 string combinations) x (5 long combinations) x (5 double combinations) = 100 combinations. That is not really a heavy load if we run them in a for/foreach loop, possibly even in parallel. However, once they are introduced as, e.g., a TestCaseSource for NUnit, it will generate 100 test cases, and each of them will add significant management overhead:
- NUnit will have to generate all these test cases, each wrapped in TestCaseData;
- For every test, it will have to call SetUp and TearDown;
- And every test will be executed sequentially.
The situation gets worse quickly as the number of potential values increases: in my practical experiments, 100K test cases made NUnit "prepare" for several minutes.
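To put the numbers above in perspective, here is a minimal sketch that walks the same 100 combinations with plain nested loops, without any per-case framework bookkeeping. The names PlainLoopSketch and CountCombinations are made up for illustration, and the non-white-space sample value "text" is an assumption.

```csharp
using System;

public static class PlainLoopSketch
{
    // Sample values matching the lists above ("text" is an assumed
    // non-white-space value).
    public static readonly string[] Strings = { null, string.Empty, " ", "text" };
    public static readonly long[] Longs = { long.MinValue, long.MaxValue, -1L, 0L, 1L };
    public static readonly double[] Doubles = { double.MinValue, double.MaxValue, -1.0d, 0.0d, 1.0d };

    // Calls the test once per combination and returns how many were visited.
    public static int CountCombinations(Action<string, long, double> test)
    {
        var count = 0;
        foreach (var s in Strings)
            foreach (var l in Longs)
                foreach (var d in Doubles)
                {
                    test(s, l, d);
                    count++;
                }
        return count;
    }
}
```

Calling PlainLoopSketch.CountCombinations((s, l, d) => { }) visits 4 x 5 x 5 = 100 combinations. The downside of hand-written loops is that the nesting grows with every parameter, which is what the rest of the article tries to avoid.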
So I started looking for ways to have the same test cases described in a primitive, short way, and have these test cases created nearly instantaneously.
Theory
I identified the following goals:
- Combinatorial unit tests should be test framework-agnostic. That means that I should not extend any specific framework functionality, for example, by implementing custom attributes/interfaces.
- Combinatorial unit tests should be self-descriptive. That means that I should be able to read the combinations and the test itself naturally, so I can quickly understand what the test is doing and what kind of test cases are considered as input data.
- It should be low-ceremony. Minimize the number of hiccups to get the stuff running. The description part should not be longer than the test part.
Once I abstracted myself from the implementation and started looking at my code from the "client" perspective, I came up with a couple of syntactical constructions that could work. After thinking a little more, I settled on this one:
Combinations
    .Compose(x => new
    {
        Greeting = x.Only("Hello", "Howdy", "GDay"),
        Participant = x.Only("John", "James", "Bob")
    })
    .RunInParallel(test =>
    {
        Console.WriteLine("{0}, {1}", test.Greeting, test.Participant);
    });
This looked quite logical to me; there are two clearly separate parts:
- The declaration part is exposed by the Compose method. This method expects a lambda that describes the type-safe test case with the values suggested for every parameter. Type safety is highly important during refactoring, as it helps to ensure type consistency between the declarative part and the executive part. So I was reading this as "Compose a test case as a combination of the Greeting parameter taking Only "Hello", "Howdy" and "GDay", and the Participant parameter taking Only "John", "James" and "Bob"".
- The test part is exposed by the RunInParallel method. This method expects a lambda that describes the test itself. The lambda provides a test argument that gives access to the data of a specific test case. In the given example, the test.Greeting value should be either "Hello", "Howdy" or "GDay", and test.Participant should be either "John", "James" or "Bob".
The declaration overhead is minimal, the only question is how to implement it.
Implementation
The Compose method provides an entity of a certain type which is used to describe the sequences. I call this entity a Combinator: an entity that has a list of declared sequences and methods to populate those sequences. The Combinator type was made public to be accessible to the end user, but it was declared sealed, as I do not expect any inheritance, and its constructor was made internal, assuming that the client shall not create instances of this type explicitly. The list of sequences is private, and each sequence itself is some sort of enumerable.
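using System.Collections;
using System.Collections.Generic;

public sealed class Combinator
{
    // Declared sequences, in the order they were declared.
    private readonly List<IEnumerable> sequences = new List<IEnumerable>();

    // Internal: clients are not expected to create instances directly.
    internal Combinator()
    {
    }
}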
According to the example above, I expect that Combinator will contain the Only method, accepting a list of values that represents a sequence of a specific type. The return type of this method is used to define the property type in the anonymous class, so the method has to be generic. But what value should the method return? It is not really important, as that value will never be used. What is really important is to add the given list of items as a sequence to the private collection of sequences. I also decided to adjust the method signature to require at least one item, plus any number of extra items using params. This prevents invocations with no items (empty sequences).
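// Requires "using System.Linq;" for Concat and ToArray.
public T Only<T>(T atLeastOne, params T[] orAnyNumberOfOther)
{
    // Store the values as an array so they can be enumerated repeatedly.
    sequences.Add(new[] { atLeastOne }.Concat(orAnyNumberOfOther).ToArray());
    // The returned value is never used; only its type matters.
    return default(T);
}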
To summarize the above, Combinator is created by the Compose method and passed to its lambda, where it is used to declare and preserve the sequences and to define the anonymous test case type with every property having the correct type. The assumption made here and further on is that the order of the sequence declarations matches the order of the anonymous type properties.
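The declaration step described so far can be sketched as follows. This is not the article's actual Compose implementation: SketchCombinator, SketchCombinations, Declare and DeclaredCount are hypothetical names, and the sketch only assumes that the lambda is invoked once so that it records the sequences, while the placeholder object it returns is discarded.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the Combinator described above (hypothetical name).
public sealed class SketchCombinator
{
    private readonly List<IEnumerable> sequences = new List<IEnumerable>();

    // Number of sequences recorded so far; exposed only for this sketch.
    public int DeclaredCount => sequences.Count;

    public T Only<T>(T atLeastOne, params T[] orAnyNumberOfOther)
    {
        sequences.Add(new[] { atLeastOne }.Concat(orAnyNumberOfOther).ToArray());
        return default(T); // placeholder; only the static type matters
    }
}

public static class SketchCombinations
{
    // Assumed shape of the declaration step: run the lambda once so that it
    // records the sequences, and ignore the object it returns.
    public static SketchCombinator Declare<T>(Func<SketchCombinator, T> declaration)
    {
        var combinator = new SketchCombinator();
        declaration(combinator);
        return combinator;
    }
}
```

The ordering assumption is reasonable because C# evaluates the member initializers of an anonymous object in the order they are written, so the calls to Only happen in property declaration order.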
At this point, we have the sequences as "flat" enumerables; however, to perform the combinatorial test, we have to generate another "flat" sequence with all possible combinations. That is why Combinator exposes yet another internal method, called Yield, for that purpose:
internal IEnumerable<T> Yield<T>()
{
    // Body omitted here; the complete implementation is in the attached source code.
}
The complete implementation of this method is available in the source code attached to the article and is too long to reproduce here, but the key highlights are:
- T is an anonymous type. It is exactly the same anonymous type that is produced by the Compose method. In reality, anonymous types are compiler-generated types, so there will be a specific "unnamed" type generated at compile time, which has a constructor accepting the values for all properties declared by the anonymous type, in the order of declaration. Keeping this in mind, it is quite easy to use the activator to create instances of the anonymous type.
- This method returns IEnumerable<T>, so we can use the yield keyword to generate instances on demand. This reduces preparation overhead, especially in parallel run scenarios.
- Original sequences are enumerated at least once and possibly many times, which is why it is important to store them as arrays or collections (rather than as enumerables evaluated on demand).
- The implementation relies heavily on the enumerators of the declared sequences, as they make it easy to reset a sequence or to access the currently iterated value.
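To illustrate these highlights, here is a minimal sketch of lazy combination generation. It is not the implementation from the attached source: it swaps the enumerator-based approach for a simpler array of indices advanced like an odometer. The names CartesianSketch and YieldAll are hypothetical, and the extra shape parameter exists only so that the compiler can infer the anonymous type T.

```csharp
using System;
using System.Collections.Generic;

public static class CartesianSketch
{
    // Lazily produces one instance of T per combination of the sequences.
    // Assumes every sequence is non-empty and that the sequences are listed
    // in the same order as the properties of T.
    public static IEnumerable<T> YieldAll<T>(T shape, IReadOnlyList<object[]> sequences)
    {
        // One index per sequence, advanced like the digits of an odometer.
        var indices = new int[sequences.Count];
        while (true)
        {
            var args = new object[sequences.Count];
            for (var i = 0; i < args.Length; i++)
                args[i] = sequences[i][indices[i]];

            // Anonymous types have a constructor whose parameters follow
            // the property declaration order.
            yield return (T)Activator.CreateInstance(typeof(T), args);

            // Advance the rightmost index; carry to the left on overflow.
            var position = sequences.Count - 1;
            while (position >= 0 && ++indices[position] == sequences[position].Length)
            {
                indices[position] = 0;
                position--;
            }
            if (position < 0) yield break; // every index wrapped around: done
        }
    }
}
```

With two sequences of 2 and 3 items, enumerating the result produces 6 instances, with the rightmost property changing fastest. Because the method is an iterator, no instance is created until the caller actually asks for it.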
Once the Yield part is done and we have the final enumerable sequence, life gets significantly easier, as we just need to iterate through the sequence and call the test method with each combination. This can be done either sequentially, using a plain foreach loop, or in parallel, using, e.g., Parallel.ForEach.
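Both execution modes can be sketched in a few lines. RunnerSketch and its method names are hypothetical and only illustrate the idea; the article's own RunInParallel may differ in details.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class RunnerSketch
{
    // Sequential run: one test case after another, in enumeration order.
    public static void Run<T>(IEnumerable<T> testCases, Action<T> test)
    {
        foreach (var testCase in testCases)
            test(testCase);
    }

    // Parallel run: the test lambda must be thread-safe, and the order in
    // which the test cases execute is not guaranteed.
    public static void RunInParallel<T>(IEnumerable<T> testCases, Action<T> test)
    {
        Parallel.ForEach(testCases, test);
    }
}
```

One thing to keep in mind with the parallel variant: exceptions thrown by the test lambda, including failed assertions, surface wrapped in an AggregateException, so the failure message may need unwrapping to be readable.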
Extensibility
The provided structure is very open for extension. As an example, let's consider strings, which are usually quite repetitive in combinatorial tests. When verifying a constructor argument or method parameter of type string, developers tend to use helper methods like string.IsNullOrEmpty(...) and string.IsNullOrWhiteSpace(...), which normally makes sense to verify with combinatorial tests as well. Here is an example implementation for reference:
public string NullEmptyAndWhiteSpace()
{
    sequences.Add(new object[] { default(string), string.Empty, " ", "\t" });
    return default(string);
}
The sequence consists of null, an empty string, a single-space string and a string containing a tab. From my practical experience, the tab case is usually forgotten, yet it still has to be considered. As with any other sequence declaration method, the return value is irrelevant, but its type is not; that is why the return type is string and default(string) is returned. As a further exercise, try adding a sequence for doubles, and don't forget to include extreme cases like double.NaN, double.PositiveInfinity and double.Epsilon.
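One possible take on that exercise is sketched below. DoubleSequenceSketch is a hypothetical stand-alone class that mimics the private sequence list of Combinator; in the real code the method would simply be added to Combinator itself. The method name DoubleEdgeCases and the exact selection of values are assumptions.

```csharp
using System.Collections;
using System.Collections.Generic;

public sealed class DoubleSequenceSketch
{
    private readonly List<IEnumerable> sequences = new List<IEnumerable>();

    // Exposed only so that the sketch can be inspected from outside.
    public IReadOnlyList<IEnumerable> Sequences => sequences;

    // Declares a sequence of "interesting" doubles, following the same
    // pattern as NullEmptyAndWhiteSpace: the return value is a placeholder.
    public double DoubleEdgeCases()
    {
        sequences.Add(new object[]
        {
            double.MinValue, double.MaxValue, -1.0d, 0.0d, 1.0d,
            double.Epsilon, double.NaN,
            double.NegativeInfinity, double.PositiveInfinity
        });
        return default(double);
    }
}
```

Note that double.NaN does not compare equal to itself with the == operator, so a test that receives it should check with double.IsNaN rather than with an equality comparison.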
History
- Version 1.0 - Initial publication
- Version 1.1 - Added source code repo URL