Background
I think we have all had the experience of wishing for a more sophisticated random generator than the bare-bones Random class.
A couple of years back, I started looking at data generation for the load test of one of my projects. I badly felt the need for a comprehensive random generation framework, but I could not find one, so I started putting some code together. In our project, the data was being loaded into a huge OLTP database, and all previous data generation applications had been inserting hard-coded values. Needless to say, the statistics on those indexes looked nothing like what we would see in the real environment, so the load test was flawed from the start. I therefore needed more than plain random data. I needed data with more or less the same frequency distribution as the real data, because purely random data would spread almost evenly across all index values.
First, I created random generators for the main primitive types, and then a generic random generator that picks items from a list, so it can be used for any type. Then I extended this type to allow a customised probability distribution. The result, I believe, is a simple but flexible and extensible framework. I hope it will reduce your development time and let you customise and write your own RandomGens. It can be used for data generation, and even more so for setting up unit tests with real-looking data.
Framework
All random generators inherit from the RandomGeneratorBase<T> class. This class has a member variable of type System.Random, which is the cornerstone of all random generation. Without delving too deep into the details of System.Random, it needs a seed to start with. If you provide the same seed, you will always get the same sequence of numbers, so the output is reproducible rather than random from run to run. Environment.TickCount is a useful seed, but if you create several random generators in the same section of code, there is a good chance they are created within the same TickCount and will therefore produce identical sequences. For this reason, random seed generation is already implemented in RandomGeneratorBase<T>.
The primitive random generators include:
BooleanRandomGen
IntegerRandomGenerator
DoubleRandomGenerator
StringRandomGenerator
DateRandomGenerator
I have also included a few specialised random generators:
NameRandomGenerator, which generates names or words as strings
EnumRandomGenerator<T>
ListRandomGenerator<T>
PrioritisedListRandomGenerator<T>
Using these classes is pretty straightforward. This snippet is probably more useful than any documentation:
using System;
using System.Collections.Generic;
// ...plus a using directive for the framework's own namespace (not named here).

public static void Main(string[] args)
{
    Console.WriteLine("Random generation framework");
    Console.WriteLine("===========================");
    Console.WriteLine();
    Console.WriteLine();

    Console.WriteLine("Boolean:");
    BooleanRandomGen bg = new BooleanRandomGen();
    for (int i = 0; i < 5; i++)
        Console.WriteLine(bg.GetRandom());
    Console.WriteLine();

    // Minimum inclusive, maximum exclusive (same as System.Random).
    Console.WriteLine("Integers between 10 and 100 (exclusive 100):");
    IntegerRandomGenerator ig = new IntegerRandomGenerator(10, 100);
    for (int i = 0; i < 5; i++)
        Console.WriteLine(ig.GetRandom());
    Console.WriteLine();

    Console.WriteLine("Doubles between 10.0 and 100.0 (exclusive 100.0):");
    DoubleRandomGenerator dg = new DoubleRandomGenerator(10, 100);
    for (int i = 0; i < 5; i++)
        Console.WriteLine(dg.GetRandom());
    Console.WriteLine();

    Console.WriteLine("Dates between 01/01/1969 and 28/02/2009:");
    DateRandomGenerator tg =
        new DateRandomGenerator(new DateTime(1969, 1, 1),
                                new DateTime(2009, 2, 28));
    for (int i = 0; i < 5; i++)
        Console.WriteLine(tg.GetRandom());
    Console.WriteLine();

    // String lengths are inclusive at both ends.
    Console.WriteLine("String digits up to 10 chars " +
                      "(including 10 and could be zero length):");
    StringRandomGenerator sg1 = new StringRandomGenerator(10,
        CharacterType.Digit);
    for (int i = 0; i < 5; i++)
        Console.WriteLine("\"" + sg1.GetRandom() + "\"");
    Console.WriteLine();

    Console.WriteLine("Letters between 10 and 50 chars (including 50):");
    StringRandomGenerator sg2 = new StringRandomGenerator(10, 50,
        CharacterType.LowerCase |
        CharacterType.UpperCase);
    for (int i = 0; i < 5; i++)
        Console.WriteLine("\"" + sg2.GetRandom() + "\"");
    Console.WriteLine();

    Console.WriteLine("Letters between 10 and 20 chars padded (including 20):");
    StringRandomGenerator sg3 = new StringRandomGenerator(10, 20,
        CharacterType.LowerCase | CharacterType.UpperCase, true);
    for (int i = 0; i < 5; i++)
        Console.WriteLine("\"" + sg3.GetRandom() + "\"");
    Console.WriteLine();

    Console.WriteLine("Random words:");
    NameRandomGenerator ng1 = new NameRandomGenerator(NameType.Word);
    for (int i = 0; i < 5; i++)
        Console.WriteLine("\"" + ng1.GetRandom() + "\"");
    Console.WriteLine();

    Console.WriteLine("Random male and female forenames:");
    NameRandomGenerator ng2 = new NameRandomGenerator(
        NameType.FemaleName | NameType.MaleName);
    for (int i = 0; i < 5; i++)
        Console.WriteLine("\"" + ng2.GetRandom() + "\"");
    Console.WriteLine();

    Console.WriteLine("Random enum:");
    EnumRandomGenerator<NameType> eg = new EnumRandomGenerator<NameType>();
    for (int i = 0; i < 5; i++)
        Console.WriteLine("\"" + eg.GetRandom() + "\"");
    Console.WriteLine();

    Console.WriteLine("Random from a list:");
    ListRandomGenerator<string> lg1 = new ListRandomGenerator<string>(
        new string[] { "John", "George", "Jorge", "Jose", "Jack", "Jimi" });
    for (int i = 0; i < 5; i++)
        Console.WriteLine("\"" + lg1.GetRandom() + "\"");
    Console.WriteLine();

    // Unique mode: no name is returned twice.
    Console.WriteLine("Unique random from a list:");
    ListRandomGenerator<string> lg2 = new ListRandomGenerator<string>(
        new string[] { "John", "George", "Jorge",
                       "Jose", "Jack", "Jimi" }, true, true);
    for (int i = 0; i < 5; i++)
        Console.WriteLine("\"" + lg2.GetRandom() + "\"");
    Console.WriteLine();

    // Each item gets a random score (max 1000) that fixes its frequency.
    Console.WriteLine("Random from a prioritised list " +
                      "with random scores (max=1000):");
    PrioritisedListRandomGenerator<string> pg1 =
        new PrioritisedListRandomGenerator<string>(
            new string[] { "John", "George", "Jorge",
                           "Jose", "Jack", "Jimi" }, 1000);
    for (int i = 0; i < 5; i++)
        Console.WriteLine("\"" + pg1.GetRandom() + "\"");
    Console.WriteLine();

    Console.WriteLine("Random from a prioritised list with predefined scores:");
    Dictionary<string, int> namesAndScores = new Dictionary<string, int>();
    namesAndScores.Add("Divorced", 1);
    namesAndScores.Add("Single", 4);
    namesAndScores.Add("Married", 5);
    PrioritisedListRandomGenerator<string> pg2 =
        new PrioritisedListRandomGenerator<string>(namesAndScores);
    for (int i = 0; i < 5; i++)
        Console.WriteLine("\"" + pg2.GetRandom() + "\"");
}
OK, here are a few points.
You may have noticed that ranges can be exclusive or inclusive. Integer random generation by System.Random is inclusive on the minimum and exclusive on the maximum, and so are IntegerRandomGenerator and DoubleRandomGenerator. For example, an IntegerRandomGenerator with a min of 20 and a max of 50 will create numbers from 20 to 49. The length parameters of StringRandomGenerator have been designed differently for convenience: they are inclusive. When populating a VARCHAR field of length 50, it is more natural to create a StringRandomGenerator with a max length of 50 than with 51 just to include length 50.
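To make the two conventions concrete, here is a small sketch built directly on System.Random. It is not the framework's code. `RangeSketch` and its methods are hypothetical, and the string helper only produces digits, whereas the real StringRandomGenerator supports the CharacterType flags.

```csharp
using System;
using System.Text;

// Hypothetical helpers contrasting exclusive and inclusive upper bounds.
public static class RangeSketch
{
    // Same convention as Random.Next(min, max): min inclusive, max exclusive.
    public static int NextInt(Random rng, int min, int max) => rng.Next(min, max);

    // NextDouble() is in [0, 1), so the result lies in [min, max)
    // (up to floating-point rounding).
    public static double NextDouble(Random rng, double min, double max)
        => min + rng.NextDouble() * (max - min);

    // Length bounds are both inclusive, hence the maxLength + 1 passed to Next.
    public static string NextDigits(Random rng, int minLength, int maxLength)
    {
        int length = rng.Next(minLength, maxLength + 1);
        var sb = new StringBuilder(length);
        for (int i = 0; i < length; i++)
            sb.Append((char)('0' + rng.Next(10))); // one digit 0..9
        return sb.ToString();
    }
}
```

With these helpers, `NextInt(rng, 20, 50)` never returns 50, while `NextDigits(rng, 10, 50)` can return a 50-character string.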
NameRandomGenerator is a specialised type of ListRandomGenerator<T> which picks names or words from its predefined lists. If you need random male or female names, surnames, or random words, it sometimes works better and looks more natural than the regular StringRandomGenerator. It definitely feels better to test the system with proper names rather than the gobbledygook generated by StringRandomGenerator.
EnumRandomGenerator<T> is a utility for returning random values of an enum, where T is an enumeration. Unfortunately, C# before version 7.3 does not allow restricting T to an enum type in the where clause of the generic type definition, so the restriction cannot be enforced at compile time. Obviously, it will only work if T is an enum.
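Since C# 7.3, the compiler does accept `System.Enum` as a constraint, so a present-day version can reject non-enum types at compile time. Below is a minimal sketch under that assumption. `EnumRandomSketch<T>` and the `MaritalStatus` enum are hypothetical names, and the sketch is not the framework's EnumRandomGenerator<T>.

```csharp
using System;

// Hypothetical enum picker; "where T : struct, Enum" needs C# 7.3 or later.
public class EnumRandomSketch<T> where T : struct, Enum
{
    private readonly T[] _values;
    private readonly Random _rng;

    public EnumRandomSketch(Random rng)
    {
        // For an enum type, Enum.GetValues returns an array of that type,
        // so the cast to T[] succeeds.
        _values = (T[])Enum.GetValues(typeof(T));
        _rng = rng;
    }

    // Uniform pick among the declared values.
    public T GetRandom() => _values[_rng.Next(_values.Length)];
}

public enum MaritalStatus { Single, Married, Divorced, Widowed }
```

Enum.GetValues only returns declared members, so for a [Flags] enum this picks single named values and never combinations of flags.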
ListRandomGenerator<T> is a generic random generator that can pick random items from a list of any type. If you pass true for the “unique” parameter, it returns unique values from your list. It achieves this by creating an internal copy of the list and removing each item as it returns it.
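The copy-and-remove approach just described can be sketched in a few lines. `ListPickerSketch<T>` is a hypothetical name, and its behaviour once every item has been used (throwing InvalidOperationException) is an assumption, because the text does not say what the framework does in that case.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical list picker with an optional "unique" mode.
public class ListPickerSketch<T>
{
    private readonly List<T> _items;
    private readonly bool _unique;
    private readonly Random _rng;

    public ListPickerSketch(IEnumerable<T> items, bool unique, Random rng)
    {
        _items = new List<T>(items); // internal copy; the caller's list is untouched
        _unique = unique;
        _rng = rng;
    }

    public T GetRandom()
    {
        // Assumption: an exhausted unique picker throws.
        if (_items.Count == 0)
            throw new InvalidOperationException("No items left to pick.");

        int index = _rng.Next(_items.Count);
        T item = _items[index];
        if (_unique)
            _items.RemoveAt(index); // never return this item again
        return item;
    }
}
```

Removing by index from a List<T> costs O(n) per pick. That is negligible for short lists of names, though a shuffle-once design would scale better for large lists.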
PrioritisedListRandomGenerator<T> is a specialised ListRandomGenerator<T> that returns random items from your list in proportions that are either predefined or random but fixed for a given instance. In other words, some values are returned more often than others according to a priority score, hence the name PrioritisedListRandomGenerator. You can pass the relative frequencies of the items in a dictionary (they are integers, and the higher the frequency, the higher the chance of picking that item), or let PrioritisedListRandomGenerator<T> set up the frequencies randomly at the start. In the latter case, you pass the maximum score, and PrioritisedListRandomGenerator<T> assigns each item a random score from 1 to the maximum, then initialises its internal list according to those scores. PrioritisedListRandomGenerator<T> is especially useful for populating a database field with an uneven value distribution. For example, if one of your fields is “marital status”, Single and Married are far more common in the real world than Widowed or Divorced, but if you populate the field using an EnumRandomGenerator<T>, all values will appear in roughly equal proportions.
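A weighted pick that follows the description above, with an internal list holding each item as many times as its score, might look like the sketch below. `WeightedPickerSketch<T>` is a hypothetical name, and the assumption that random scores run from 1 to the maximum *inclusive* is flagged in the code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical score-weighted picker. Each item is added to an internal list
// once per point of score, so a uniform pick from that list returns items
// in proportion to their scores.
public class WeightedPickerSketch<T>
{
    private readonly List<T> _expanded = new List<T>();
    private readonly Random _rng;

    // Predefined scores, e.g. Divorced = 1, Single = 4, Married = 5.
    public WeightedPickerSketch(IDictionary<T, int> scores, Random rng)
    {
        _rng = rng;
        foreach (var pair in scores)
        {
            if (pair.Value < 1)
                throw new ArgumentOutOfRangeException(nameof(scores), "Scores must be positive.");
            for (int i = 0; i < pair.Value; i++)
                _expanded.Add(pair.Key);
        }
    }

    // Random scores, fixed for the lifetime of this instance.
    public WeightedPickerSketch(IEnumerable<T> items, int maxScore, Random rng)
        : this(RandomScores(items, maxScore, rng), rng) { }

    private static Dictionary<T, int> RandomScores(IEnumerable<T> items, int maxScore, Random rng)
    {
        var scores = new Dictionary<T, int>();
        foreach (T item in items)
            scores[item] = rng.Next(1, maxScore + 1); // assumption: 1..maxScore inclusive
        return scores;
    }

    public T GetRandom() => _expanded[_rng.Next(_expanded.Count)];
}
```

With scores of 1, 4 and 5, Married should come back about half the time and Divorced about a tenth of the time. The expanded list grows with the sum of the scores, so for very large scores an array of cumulative sums searched with Array.BinarySearch gives the same distribution in less memory.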
Writing your own RandomGen
It is very easy to create your own random class that returns a random instance of your own type. All you need to do is inherit from RandomGeneratorBase<T> with T set to the type of your choice, then override GetRandom and implement the random generation of your type there. For example, if you are creating a CustomerRandomGen and your customer has a forename, a surname, and a customer number, simply keep two NameRandomGenerator instance members for the forename and surname and an IntegerRandomGenerator for the customer number, then create and return a Customer instance populated with the random values.
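To keep the recipe above self-contained, the sketch below uses small hypothetical stand-ins instead of the real framework classes. A minimal `RandomGeneratorBase<T>` without the framework's seeding, a `PickFromArray` generator that plays the role of NameRandomGenerator, and a `Customer` class with assumed property names are all invented here for illustration. The structure is the point: member generators for the parts, and an overridden GetRandom that assembles the whole.

```csharp
using System;

// Hypothetical minimal base class (the real one also generates seeds).
public abstract class RandomGeneratorBase<T>
{
    protected readonly Random Rng;
    protected RandomGeneratorBase(Random rng) { Rng = rng; }
    public abstract T GetRandom();
}

// Hypothetical stand-in for NameRandomGenerator: picks from a fixed array.
public class PickFromArray : RandomGeneratorBase<string>
{
    private readonly string[] _values;
    public PickFromArray(Random rng, params string[] values) : base(rng) { _values = values; }
    public override string GetRandom() => _values[Rng.Next(_values.Length)];
}

// Hypothetical customer type.
public class Customer
{
    public string Forename { get; set; }
    public string Surname { get; set; }
    public int CustomerNumber { get; set; }
}

public class CustomerRandomGen : RandomGeneratorBase<Customer>
{
    // Member generators, one per field that needs random values.
    private readonly PickFromArray _forenames;
    private readonly PickFromArray _surnames;

    public CustomerRandomGen(Random rng) : base(rng)
    {
        _forenames = new PickFromArray(rng, "John", "George", "Jorge", "Jose", "Jack", "Jimi");
        _surnames = new PickFromArray(rng, "Smith", "Jones", "Brown", "Taylor");
    }

    public override Customer GetRandom()
    {
        return new Customer
        {
            Forename = _forenames.GetRandom(),
            Surname = _surnames.GetRandom(),
            // Six-digit customer numbers; the upper bound of Next is exclusive.
            CustomerNumber = Rng.Next(100000, 1000000)
        };
    }
}
```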
Usage and final word
One note on unit testing. I use random generation heavily in my unit tests, and that, not the odd load test or data generation job, is how I use this framework day to day. Purists would probably argue that a test must always produce consistent results. I would like to believe that too, but the world I live in does not quite work like that. I actually prefer to sprinkle some randomness on the tests I bake. Then, if my code is going to fail under the stochastic conditions of the production environment because of a case I did not foresee and so never wrote a test for, the failure is triggered on my development machine, saving me the hassle and palaver.