Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / security

CR-XAM

4.00/5 (1 vote)
22 Nov 2012Public Domain9 min read 17.4K   179  
A Simple But Surprisingly Effective Random Number Generator

Introduction 

Today I'd like to introduce a Random Number Generator I developed called CR-XAM.  It's a simple and efficient algorithm that generates a surprisingly high quality random output.  So far I've tested it with:

...and its passed them all with flying colors.  I've yet to test it with other suites such as: 

...so don't write home just yet.  But compared to other simple generators such as LCGs, and LSFRs, it produces much better quality, with very little decrease in speed.

Background 

So first off, what the heck does CR-XAM mean?  CR-XAM is an acronymn for the five operations that make up the Random Number Generator:  Counter, Rotate, Xor, Add, Multiply.  I liked the sound of it, and that it's a simple name for the simple operations that make up a simple RNG...  to put it simply.

Now let's dive into some code.  First I'd like to show you how to use the code, and then we'll take a look at the Algorithm itself.

Using the code

If you look at the two files above, you will see one called crxam32.zip and crxam64.zip.  The difference, as you can likely guess, is that crxam32 uses 32 bit unsigned integers in its state, and crxam64 uses 64 bit unsigned integers.  The 32 bit version has a Period around 2^48 bytes, and the 64 bit has a Period of 2^96 bytes.  In the examples below I will be using the 64 bit version, but the code in both versions is virtually identical. 

Also, despite the different bit sizes, both generators return a single byte after each call is made.  I'll explain why when I talk about the algorithm.

Seeding 

With any Random Number Generator, the first step is seeding it.  CR-XAM has 3 different options when it comes to seeding.  In all 3 cases, the seeding procedure uses rand() in stdlib.h

1)  crxam64_seed(uint32 Seed)

The first option is, of course, to allow the user to provide a Seed.  In this case, Seed, a 32 bit unsigned integer, is passed to srand().  From there, the CR-XAM uses rand() to create the initial State.

void crxam64_seed(uint32 Seed)
{
    srand(Seed);
    setup();  // The function that creates the state
} 

2)  crxam64_init(void)

The second option is to call crxam64_init().  This will create a seed using time(0).  The procedure is quite simple.

void crxam64_init(void)
{
    time_t Seed = time(0);
    Seed += (Seed >> 32);
    srand(Seed & 0xffffffff);
    setup();  
} 

3)  Just call crxam64_genrand(void)

And last, if you call crxam64_genrand() to start getting random numbers without calling crxam64_seed() or crxam64_init() first, it will go ahead and create an initial state using rand() without calling srand() first.

C++
byte crxam64_genrand(void)
{
    if(!IsSeeded)
    {
         setup();
    }
    ....
    // Rest of code will be shown later.

Generating Random Numbers 

So, now that we have our generator seeded, we need to start getting some random numbers.  Once again, the procedure is quite simple.

C++
byte B = crxam64_genrand();
 
// OR

for(size_t I = 0; I < Buffer_len; I++)
{
    Buffer[I] = crxam64_genrand();
}

Once again, both crxam64_genrand() and crxam32_genrand() return a single byte.

NOTE:  I'm well aware of the fact that most the places where Random Number Generators are used are more likely to need 32 bit integers or even doubles, rather than a single byte.  I will be releasing some code in a few days that will make it more convenient to use CR-XAM, and will make it easier to generate floating point numbers and Integers of larger size and within certain ranges (i.e. random number betwee Min and Max).  In the meantime I apologize for any inconviences.

So those are the function calls that you can allow you to use the generator.  Now let's take a more indepth look at what's happening behind the scenes.

State 

The State of an RNG are all the variables and bits that are used to generate the random numbers.  I would argue that, specifically, they are all the variables that are able to be changed by the seeding procedure.

C++
  // Generator State
static uint64 Xc;
static uint64 Ac;
static uint64 Mc;
 
static byte Xr;
static byte Ar;
static byte Mr;
 
static uint64 Accum;
static int IsSeeded = 0; // Flag 

Before I explain the different variables, I would like to go ahead and point out that the datatypes uint64 in crxam64 and uint32 in crxam32, as well as byte and typedefs declared in another file "stdtypes.h".

C++
typedef unsigned char byte;
typedef unsigned long int uint32;
typedef unsigned long long int uint64;

<code>I'm sure most of you figured that out, but I didn't want anyone to get confused. And yes, I am aware that pretty much the same thing is in <stdint.h>

C++
// Found in stdint.h>
uint32_t;
uint64_t;

Personally, that little tail "_t" just irritates me.  Call me crazy, but I see it as just needless clutter in my code.  I put enough clutter there myself, thank you very much.  I don't need some standard library enforcing it's own clutter on me. 

Setup - The Seeding Procedure

Earlier in the code I showed you the different functions you can use to Seed the Generator.  All 3 of those options make use of the same function, setup().

C++
static void setup(void)
{
    Accum = 0;
    Xc = Ac = Mc = 0;
 
    for(int I = 0; I < 8; I++)
    {
         Accum = (Accum << 8) | (rand() & 0xff);
	 Xc = (Xc << 8) | (rand() & 0xff);
	 Ac = (Ac << 8) | (rand() & 0xff);
	 Mc = (Mc << 8) | (rand() & 0xff);
    }
 
    Xr = rand() & 0xff;
    Ar = rand() & 0xff;
    Mr = rand() & 0xff;
 
    IsSeeded = 0xffff;
}

Many of you are probably wondering why I only use rand() to seed the generator.  Why did I only let the user pass in a 32 bit seed rather than an entire array of bytes?  Why?  Why?  WHY???

I have 3 reasons: 

  1. It makes the seeding procedure simpler.  Instead of having to worry about getting an entire array of random bytes, you only have to worry about a 32 bit integer.
  2. Using a lower quality generator to seed a high quality one is a common practice.  There are several implementations of Mersenne Twister, the Sir Lancelot of RNG's if you will, that use a Lagged Fibonacci Generator to Seed Mersenne Twister.
  3. I wanted CR-XAM to be used in places where people previously used rand() in stdlib.h.  This makes it easier to drop in and replace srand(Seed) with crxam64_seed(Seed).  (Yes, I'm aware that rand() returns a short.  I'm working on it!!!). 

crxam64_genrand()

And last but not least, the Little Black Box of this Heart of Gold. 

C++
byte crxam64_genrand(void)
{
    if(!IsSeeded)
    {
        setup();
    }
 
    Accum = rotl(State, ++Xr);
    Accum ^= ++Xc;
 
    Accum = rotr(State, ++Ar);
    Accum += ++Ac;
 
    Accum = rotl(State, ++Mr);
    Accum *= ++Mc;
 
    return (Accum >> 56) & 0xff;
}

Simple, no?

The Algorithm

Well, now that we've taken a look at all the code, let's discuss how this baby ticks.

First, the functions rotr() and rotl() are bit rotations.  These are very commonly used functions, so I won't go over them here in too much depth here.  If you've never heard of them, they shift the bits, but instead of losing bits on one end and gaining 0's on the others, the bits that would fall off are brought to the other side.

Once again the Algorithm is very simple.  The Pseudo-Code for its is pretty straightforward. 

  1. Increment the variables Xr, Xc, Ar, Ac, Mr, and Mc by 1.  Don't worry about overflow.  
  2. Rot<code><code><code>ate Accum to the left Xr bits, then XOR with Xc.  Store the Result back in Accum.
  3. Rotate Accum to the right Ar bits, then Add Ar to Accum.  Store the Result back in Accum.
  4. Rotate Accum to the left Mr bits, then Multiply with Mc.  Store the Result back in Accum.
  5. Return the top 8 bits (1 byte) of Accum as the output. 

One of my favorite aspects of the Algorithm is that you can adapt it to any size integer you want (although in that case you might have to worry about overflow).    

Now let's talk about the State.  I tried to give the variables simple, but obvious names.  The first letter, X, A, and M all describe which operation their involved in. 

  • Xr and Xc are involved in the XOR stage,
  • Ar and Ac are used in the Addition Stage.
  • Mr and Mc are used in the Multiplication Stage.  

Also, the last letter, r and c, mean, respectively, Rotation and Counter (technically all the variables are counters, but I liked this naming convention, so I decided to stick with it).

Accum is the variable that all the operations are carried out on.  It's where all the randomness accumulates, and is what the output is dervied from.

The Operations

The Counter, Rotations, Addition and XOR are commonly used to build Cryptographic Primitives.  Multiplication is a bit of an exception to this.  The late George Marsaglia, creator of the Diehard Tests and a well respected authority on Random Number Generators, claimed that in his experience, multiplication did a better job of mixing bits than XOR or Addition (I apologize, I can't remember where I read this.  When I find the source, I'll post it).

The Output

Unlike many other simple RNG's, instead of returning all 64 bits of Accum as the output, I only return 8 bits.  Some people see this as being inefficient.  That I could be getting eight times as much output that I currently am.  However I have two reasons that I believe support my choice.

  1. Because of the Multiplication Operation, the top 8 bits will be the ones that are most thoroughly mixed.  By only using them, we're ensuring that we're getting the best output we can.
  2. A common weakness that causes most simple RNG's to fail advanced statistical tests is that many of them cannot Generate output with little statistical randomness.  At some point, a True Random Number generator will generate a long sequence of all 1's or 0's.  The reason most simple RNG's fail is that when they generate the out put, they use it as both the utput, and as the seed for the next output.   However if there Seed has little statistical randomness (i.e. it is 111111111 or 00000000), their generator won't be able to generate any useful output.  By only using the top 8 bits, we're able to ensure that we can generate long sequences of 1's and 0's, regardless of how random the actual state is.
     

Statistical Tests 

The file Stat_Tests.zip contains the results of the Statistical Tests I mentiond at the beginning of the Article.  One additional test I performed that I did not mention was using Ent, program that performs some basic Statistical Tests.   

Entropy = 7.999998 bits per byte.

Optimum compression would reduce the size
of this 126000000 byte file by 0 percent.

Chi square distribution for 126000000 samples is 276.34, and randomly
would exceed this value 17.13 percent of the times.

Arithmetic mean value of data bytes is 127.5020 (127.5 = random).
Monte Carlo value for Pi is 3.141631810 (error 0.00 percent).
Serial correlation coefficient is -0.000067 (totally uncorrelated = 0.0).

Conclusion  

So far I've been very pleased with CR-XAM.  It's yet to fail any tests I've thrown at it, and I certainly hope it continues to do so.  If you find any problems or have any questions, please feel free to bring them to my attention.

I hope you all find this useful. Happy Thanksgiving!

Jacob W.

History 

Nov. 22, 2012 - Original Posting
                       - Added Results for Statistical Testing.

Algorithm License 

Because this article covers both the source code I wrote and the algorithm I came up with, I felt I should cover all my bases. 

All source code presented here and included in the files is in the Public Domain, and may be used however you wish.  

I hereby place this algorithm into the Public Domain, and relicense all royalty rights and copyrights.

License

This article, along with any associated source code and files, is licensed under A Public Domain dedication