Random numbers should not be generated
with a method chosen at random
- Donald Knuth
The Art of Computer Programming
Volume II: Seminumerical Algorithms
Introduction
Random numbers are a primitive element not only to cryptographers, but
Computer Scientist and Programmers in general. What is desired is a method
which produces a pseudo random stream of numbers fast which are
cryptographically secure. However, the goals are diametrically opposed -
pseudo random sequences can be produced quickly, or can be produced strongly;
but usually not quickly with properties which resist Cryptanalysis.
This article will examine the applications of Pseudo Random Number
Generators. Topics covered will include:
- Usefulness of Random Numbers
- Random Number Generator Classes
- Types of Generators
- Random Number Sources
- Testing Sequences
- Seeding
- Key Derivation Function
- Standard C++ rand( )
- Linear Congruential Generator
- Minimal Standard Generator
- Non Linear Congruential Generators
- Win32 API CryptGenRandom( )
- Compiling and Integrating Crypto++
- Crypto++ Random Number Generators
- LC_RNG
- RandomPool
- AutoSeededRandomPool
- AutoSeeded917RNG
- Application Table
- Additional CSPRNGs
- Poor Generators
- Middle Square Method
- IBM Mainframe RANDU
Downloads
There are seven downloads available with this article. The downloads are
presented at the end of the article.
Usefulness of Random Numbers
In his prolific work <a
href="http: en.wikipedia.org="" wiki="" the_art_of_computer_programming"="">The Art
of Computer Programming, Volume II: Seminumerical Algorithms, Donald
Knuth identifies the following common situations in which one may require
random numbers:
- Simulation
- Sampling
- Numerical Analysis
- Computer Programming
- Decision Making
- Aesthetics
- Recreation
Simulation and sampling are self explanatory. Numerical Analysis uses
random numbers to solve complicated problems. For example, a Predator/Prey
model using differential equations. Four is equally obvious. Taking from
Knuth on item five, Decision Making:
There are reports that many executives make their decisions by flipping
a coin or by throwing darts, etc. It is also rumored that some college
professors prepare their grades on such a basis. Sometimes it is important
to make a completely "unbiased" decision. Randomness is also an essential
part of optimal strategies in the theory of matrix games.
The human ear can distinguish between synthesized music (generated from a
computer) versus that of an artist. The artist will have small deviations in
rhythm which makes the music more pleasing. Additionally, a small bit of
randomness makes computer generated graphics appear softer. Note the rigid
structure below of the left image versus the softness of the right image.
Below the right image was produced with a filter which added a small random
amount noise. In Signal Processing, this is known as <a
href="http: en.wikipedia.org="" wiki="" anti-aliasing"="">Anti-Aliasing.
| |
|
No Random Noise
| | Random Noise
|
To further impress aesthetics upon the reader, Adobe Photoshop employs
random numbers in many of its filters, including Blur, Distort, and Noise.
|
Adobe Photoshop Filters
|
Finally, recreation would include shuffling cards or rolling dice.
Random Number Generator Classes
Taking frm NIST, FIPS 140-2:
Random Number Generators fall into one of two classes: deterministic and
nondeterministic. A deterministic RNG consists of an algorithm that
produces a sequence of bits from an initial value called a seed. A
nondeterministic RNG produces output that is dependent on some
unpredictable physical source that is outside human control.
Note that layman generally refer to the nondeterministic generator as a
'true random number generator'.
Types of Random Number Generators
There are two types of deterministic random number generartors from which
a programmer may choose:
- Pseudo Random Number Generators
- Cryptographically Secure Pseudo Random Number Generators
Pseudo Random Number Generators would include generators such as Linear
Congruential Generators and Mersenne Twisters. They are generally good at
quickly providing a uniformly distributed stream over the interval [0, 1).
They offer little to no cryptographic security.
Cryptographically Secure Pseudo Random Number Generators have additional
properties which make them suitable for use in Cryptography. Cryptographic
uses would include:
- Key Generation
- Nonces
- Salts
- One Time Pads
A Nonce is a number or bit string used only once. It is a pseudo random
number issued in an authentication protocol to ensure that old communications
cannot be reused in replay attacks. To ensure that a nonce is used only once,
it should be time-variant (including a suitably granular timestamp in its
value), or generated with enough random bits to ensure a probabilistically
insignificant chance of repeating a previously generated value.
The One Time Pad (invented circa 1917) is an encryption algorithm where
the plaintext is combined with a random key or "pad" that is as long as the
plaintext and used only once. A modular addition is used to combine the
plaintext with the pad. For binary data, the operation XOR amounts is the
equivalent. If the key is truly random, never reused, and kept secret, the
one time pad provides perfect secrecy.
Random Number Sources
Though NIST does not currently recognize any nondeterministic RNGs, one
may use a deterministic RNG to seed a cryptographically secure pseudo random
number generator. Taking from FIPS 140-2:
Until such time as an Approved nondeterministic RNG standard exists,
nondeterministic RNGs approved for use in classified applications may be
used for key generation or to seed Approved deterministic RNGs used in key
generation
NIST provides tests which allows one to develop heuristics for determining
the of the quality of the sequence from the generator inquestion. Included
for download are NIST Validation Suite, FIPS 140-2, and and FIPS 186.
Additionally, the reader may want to examine ANSI 9.17, Appendix C (Approved
Random Number Generators for FIPS 140-2, Security Requirements for
Cryptographic Modules).
Also of interset is NIST SP800-90, <a
href="http: csrc.nist.gov="" publications="" nistpubs="" 800-90="" sp800-90revised_march2007.pdf"
target="_blank">Recommendation for Random Number Generation Using
Deterministic Random Bit Generators. Some conservative cryptographers do
not recommend using Dual_EC-DRBG (the elliptic curve variant) but does
recommend using CTR_DRBG or Hash_DRBG. Schneier's article can be found at <a
href="http: www.wired.com="" politics="" security="" commentary="" securitymatters="" 2007="" 11="" securitymatters_1115"="">Did
NSA Put a Secret Backdoor in New Encryption Standard?
Background on Tests
In a random sequence, on would expect each of the ten decimal digits to
occur approximately 1/10 times. Should the radix be 2 (digits of either 0 or
1), each digit should represent approximately 50% of the sequence.
Testing involves differentiating good sources from poor choices. For
example, the binary stream ...1111111110000000000... would perform ideally
with a simple Frequency test, but fail at advanced tests such as Runs,
Longest Runs in a Block, and Cumulative Sums. A counter intuitive point with
the presented binary stream is that it is a valid sequence of random numbers.
Each stream is as equally likely to occur as any other in an unbiased
generator.
Testing Sequences
Below the reader will find various tests which should be used when
evaluating the effectiveness of a generator. The reader should refer to
NIST's Guide to the Statistical
Tests and Knuth's The <a
href="http: en.wikipedia.org="" wiki="" the_art_of_computer_programming"="">Art of
Computer Programming for a complete description.
In addition, NIST's SP800-22, A Statistical Test Suite for Random and Pseudorandom Number
Generators for Cryptographic Applications should be consulted. NIST's
site includes software for testing the generator.
NIST requires a PRNG pass 16 statistical tests. The tests are listed below
with a brief description.
- Frequency - proportion of zeroes and ones for the entire sequence.
- Frequency within a Block - determine whether the frequency of ones is
an M-bit block is approximately M/2.
- Runs - the total number of zero and one runs in the entire sequence,
where a run is an uninterrupted sequence of identical bits.
- Longest Run of Ones in a Block - determine whether the length of the
longest run of ones within the tested sequence is consistent with the
length of the longest run of ones that would be expected in a random
sequence.
- Random Binary Matrix Rank - check for linear dependence among fixed
length substrings of the original sequence.
- Discrete Fourier Transform (Spectral) - detect periodic features.
- Non-overlapping (Aperiodic) Template Matching - number of occurrences
of predefined target substrings.
- Overlapping (Periodic) Template Matching - number of occurrences of
predefined target substrings.
- Maurer's Universal Statistical Test - number of bits between matching
patterns. The purpose of the test is to detect whether or not the
sequence can be significantly compressed without loss of information.
- Lempel-Ziv Complexity - how far the tested sequence can be compressed.
The sequence is considered to be non-random if it can be significantly
compressed.
- Linear Complexity - length of a generating feedback register.
- Serial - frequency of each and every overlapping m-bit pattern across
the entire sequence.
- Approximate Entropy - frequency of each and every overlapping m-bit
pattern.
- Cumulative Sum - determine whether the cumulative sum of the partial
sequences occurring in the tested sequence is too large or too small
relative to the expected behavior of that cumulative sum for random
sequences.
- Random Excursions - determine if the number of visits to a state within
a random walk exceeds what one would expect for a random sequence.
- Random Excursions Variant - detect deviations from the expected number
of occurrences of various states in the random walk.
Knuth sates a random sequence should pass 13 statistical tests. Tests
which are not included in NIST's requirements are listed below.
- Chi-Square - Classic Definition
- Kolmogorov-Smirnov - Extends the Chi-Square test to the set of Real
Numbers
- Gap - detect gaps between a number over a range of numbers in a
sequence
- Poker (Partition) - n groups of five successive integers from the
stream, observing the resulting pattern. One pair is aabcd, full house is
aaabb, etc.
- Coupon Collector's - Examines the length of the sequence required to
observe all numbers in the set 0 to d-1.
- Permutation - number of orderings of grouping the sequence into
partitions
- Collision - Used when the Chi-Squared Test exceeds a certain number of
collisions
- Birthday Spacing - Similar to the Birthday Paradox when selecting two
integers in the sequence
Seeding
<a
href="http: en.wikipedia.org="" wiki="" cryptographically_secure_pseudorandom_number_generator"="">Cryptographically
Secure RNGs share an Achilles' heel with the other pseudo random number
generators - both require an starting seed as an input. Should the secret
seed become compromised, an adversary can generate the entire output sequence
instantly by using the same algorithm.
In 1995 David Wagner, then a graduate student at Berkeley, and fellow
student Ian Goldberg cracked the random number generator used by the Netscape
web browser to secure online transactions. The pair determined that Netscape
was constructing its seeds using easily predicted numbers, such as the time
of day.
Key Derivation Functions
Key derivation functions are used (among others) to derive keys from
secret passwords or passphrases. For guidance one can use <a
href="http: www.faqs.org="" rfcs="" rfc2898.html"="" target="_blank">RFC 2898,
Password-Based Cryptography Specification. In addition, the .NET platform
provides <a
href="http: msdn2.microsoft.com="" en-us="" library="" system.security.cryptography.rfc2898derivebytes.aspx"
target="_blank">Rfc2898DeriveBytes, which is part of <a
href="http: msdn2.microsoft.com="" en-us="" library="" system.security.cryptography.aspx"
target="_blank">System.Security.Cryptography.
Key derivation functions are often used in conjunction with non-secret
parameters to derive a keys from a common secret value. A KDF may also be
used to ensure that derived keys have other desirable properties, such as
avoiding weak keys in some specific encryption systems.
Key derivation functions are also used in applications to derive keys from
secret passwords or passphrases, which typically do not have the desired
properties to be used directly as cryptographic keys. In such applications,
it is generally recommended that the key derivation function be made
deliberately slow so as to frustrate brute force attack or dictionary attack
on the password or passphrase input value.
Such use may be expressed as Dk = KDF(Key, Salt, Iterations)
where Dk is the derived key, KDF is the key derivation function,
Key is the original key or password, Salt is a random number which acts as
cryptographic salt, and Iterations refers to the number of iterations of a
sub-function. The derived key is used instead of the original key or password
as the key to the system. The values of the salt and the number of iterations
(if it isn't fixed) are stored with the hashed password or sent as plaintext
with an encrypted message.
Standard C++ rand( ) Function
The Standard C++ rand() function uses a linear congruential generator. It
offers a uniformly distributed bit stream quickly when the parameters
m, a, c, and X0 are
appropriately chosen. The LCG offers no Cryptographic Security.
X0 is colloquially referred to as the seed, often
using the system time. The generator obtains numbers by using the following
recurrence relation:
Xn+1 = (aXn +
c) mod m
The values chosen for the generator in Microsoft's Visual C++ environment
are shown below. Note that the & 0x7FFF
is performing the
modular reduction.
return (((holdrand = holdrand * 214013L + 2531011L) >> 16) & 0x7fff);
Joan Boyar's PhD dissertation, <a
href="http: www.springerlink.com="" content="" p4526x2j040m7j12="" "="">Inferring
Sequences Produced by a Linear Congruential Generator Missing low-order
Bits is an interesting read. The astute reader should realize there
probably exists an online gaming site using an insecure system. Note that
this is a different attack than Wonging (named after Stanford Wong,
a former IBM researcher and professional gambler), which is basically a card
counting system.
Minimal Standard Generator
First proposed by Miller, Lewis, and Goldman in 1969, this generator is is
a linear congruential generator with c=0. The refinement of dropping the
constant resulted in a Multiplcative Generator:
Xn+1 = aXn mod
m
Park and Miller suggest the values of a = 75 = 16807,
m = 231-1 = 2147483647. 231-1 produces a
period of 231-2. Other valuse of a exist when using
231-1: 48271 and 69627. No other values should be used.
Non Linear Congruential Generator
There are many fast random number generators available for use as a
replacement to the standard C/C++ library's rand() and srand() funtions. The
reader should familiarize themselves with <a
href="http: portal.acm.org="" citation.cfm?id="63042"">Random Number Generators:
Good Ones Are Hard To Find. Steve Park provides the source code to the
Lehmer generator at his <a
href="http: www.cs.wm.edu="" ~va="" software="" park="" park.html"="">Random Number
Generator webpage. Park's code provides additional PRNGs based on the
following distributions (these generators are referred to as Non Linear
Congruential Generators):
- Bernoulli
- Binomial
- Equilikely
- Geometric
- Pascal
- Poisson
A Mersenne Twister, developed in 1997 by Makoto Matsumoto
and Takuji Nishimura would also fall under this category. An interesting
factiod in the world of computer viruses is that <a
href="http: www.symantec.com="" enterprise="" security_response="" weblog="" 2007="" 03="" the_dread_pirate_roberts.html"
target="_blank">30 viruses use the generator. It is speculated one author
is responsible for these viruses.
Win32 API CryptGenRandom( )
CryptGenRandom()
generates a requested number of bytes for
the user. GenCryptRandom()
is an excellent source of entropy on
Windows since it is available for all Operating Systems except NT 3.51 and
earlier, and Windows 95 (SR1) and earlier.
The latest seed value for CryptGenRandom
is stored in the
Windows Registry under the key \SOFTWARE\Microsoft\Cryptography\RNG\Seed of
the HKEY_LOCAL_MACHINE hive. The seed is developed by the Operating Systems
using various system parameters. For example, on a Windows CE device, the
following will be used:
- Thread and kernel switches (
CeGetRandomSeed
) - The current process identifier (
GetCurrentProcessId
) - The current thread identifier (
GetCurrentThreadId
) - Ticks since boot (
GetTickCount
) - Current time (
GetLocalTime
) - Memory information (
GlobalMemoryStatus
) - Object store statistics (
GetStoreInformation
)
After the seed is developed, it undergoes two cryptographic transforms: an
MD4 hash and a RC4 encryption for additional mixing and chopping.
Sample 1 demonstrates using the Windows API to acquire pseudo random
values. To remove the burden of an SDK, the program uses
LoadLibrary()
and GetProcAddress()
.
Compiling and Integrating Crypto++
Please see the
related article, Compiling and Integrating
Crypto++ into the Microsoft Visual C++ Environment. This article is based
upon basic assumptions presented in the previously mentioned article. It also
addresses most problems encountered with projects from Command Line to MFC
(Errors C1083, C1189, LNK1104, LNK2001, and LNK2005). Additionally, it
provides some tips and other niceties for using the Crypto++ Library.
LC_RNG
LC_RNG
is a Linear Congruential RNG. Though this generator
has no cryptographic value, it does allow one to reproduce results when
debugging a program. Additionally, it is generally faster at generating a
byte block (or stream). If one seeds the LCG with 0x00, a steady stream of
0x80 is the result. This seed value produces a period of 1. Other seeds
perform as desired. Sample 1 may be used to compare the results of various
seed values of Linear Congruential Generators.
RandomPool
The RandomPool
behaves similar to an LCG in that the same
seed produces the same results. However, unlike LC_RNG
, the
cipher behind the RandomPool
is currently
MD5<SHA>
. randpoool.cpp provides
typedef
MDC<SHA> RandomPoolCipher
. Then
RandomPool
would be
initialized and used as follows (as demonstrated in Sample 3):
const unsigned int SEEDSIZE = 16;
byte pcbSeed[ SEEDSIZE ];
const unsigned int BLOCKSIZE = 16 * 8;
byte pcbScratch[ BLOCKSIZE ];
...
CryptoPP::RandomPool rng( SEEDSIZE );
rng.Put( pcbSeed, SEEDSIZE );
rng.GenerateBlock( pcbScratch, BLOCKSIZE );
AutoSeededRandomPool
An auto seeded random pool was suggested by Leonard Janke, which Wei Dai
later incorporated into Crypto++. AutoSeededRandomPool
uses the
RandPool
design from PGP. It is suitable for all cryptographic
purposes including generating keys and IVs. Depending on the Operating
System, the generator will be seeded using the following:
CryptGenRandom()
by way of a Cryptographic Service
Provider /dev/urand
/dev/rand
Sample 4 trivially demonstrates using an AutoSeedeRandomPool.
const unsigned int BLOCKSIZE = 16 * 8;
byte pcbScratch[ BLOCKSIZE ];
AutoSeededRandomPool rng;
rng.GenerateBlock( pcbScratch, BLOCKSIZE );
AutoSeededX917RNG
Unlike LG_RNG
and RandomPool
,
AutoSeededX917RNG
does not require seeding. However, one must
specify an approved Block Cipher as a template parameter. The two ANSI 9.17
approved ciphers are 3-key Triple DES and SHA1. Sample 5 demonstrates use
with SHA1.
const unsigned int BLOCKSIZE = 16 * 8;
byte pcbScratch[ BLOCKSIZE ];
AutoSeededX917RNG< DES_EDE3 > rng;
rng.GenerateBlock( pcbScratch, BLOCKSIZE );
Application Table
The following table should be applied when chosing a Random Number
Generator in practice.
Additional CSPRNGs
In addition to the previously mentioned, various FIPS standards recognizes
other cryptographically secure generators. As the reader should now realize,
a Cryptographically Secure Pseudo Random Number Generator wraps a
deterministic generator in a difficult problem. FIPS 186-3 approves the
Digital Signature Algorithm (DSA) and Elliptic Curve DSA (ECDSA) as
CSPRNGs.
Poor Generators
This section is included for pandectic resaons. I hope the reader will
appreciate (and sometimes enjoy) the shortcomings.
Middle Square Method
The Middle Square Method of generation was proposed by John
von Nuemann. The generator was used from 1946 to the mid 1950s. Keep in mind
that in 1946, the computer was 1 year old. The generator suffered from the
fact it converged to a sequence of 0s too quickly; and once at 0, it was
latched at 0.
IBM Mainframe RANDU
RANDU was a congruential generator supplied by IBM for use on it's
mainfame computers. The choice of a=65539 and m=231 proved to be a
poor choice. When the authors of Numerical Recipies in C generated a
sample and plotted the planes, only 11 planes existed (where there should
have been on the order of m1/k, where k
is the number of random numbers chosen plotted in three dimensional space.
The generator suffered from Correlation in k-space. The IBM
representative stated:
<blockquore>
We guarantee that each number is random individually, but we don't
guarantee that more than one of them is random.
Subtract with Borrow
In 1992, physicists discovered that even "high-quality" random-number
generators, can yield incorrect results under certain circumstances. In
preparation for simulating the three-dimensional Ising model, researchers
tested the package on the two-dimensional version, which has a known answer.
The result was incorrect. [1]
Summary
This article presented the reader with with various choices for a pseudo
random number generator based on the problem domain. Should the reader
require a source for Simulation or Sampling, choose an LCG. If the reader
requires strength, choose the Win32 API, AutoSeededRandomPool, or
AutoSeededX917RNG. If the reader requires Cryptographic Security, choose an
AutoSeededX917
. And finally, if the reader requires entropy, use
Win32 API or AutoSeededRandomPool
.
In addition, other generators exist (some of which are exposed in
Crypto++). For example, the Blum Blum Shub CSPRNG produces a bit sequence
based on Quadratic Residues. The generator will remain cryptographically
secure as long as Integer Factorization remains hard. It has yet to be
determined where Integer Factorization lies in Complexity Theory.
Acknowledgements
- Wei Dai for Crypto++ and his invaluable help on the Crypto++ mailing
list
- Dr. A. Brooke Stephens who laid my Cryptographic foundations
Revisions
- 04.10.2008 Added Reference to Schneier's Wired.com Article
- 11.28.2007 Added 'Poor Generator' Section
- 11.07.2007 Added Reference to NIST Test Suite
- 09.13.2007 Added Explaination of Nonce
- 09.05.2007 Added Reference to NIST 800-90
- 09.05.2007 Added Reference to NIST 800-22
- 08.28.2007 Added Reference to Mersenne Twister
- 08.27.2007 Added Reference to RFC 2898
- 01.12.2007 Added Non-Linear Congruential Generators
- 01.09.2007 Expanded Usefulness of Random Numbers
- 01.07.2007 Added Reference to Leonard Janke
- 01.05.2007 Added Application Table
- 01.04.2007 Added Key Derivation Function
- 01.03.2007 Initial Release
Downloads
Checksums
- ASRP.zip
MD5: 773B2C409A7140CBA16EFF7903AA2C11
SHA-1: 9CD4BC2C3E156157AA10ADF62994D4065980EC80 - AutoSeededX917.zip
MD5: B855380703A24DF9671AE8C7146113A9
SHA-1: 1B724CD0C28CC4909F7A14F374EE1874CFB49E42 - FIPS140-2.zip
MD5: 7E106DBF757F281C764AD6F09DFEF4DA
SHA-1: 0ACD845C20B4F8FA5EA58C3C04270A8DFB6DCAC7 - GenRandom.zip
MD5: B99A0CCC8BA862350235EBEBDFC3A0D2
SHA-1: 33EA241D0E86E8AB6D86F64FB62C7C67678163CC - LCG.zip
MD5: 0F4A91276B2CE558DB8C8F65218AA32D
SHA-1: BB598A1A9FC13B519BD5D5CDB37FA52B39E1230C - RNGVS.zip
MD5: 6E608C5AFFFE4D906E435C39BC2CC687
SHA-1: ED5E73E39E3ADBEDA008A175E4243D8A59ABD254 - RandomPool.zip
MD5: 345CF4D59BED53B501ED1D58EF0FD892
SHA-1: E699F73A53A57178E150A1A020DE4B173F4FF1E9
References
[1] The Bias of Random-Number Generators, <a
href="http: www.sciencenews.org="" articles="" 20030927="" mathtrek.as"="">http://www.sciencenews.org/articles/20030927/mathtrek.as,
accessed November 2007.