Introduction
Pseudo Random Number Generators (PRNG) are used
in many cryptographic operations. One of the most important cryptographic
operations is generating encryption keys. If an attacker can compromise your
(pseudo) random number generator, they can potentially also compromise your
encryption key. This is bad; if an attacker knows the key used to encrypt your
data, they can decrypt what you have encrypted. Therefore a good PRNG must
generate 'random' numbers difficult to predict and the the state of the PRNG
itself must be difficult to compromise. Although Fortuna is new (and
cryptographers never trust anything new), it has been designed with both of
these goals in mind. Fortuna was designed by two well known people in the
cryptography / security field with experience in this area.
Fortuna is a random number generator developed by Bruce Schneier and Niels Ferguson in their book Practical
Cryptography. Fortuna addresses some of the shortcomings of their previous
PRNG Yarrow. The biggest
challenge implementing Yarrow is that Yarrow requires entropy estimates. Fortuna
overcomes this issue by removing the entropy esimators. The biggest design
change besides the removal of the entropy esimators is that Fortuna has 32 pools
to collect entropy.
My implementation of Fortuna is composed of 4 parts. There is a generator
that generates the actual pseudo random data. There is an accumulator composed
of 32 pools of source data (entropy). There is a source manager that manages the
data sources. And there is a seed file that maintains a copy of the state of the
pools and ensures that Fortuna can generate pseudo random data even after the
computer has just rebooted. Fortuna avoids the problem of requiring an accurate
estimate of the entropy by using 32 entropy pools. The entropy pools are used to
periodically reseed the generator. Therefore if one pool or stream of pseudo
random numbers is compromised, at the next reseed the generator will be using a
new seed so that future PRN's are not also compromised.
The authors give a lot of implementation details for Fortuna. For the most
part I followed their suggestions, but I did make a few changes. For more
complete info goto www.citadelsoftware.ca/fortuna.
For one thing, the authors recommend not using C or C++ for writing secure
software, and this is a C++ component. I'll try to point out along the way where
I deviate from their recommendations.
Design Goals
The main design goal of this implementation of Fortuna is to develop a PRNG
which would be suitable for use on a web server. One distinguishing feature of a
web server is that typically there is not much input from the keyboard or mouse.
Therefore, this implementation of Fortuna uses the available information from
Windows as 'entropy' sources to the generator. Note that this type of generator
is more correctly called 'chaotic deterministic' than 'pseudo random' because
the sources are not truly random. Some of the entropy for this implementation of
Fortuna comes from using the high performance timer to time operating system
calls. I realize that timing data is not true entropy because it is
deterministic. I'll say more below on the high performance counter. And being
unable to predict the outcome of the PRNG is main property desired.
Another design goal of this implementation of Fortuna is to make the PRNG
difficult to successfully attack. This is done by
- There is a large volume of data fed to the entropy pools, some with high
entropy and some with low entropy. An attacker will have difficulty compromising
the state of a single pool due to the high volume of data.
- Data from the Windows registry is used to so that application data specific
to your environment is included as a (low) entropy source to the accumulator
- The status information of each process (page faults, memory, I/O Reads and
Writes) are used as entropy sources
- Extensive use of the high precision timer is used as a source of entropy
- Avoiding simple contiguous data storage for key random data used in the prng
to avoid memory scanning attacks
- Being careful to randomize valuable random data in memory when the valuable
data is no longer needed
Note that a true 'entropy' source requires a physical source; for a hardware
RNG see ComScire (I am not affiliated
with this company, but they appear to be a respected well established company
with a good product from all I have read).
If you:
- trust your network security
- are confident an attacker cannot get access to the complete registry, and
monitor the status of all the processes running on your machine
- are confident that an attacker cannot run his own software on your machine
to attempt to determine the contents of the 32 data pools and compromise the
generator
I believe that this PRNG may suit your needs quite well.
In order to compromise this implementation of Fortuna, an attacker would
require access to the entire registry, process monitoring etc. An attacker would
have to severely compromise a machine to do this. Because of this, I believe
this implementation of Fortuna would be acceptable for many web based
applications. Below I list all of the data souces used to populate the data
pools. Note that I said 'data pools', not 'entropy pools'. This is because some
of the sources used don't have much entropy. For example, I use the entire
registry as a source for the pools. Now, there is not a lot of entropy in the
registry, but for an attacker to know the entire registry would require that the
entire machine has been compromised. If this happens, you have a lot of problems
besides the possibility of your PRNG being compromised.
Class Design
This implementation of Fortuna is highly multithreaded. Each data source runs
on its own thread and each data pool also runs on its own thread. Each data
source cycles over all the pools, distributing one byte to each pool in a round
robin fashion. Each pool object has a mutex protecting the pool data. Because
each source spreads it's data over each of the pools, this adds another level of
'deterministic chaos' to the distribution of source data over the pools. When
two different sources access a pool at the same time, the thread scheduler
determines which thread will have access first. While the thread scheduler is
not random, given that there are 32 pools each on their own thread, and over 32
sources each on their own thread, predicting which source byte ends up in which
pool quickly becomes intractible.
All of these threads do not add much overhead to your application because
they spend most of their time waiting. Each source calls 'WaitForSingleObject'
to determine if the component is shutting down.
UML diagrams for my implementation of Fortuna can be found on my web site at
here.
Entropy Sources
Entropy sources (using the word entropy loosely as described above, they are
more appropriately described as data sources) are distributed among the entropy
pools. In the book Practical Cryptography, each source event is added to
only one pool(see 10.5.6 in the book). In this implementation, each byte from
each source event is distributed among the pools in a round robin fashion. With
each byte of data there is also a byte which specifies the number of bits of
entropy in that source byte of data. This allows the pools to keep track of how
many bits of entropy are in the pool.
Each source must provide this virtual method to provide some 'chaotic data'
(I know, I know, it's a cheesy name)
virtual bool GetChaoticData(std::vector<std::pair<unsigned char,
int> > vData)=0;
The High Resolution Performance Counter
Most new computers now come with a high resolution performance counter. The
frequency of the counter is available from the api function:
BOOL QueryPerformanceFrequency(LARGE_INTEGER* lpFrequency);
If there is no high performance timer, the return value is 0. On my 2.2GHZ
laptop, the frequency of the high performance timer is 3,579,545. Note that the
frequency is different from the CPU frequency. The value of the high resolution
performance counter is obtained by:
BOOL QueryPerformanceCounter(LARGE_INTEGER* lpPerformanceCount);
When timing data is obtained using the high performance counter, each byte
that is different in the timer adds one bit of entropy to the pool.
I did some experiments regarding the high performance counter to determine
how many bits of entropy I could claim by using the high performance counter to
time the counts for a Sleep of 1ms. I wrote a separate program for these tests.
I used the high performance timer to get the high performance count around the
Sleep
fn as follows:
hpTimer.Start();
Sleep(dwSleep);
hpTimer.Stop();
diff = hpTimer.m_stopTime.LowPart - hpTimer.m_startTime.LowPart;
diff -= dwFreq/1000*dwSleep;
where hpTimer
is a simple class to wrap the high performance
timer. Then I add the lsb from the difference in time to a bitvector (called
BitBucket) I wrote myself (member m_bitbucket
of type BitBucket). I
need the mutex because I used 200 threads to speed up the bit generation and had
a separate thread which extracted the accumulated lsb's and wrote them out to a
file. The file writer thread Sleeps 100ms after extracting the bits from each
bit generation thread.
uc = (unsigned char)(diff & 0x01);
m_mutex.Wait(INFINITE);
m_bitbucket.Add(uc);
m_mutex.Release();
I accumulated all of the lsb's, writing 60MB or so to a file and then ran the
DieHard randomness tests on the bits.
The data passed most of the diehard tests. If you're interested in more of the
gory details you can find them at my web site www.citadelsoftware.ca/hptimer.
The Windows Registry
The Windows registry is a huge data repository. This is one source of data
used to populate the data pools. Again, this is not random data, but reproducing
this data requires the attacker to have access to the entire registry. One of
the challenges of working with the registry for this sort of application is
ensuring that every handle to a registry key that is opened also gets closed.
For this PRNG to be able to run on a server for weeks or months without
rebooting, there can be no resource leaks. I have taken care to close registry
key handles using RegCloseKey()
when the handle is no longer
needed. Also, registry keys are opened with KEY_ENUMERATE_SUB_KEYS |
KEY_QUERY_VALUE
so that only read permission is requested.
None of the data in the registry is considered to add entropy to the pools
except for the CLSID's. Again this is a very conservative approach. Using the
registry with four different source threads spreading the data over the pools
complicates an attackers job considerably.
The registry is a system wide resource. Almost all applications will at some
point make use of the registry. The FortunaCSI PRNG must be careful not to place
too many demands on the registry. To accomplish this, every source thread has
it's thread priority lowered. This is done by
BOOL m_bSetThreadPriority = SetThreadPriority(hThreadId,
THREAD_PRIORITY_BELOW_NORMAL);
Also, after every access to the registry, each source thread walking the
registry sleeps for some length of time. The Sleep is typically from 3-10 ms.
The actual sleep time is a function of the thread id, the 'this' pointer, and
the data extracted from the registry. This makes it more difficult for an
attacker to determine which of the 32 pools the data from the registry will end
up in.
The case of a key with restricted access is also handled properly in the
code. Attempting to open registry keys can result in an error code of
ERROR_ACCESS_DENIED
. When this happens, the subkey is abandoned and
execution continues with the parent key of the key with the access restriction.
I tested the Registry walking logic by writing out a log file with every key
and value in the registry. This pointed out a number of problems, particularly
where I wasn't properly handling keys that didn't have read permission.
HKEY_CLASSES_ROOT\CLSID
One source thread of this Fortuna implementation cycles over all the guids in
the HKCR\CLSID list. This is a list of every guid for every COM object installed
on the machine. On my 2.2 Ghz Windows/XP laptop (about 1 year old) I have
roughly 5600 guid's in this list and each cycle over these guids takes around 1
min.
When adding the bytes of the guids to the data pools, I only count 1 bit of
entropy for each byte of the guid that is different than the preceeding guid.
This keeps the estimate of the amount of 'entropy' very conservative. This is
important to avoid reseeding the generator too often.
Reading all the CLSID's on my system once takes about 30 minutes.
HKEY_CLASSES_ROOT
Processing HKCR takes about 2 and 1/2 hours on my machine.
HKEY_CURRENT_USER
HKCU\Software is where the applications store application specific data in
the registry. For example, your text editor will likely put the list of your
most recently used files here. Each application contains a lot of data like this
plus gui data such as your background colour etc. On my Win/XP laptop, it takes
about 8 min for a dedicated thread to cycle through all this registry data.
Dumping out the key names and only ascii data for the keys resulted in a file of
around 270K.
On my machine it takes about 30 minutes to walk HKCU.
HKEY_LOCAL_MACHINE
HKLM is the largest repository of the registry. On my machine it takes about
5 hours to process.
Process Information
There is a wealth of information available for each
process. If you look at the Task Manager, under "View", "Select Columns" you can
see some of the different information that is available:
- Process Name
- Process id
- Memory Usage
- Page Faults
- Handles
- Threads
- I/O Reads
- I/O Read Bytes
- I/O Writes
- I/O Write Bytes
- I/O Other
- I/O Other Bytes
Each of the above quantities is available via different Win32 API calls.
Following are some of the methods and the data that is available. For more
information on the specific functions, see MSDN.
Each non zero byte from the following structures is added to the data pools.
When a byte differs from the value obtained for the same process on a previous
call, the byte is marked as containing one bit of entropy. Otherwise, the data
is added to the pool but no entropy is claimed.
GetProcessIOCounters
The method fills in the IO_COUNTERS
structure. This provides the
following information:
typedef struct _IO_COUNTERS
{ ULONGLONG ReadOperationCount;
ULONGLONG WriteOperationCount;
ULONGLONG OtherOperationCount;
ULONGLONG ReadTransferCount;
ULONGLONG WriteTransferCount;
ULONGLONG OtherTransferCount;
} IO_COUNTERS, *PIO_COUNTERS;
GetProcessTimes
This method provides the kernel time and user time for each process.
GetProcessMemoryInfo
This method fills in the PROCESS_MEMORY_COUNTERS
structure. This
provides the following information:
typedef struct _PROCESS_MEMORY_COUNTERS {
DWORD cb;
DWORD PageFaultCount;
SIZE_T PeakWorkingSetSize;
SIZE_T WorkingSetSize;
SIZE_T QuotaPeakPagedPoolUsage;
SIZE_T QuotaPagedPoolUsage;
SIZE_T QuotaPeakNonPagedPoolUsage;
SIZE_T QuotaNonPagedPoolUsage;
SIZE_T PagefileUsage;
SIZE_T PeakPagefileUsage;
} PROCESS_MEMORY_COUNTERS;
GetPerformanceInfo
This method fills in the PERFORMACE_INFORMATION
(note the typo
in the code from psapi.h) structure. This provides the following information:
typedef struct _PERFORMACE_INFORMATION {
DWORD cb;
SIZE_T CommitTotal;
SIZE_T CommitLimit;
SIZE_T CommitPeak;
SIZE_T PhysicalTotal;
SIZE_T PhysicalAvailable;
SIZE_T SystemCache;
SIZE_T KernelTotal;
SIZE_T KernelPaged;
SIZE_T KernelNonpaged;
SIZE_T PageSize;
DWORD HandleCount;
DWORD ProcessCount;
DWORD ThreadCount;
} PERFORMACE_INFORMATION, *PPERFORMACE_INFORMATION;
QueryWorkingSet
The Win32 API provides QueryWorkingSet
to access the current
working set of memory pages loaded for a process. I found a somewhat dated
article to provide more information on the working set. If you are interested,
here is the article by Matt Pietrek which
gives more information on the working set (if someone knows of a more recent
article I'd be glad to provide a reference to that instead).
Each time the working set for a process is obtained, the bytes are compared
the bytes from the previous working set. If the size is different, then each non
zero byte adds one bit of entropy. If the number of bytes is the same, then each
each different byte adds one bit of entropy. Bytes that are the same as those
from the previous working set are added to the pool but no entropy is claimed.
This keeps the entropy estimate very conservative.
Some working sets can be quite large, well over 10MB of data. Because of
this, I decided to limit the working sets considered to only those working sets
below 1MB of data. Here are some of the applications I found with large working
sets:
- svchost.exe, 22MB
- Windows Explorer, 21MB
- Yahoo Pager, 18MB
- Visual Studio 7, 45MB
- Internet Explorer, 30MB
I was surprised how large the working set was for Yahoo Pager. Before I
started filtering out the large working sets, Fortuna was using large amounts of
memory and generating a lot of page faults. Removing the large working sets
resulted in more reasonable system resource usage for the PRNG.
Microsoft Crypto API
Microsoft has introduced a cryptographic API which they claim is suitable for
cryptographic purposes. Unfortunately, nothing is published regarding the
internals of the Crypto API. Therefore some people are not comfortable using it
due to privacy concerns. Also, given some Microsoft security practices in the
past, it is difficult to know to what level should the Crypto API be trusted.
My approach here is to use the Crypto API as one source of 'random' data. Why
not? It increases the workload of the attacker; now the attacker must compromise
the Microsoft Crypto API as well as the other data sources used. The Crypto API
generates 32 bytes of random data every 100ms. These bytes are distributed over
all of the data pools using an estimate of 1 bit of entropy per byte of 'random'
data. This is a conservative estimate obviously, because the internals of the
Crypto API are unpublished.
Ping
Ping is a network command used to determine the time of a network
transmission. I use the high resolution timer to time the various calls to
perform a Ping. Even if the ping times out, the timing data still has useful
randomness for seeding the prng pools. Ping timing data only compresses by
around 2%. I wasn't able to gather enough timing data from Ping to analyze the
results using Diehard.
Seed File
I have implemented the seed file a little differently than the authors of
Practical Cryptography. If I'm understanding section 10.6.1 (page 178)
correctly, they are only writing out the state from the generator. So, each time
the PRNG is restarted, all the entropy in the pools is lost.
In my implementation, I write out the state of each of the 32 entropy pools.
I also write out the key from the generator as well as the counter used by the
generator. Some of the entropy pools are only used in the reseed very seldom and
provide security during reseeds when the generator state is compromised. So in
my opinion the state of the entropy pools is worth saving in the seed file.
I encrypt the seed file using AES with a 256 bit key in CTR mode. The key is
the SHA256 hash of the user provided seed file password concatenated to a salt
value.
One of the issues surrounding the use of the seed file is ensuring that the
same seed file is never used twice. To avoid possible repeated values if the
same seed file is used twice (see the book for a good discussion of how this
could happen), after the seed file is read and before the Fortuna PRNG is
allowed to return pseudo random bytes, more data is added to the hash of the
pool entropy. This data includes a hash of the current snapshot of all
processes, time information, and other available data. See the function
GetMachineSignature()
in FortunaUtils.cpp for all the gory
details.
Using the code
A brief description of how to use the article or code. The class names, the
methods and properties, any tricks or tips.
Using the code is quite straightforward. All of the complexity is hidden from
the user. Here is how to set up the Fortuna PRNG.
Fortuna* pFortuna = new Fortuna();
std::string sFileName("FortunaCSI.bin");
std::string sPassword("MtNorquay~246");
bStatus = pFortuna->SetSeedFile(sFileName, sPassword);
if (!bStatus)
{
printf("Set Seed File Failed\n");
assert(0);
}
bool bReadSeedFile = false;
if (bReadSeedFile)
{
CitadelSoftwareInc::Fortuna::FortunaErrors fError =
CitadelSoftwareInc::Fortuna::eNoErrors;
bStatus = pFortuna->ReadSeedFile(fError);
if (!bStatus)
{
printf("Failed to read seed file");
assert(0);
}
}
pFortuna->StartThreads();
Note that the use of the seed file is optional. The
StartThreads()
method is necessary to actually begin collecting
data, walking the registry, monitoring processes, etc.
Once the set up is complete, then calculating random numbers is quite
straightforward:
unsigned int uint = 0;
bool bStatus = pFortuna->GetRandomInt(uint);
if (!bStatus)
printf("Fortuna PRNG is not ready yet\n");
else
...
There is also a method to get an arbitrarily long array of random bytes
bool GetRandomBytes(unsigned char* pData, unsigned int numBytes);
Points of Interest
Writing server applications requires extensive testing. After adding a new
feature, I would usually test by letting the program run for 8 to 12 hours,
requesting a random number every 1 to 10 seconds. After adding the
QueryWorkingSet
data source and running for 8 hours I found (from
the Task Manager) that there were some 24000 handles being used - oops. I
quickly found the problem and fixed it. The GetPerformanceInfo
function returns the number of handles; perhaps that value should be monitored
and an indication of a problem be written to a log file if the value goes over
some threshold.
To demonstrate the use of the prng I wrote a little MFC app which generates
passwords. The screen shots at the top of this CodeProject page show the Fortuna
monitor I added. This is an MFC dialog which monitors the state of the generator
and the source and pool objects. The 'Peek' edit box for the Sources shows the
actual data being fed to the pools from the currently selected source.
If you have any questions or comments on my implementation of Fortuna I'd be
glad to hear from you. My email address is ron at citadelsoftware dot ca.
Cheers!
History