
SFMT in Action: Part I – Generating a DLL Including SSE2 Support

28 Apr 2009
An approach for using the SFMT (SIMD-oriented Fast Mersenne Twister) random number generator algorithm.


Introduction

As you know, many software developers need random numbers in their applications, particularly in financial and estimation-based software. Many random number generators are available today, and some of them are open source and free to use. Both MT (Mersenne Twister) and its improved version, SFMT (SIMD-oriented Fast Mersenne Twister), are popular and well-known random number generator algorithms.

Approach

The first part of the “SFMT in Action” series is about generating a SIMD-oriented Fast Mersenne Twister DLL. This DLL will be able to use the CPU’s capabilities, such as SSE2.

Streaming SIMD Extensions 2: SSE2

SSE2, Streaming SIMD Extensions 2, is one of the IA-32 SIMD (Single Instruction, Multiple Data) instruction sets. SSE2 was first introduced by Intel with the initial version of the Pentium 4 in 2000. It extends the earlier SSE instruction set, and is intended to fully supplant MMX. Intel extended SSE2 to create SSE3 in 2004. SSE2 added 144 new instructions to SSE, which has 70 instructions. Rival chip-maker AMD added support for SSE2 with the introduction of their Opteron and Athlon 64 ranges of AMD64 64-bit CPUs in 2003.

When applications are designed to take advantage of SSE2 and run on machines that support it, they are usually faster than equivalent code that does not use it. Today, many CPUs support the SSE2 instruction set. For detailed information about SSE2, please visit this link.

About SFMT

Before starting to generate the SFMT DLL, let’s talk about it.

SIMD-oriented Fast Mersenne Twister (SFMT) is a Linear Feedback Shift Register (LFSR) generator that generates a 128-bit pseudorandom integer in one step. It was introduced by Mutsuo Saito and Makoto Matsumoto (from Hiroshima University) in 2006. SFMT is designed to exploit the parallelism of modern CPUs, such as multi-stage pipelining and SIMD (e.g., 128-bit integer) instructions. It supports 32-bit and 64-bit integers, as well as double precision floating point, as output. SFMT is a variant of Mersenne Twister (MT), and is roughly twice as fast. So, it’s nice to know that the SFMT DLL can generate both 32-bit and 64-bit integers.
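To make the term LFSR concrete, here is a toy 16-bit Fibonacci LFSR in C. It only illustrates the shift-and-feedback idea: the tap positions (16, 14, 13, 11) and the names lfsr16_step and lfsr16_period are my own choices for this sketch, and it is not SFMT's 128-bit recursion.

```c
#include <stdint.h>

/* Toy 16-bit Fibonacci LFSR (taps 16, 14, 13, 11). Illustration only;
 * this is NOT the recursion used by SFMT. */
static uint16_t lfsr16_step(uint16_t state) {
    /* XOR the tapped bits to form the feedback bit. */
    uint16_t bit = (uint16_t)(((state >> 0) ^ (state >> 2) ^
                               (state >> 3) ^ (state >> 5)) & 1u);
    /* Shift right by one and feed the new bit in at the top. */
    return (uint16_t)((state >> 1) | (bit << 15));
}

/* Counts the steps until the state returns to the seed. */
static unsigned long lfsr16_period(uint16_t seed) {
    uint16_t s = seed;
    unsigned long steps = 0;
    do {
        s = lfsr16_step(s);
        ++steps;
    } while (s != seed);
    return steps;
}
```

With these taps, a non-zero seed such as 0xACE1 cycles through all 65,535 non-zero 16-bit states before repeating. SFMT applies the same kind of linear recurrence to a much larger state, 128 bits at a time.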

You can find SFMT’s official site here.

Theory - A Mathematical Model of SFMT

A detailed explanation of the academic concept of the SFMT structure can be found here.

Let’s Start

As I've said, in this article, I'll generate an SFMT DLL, and to do this, I'll use the original SFMT code. Its original C implementation (version 1.3.3) can be downloaded from here. Along the way, I'll explain step by step the changes I had to make to the original C implementation and the reasons for them. The basic idea when generating SFMT.dll is not to change its core code, but to make that code callable and usable from outside the generated DLL.

Note that I'll use Visual Studio 2008 on Windows Vista; both for analyzing the original code and developing the SFMT DLL.

In Visual Studio, I start a new C++ Win32 project named SFMT:

Sample Image

Now, the Win32 Application Wizard will be shown. In this window, from Application Settings, I choose “DLL” for Application type, and tick “Empty project” for Additional options.

Sample Image

After clicking the Finish button, a new empty project will be created on the Visual Studio screen.

I unzipped the original C implementation code of SFMT which I downloaded from this address under the Visual Studio 2008\Projects\SFMT directory. After unzipping, you'll see lots of files, but we won't use all of them. Some of them are for test purposes, and some of the files include test results.

Actually, there are five main code files in the C implementation (version 1.3.3) that I focused on, and they are:

  1. SFMT.c: The main code for the SFMT’s generator engine is in this file. It implements the main functions, for example, the “gen_rand32” function for generating 32-bit integers.
  2. SFMT.h: Via this file, the main functions can be called easily. In addition, other useful functions, such as those generating real numbers, are implemented here.
  3. SFMT-params.h: It includes some basic definitions such as MEXP and parameters to be used while generating pseudorandom numbers. Also, some preprocessor rules for the current MEXP (Mersenne Exponent) and “include” structures are coded here, for example:

     #elif MEXP == 19937
       #include "SFMT-params19937.h"

  4. SFMT-sse2.h: It provides SSE2 support, giving access to code accelerated by the CPU’s SSE2 instructions. It uses emmintrin.h.
  5. SFMT-paramsXXXXXX.h: Other necessary parameters are located in these files. Here, XXXXXX represents the MEXP constant. There are ten parameter files which are configured for different MEXP values. MEXP and its meaning are explained in the next paragraph.

In the code, you'll see a definition called MEXP, and it’s the starting point for using the algorithm. MEXP means Mersenne Exponent. The period of the generated sequence will be a multiple of 2^MEXP-1. This definition is mandatory, and it must be one of these values: 607, 1279, 2281, 4253, 11213, 19937, 44497, 86243, 132049, 216091.

If you don't specify it, the default value is 19937.
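As a small sketch of these two rules, the fragment below falls back to 19937 when MEXP is not defined and checks a value against the ten exponents listed above. The helper name is_supported_mexp is hypothetical and is not part of the SFMT sources.

```c
#include <stddef.h>

/* Fall back to 19937 when the build does not define MEXP. */
#ifndef MEXP
#define MEXP 19937
#endif

/* Returns 1 if mexp is one of the ten supported exponents, else 0.
 * is_supported_mexp is a hypothetical helper, not part of SFMT. */
static int is_supported_mexp(int mexp) {
    static const int supported[] = {
        607, 1279, 2281, 4253, 11213, 19937, 44497, 86243, 132049, 216091
    };
    for (size_t i = 0; i < sizeof supported / sizeof supported[0]; i++) {
        if (supported[i] == mexp)
            return 1;
    }
    return 0;
}
```

A check like this is only useful for values chosen at run time; in the DLL itself, MEXP is fixed when the project is compiled.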

If you examine the original implementation of SFMT, you'll see that it can be compiled for three kinds of platforms:

  1. Standard C without SIMD instructions
  2. CPUs with Intel's SSE2 instructions + C compiler which supports these features
  3. CPUs with PowerPC's AltiVec instructions + C compiler which supports these features

As you can see above, number 3 isn't applicable to Microsoft-based platforms, because it uses AltiVec instructions. Number 2 (using the power of SSE2 instructions) is the way to go for me. While generating the DLL, my target is to modify the code so that it compiles with the SSE2 instructions. Therefore, first of all, I'll clean out some unnecessary parts of the code. Also, at the end of the development, when I build SFMT.dll, you'll be able to switch easily between the standard C and SSE2 versions.

Adding SFMT.c to the Project and Modifying it

In the Solution Explorer, under the SFMT project, I added the existing SFMT.c file to the Source Files directory and opened it to modify.

Sample Image

At the beginning, I removed some preprocessor code from the SFMT.c file. The file includes definitions with the following meanings:

  • #if defined(HAVE_ALTIVEC): This is optional. If this macro is specified, the optimized code for AltiVec will be generated. This macro automatically turns on the BIG_ENDIAN64 macro.
  • #if defined(BIG_ENDIAN64): This macro is required when your CPU is BIG ENDIAN and you are using 64-bit output. So, it’s for PowerPC-based computers with a Macintosh Operating System.
  • #if defined(ONLY64): This macro is optional. If this macro is specified, the optimized code for 64-bit output for BIG ENDIAN CPUs will be generated, and code for 32-bit output won't be generated.
The HAVE_ALTIVEC, BIG_ENDIAN64, and ONLY64 preprocessor definitions and their related code aren't applicable to Windows platforms, so I carefully removed them from the SFMT.c file.

On the other hand, there’s a preprocessor definition called HAVE_SSE2, and it’s a critical one for us. It’s important to keep HAVE_SSE2 and its related code in the file when removing the other, unnecessary definitions.

  • #if defined(HAVE_SSE2): If this macro is specified, optimized code for SSE2 will be generated.
The required and optional macros for each output mode are:

                32-bit output             LITTLE ENDIAN 64-bit output   BIG ENDIAN 64-bit output
    required    MEXP                      MEXP                          MEXP, BIG_ENDIAN64
    optional    HAVE_SSE2, HAVE_ALTIVEC   HAVE_SSE2                     HAVE_ALTIVEC, ONLY64

In the SFMT.c file, there are two functions that fill arrays with 32-bit or 64-bit random integers. The first is fill_array32 and the second is fill_array64. I changed some parts of these functions and want to describe the changes here:

  • Changing the return type of fill_array32 and fill_array64: In the original C implementation of SFMT, both of the fill_array functions return nothing; they are declared void. In my SFMT.dll, I changed the return type of these functions to int, so that they return 0 or 1. If the function returns 0, the array wasn't filled successfully, which almost always indicates that a memory allocation failed. If the function returns 1, the array was filled successfully and is ready to use.
  • Always using extended size arrays for compatibility and flexibility: In the original C implementation of SFMT, there are two rules for both fill_array32 and fill_array64 functions:
    1. The size of array must be greater than or equal to (MEXP / 128 + 1) * 4 for fill_array32 and must be greater than or equal to (MEXP / 128 + 1) * 2 for fill_array64.

    2. The size of array must be a multiple of 4 for fill_array32 and must be a multiple of 2 for fill_array64.

Because of these rules, I had to use extended size arrays when generating pseudorandom numbers. It's also important, and much more flexible, to be able to request an array of any size. To compute the extended sizes, I coded new functions and added them to the SFMT.c code file. These functions are listed below:

int get_array32_extended_size(int size)

/**
 * Determines the extended size of the array used by fill_array32.
 *
 * The array size must be greater than or equal to (MEXP / 128 + 1) * 4,
 * so a smaller size is raised to that minimum.
 *
 * The array size must also be a multiple of 4, so the size is rounded
 * up to the next multiple of 4 when necessary.
 */
int get_array32_extended_size(int size) {
    int extended_size;
    int remainder;

    if (size < get_min_array_size32())
        extended_size = get_min_array_size32();
    else
        extended_size = size;

    /* Pad only when the size is not already a multiple of 4. */
    remainder = extended_size % 4;
    if (remainder != 0)
        extended_size = extended_size + 4 - remainder;

    return extended_size;
}

int get_array64_extended_size(int size)

/**
 * Determines the extended size of the array used by fill_array64.
 *
 * The array size must be greater than or equal to (MEXP / 128 + 1) * 2,
 * so a smaller size is raised to that minimum.
 *
 * The array size must also be a multiple of 2, so the size is rounded
 * up to the next multiple of 2 when necessary.
 */
int get_array64_extended_size(int size) {
    int extended_size;
    int remainder;

    if (size < get_min_array_size64())
        extended_size = get_min_array_size64();
    else
        extended_size = size;

    /* Pad only when the size is not already a multiple of 2. */
    remainder = extended_size % 2;
    if (remainder != 0)
        extended_size = extended_size + 2 - remainder;

    return extended_size;
}

As I've mentioned in the previous paragraph, these modifications are very important. With them, callers are no longer bound by the rule that the array size must be a multiple of 4 (or 2), nor by the rule that it must be greater than or equal to (MEXP / 128 + 1) * 4 (or (MEXP / 128 + 1) * 2). For example, if you want to generate 2113 integers, you can do it easily with the modified fill_array32 or fill_array64 functions. With the original versions of fill_array32 and fill_array64, you can't generate exactly 2113 integers, because 2113 is neither a multiple of 4 nor a multiple of 2.

Note: The bodies of the modified fill_array32 and fill_array64 functions, which use get_array32_extended_size and get_array64_extended_size, are given below.
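The arithmetic for the 2113 example can be checked with a small sketch. The helper padded_size is hypothetical, and the minimum sizes 624 and 312 assume MEXP = 19937, that is (19937 / 128 + 1) * 4 and (19937 / 128 + 1) * 2.

```c
/* Hypothetical helper: smallest value that is at least max(size, min_size)
 * and is a multiple of `multiple`. */
static int padded_size(int size, int min_size, int multiple) {
    int n = (size < min_size) ? min_size : size;
    int remainder = n % multiple;
    if (remainder != 0)
        n += multiple - remainder;  /* round up to the next multiple */
    return n;
}
```

For a request of 2113 numbers, padded_size(2113, 624, 4) gives 2116 for the 32-bit case and padded_size(2113, 312, 2) gives 2114 for the 64-bit case. The generator fills the padded buffer, and only the first 2113 values are handed back to the caller.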

  • Data alignment and using aligned memory blocks: To use the fill_array functions in the SIMD version, the pointer to the array must be aligned (namely, its address must be a multiple of 16), since it refers to the address of a 128-bit integer. In the standard C version, the pointer is arbitrary. So, if we define the HAVE_SSE2 macro, the array used to generate random integers must live in a 16-byte aligned memory block. The reason is that SSE (Streaming SIMD Extensions) defines 128-bit (16-byte) packed data types (for example, four 32-bit floats), and the aligned load and store instructions expect addresses that are evenly divisible by 16. 16-byte alignment is therefore mandatory for the SSE2 code path, and misaligned data also slows down data access. You can visit songho page and IBM's this page to get more information about data alignment and 16-byte alignment for SSE2.

In the MSVC CRT, a dynamic array can be allocated using the _aligned_malloc() function, and deallocated using _aligned_free(). The code for the aligned memory allocation used in fill_array32 and fill_array64 is given below.

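w128_t *ptr;  /* scratch buffer of 128-bit words */

#if defined(HAVE_SSE2)
    /* SSE2 build: the buffer must start on a 16-byte boundary. */
    ptr = (w128_t *) _aligned_malloc(sizeof(uint32_t) * extended_size, 16);
#else
    /* Standard C build: the natural alignment of uint32_t is enough. */
    ptr = (w128_t *) _aligned_malloc(sizeof(uint32_t) *
            extended_size, __alignof(uint32_t));
#endif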
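For readers who want to try the alignment rule outside MSVC, here is a minimal sketch of a 16-byte aligned allocation with a check of the returned address. The demo_ names are hypothetical; on Windows it uses _aligned_malloc as in the article, and elsewhere it assumes a C11 library that provides aligned_alloc.

```c
#include <stdint.h>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h>
#endif

/* Returns 1 if p is a multiple of 16, as the SSE2 code path requires. */
static int demo_is_aligned16(const void *p) {
    return ((uintptr_t)p % 16u) == 0;
}

/* Allocates at least `bytes` bytes on a 16-byte boundary, or returns NULL. */
static void *demo_alloc_aligned16(size_t bytes) {
#if defined(_WIN32)
    return _aligned_malloc(bytes, 16);
#else
    /* C11 aligned_alloc wants the size to be a multiple of the alignment. */
    size_t padded = (bytes + 15u) / 16u * 16u;
    return aligned_alloc(16, padded);
#endif
}

/* Releases a block obtained from demo_alloc_aligned16. */
static void demo_free_aligned16(void *p) {
#if defined(_WIN32)
    _aligned_free(p);
#else
    free(p);
#endif
}
```

Memory from _aligned_malloc must be released with _aligned_free rather than free, which is why the sketch wraps both calls in a matching pair.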

The modified fill_array32 and fill_array64 functions are listed below:

int fill_array32(uint32_t *array, int size) {
    w128_t *ptr;  /* scratch buffer of 128-bit words */
    int extended_size = get_array32_extended_size(size);

    assert(initialized);
    assert(idx == N32);

    /* The pointer to the allocated memory must be aligned (namely,
     * must be a multiple of 16) in the SIMD version (HAVE_SSE2), since
     * it refers to the address of a 128-bit integer. In the standard C
     * version, the pointer is arbitrary.
     */
    #if defined(HAVE_SSE2)
        ptr = (w128_t *) _aligned_malloc(sizeof(uint32_t) * extended_size, 16);
    #else
        ptr = (w128_t *) _aligned_malloc(sizeof(uint32_t) *
                extended_size, __alignof(uint32_t));
    #endif

    if (ptr == NULL)
        return 0;  /* allocation failed; array is left untouched */

    gen_rand_array(ptr, extended_size / 4);
    /* Copy only the number of values the caller asked for. */
    memcpy(array, ptr, sizeof(uint32_t) * size);
    idx = N32;
    _aligned_free(ptr);
    return 1;
}

int fill_array64(uint64_t *array, int size) {
    w128_t *ptr;  /* scratch buffer of 128-bit words */
    int extended_size = get_array64_extended_size(size);

    assert(initialized);
    assert(idx == N32);

    /* Same alignment rule as in fill_array32: 16 bytes in the SIMD
     * version (HAVE_SSE2), arbitrary in the standard C version.
     */
    #if defined(HAVE_SSE2)
        ptr = (w128_t *) _aligned_malloc(sizeof(uint64_t) * extended_size, 16);
    #else
        ptr = (w128_t *) _aligned_malloc(sizeof(uint64_t) *
                extended_size, __alignof(uint64_t));
    #endif

    if (ptr == NULL)
        return 0;  /* allocation failed; array is left untouched */

    gen_rand_array(ptr, extended_size / 2);
    /* Copy only the number of values the caller asked for. */
    memcpy(array, ptr, sizeof(uint64_t) * size);
    idx = N32;
    _aligned_free(ptr);
    return 1;
}
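The pattern behind both functions (generate into a padded scratch buffer, then copy back only what was requested) can be tried without the SFMT sources. In the sketch below, demo_block_generate is a stand-in that writes 0, 1, 2, ... and is not a random number generator; all demo_ names and BLOCK_WORDS are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_WORDS 4  /* stand-in for the "multiple of 4" rule */

/* Stand-in block generator: expects `words` to be a multiple of
 * BLOCK_WORDS and writes 0, 1, 2, ... NOT a random number generator. */
static void demo_block_generate(uint32_t *buf, int words) {
    for (int i = 0; i < words; i++)
        buf[i] = (uint32_t)i;
}

/* Fills `out` with exactly `size` values.
 * Returns 1 on success and 0 on failure, like the modified fill_array32. */
static int demo_fill_any_size(uint32_t *out, int size) {
    if (out == NULL || size <= 0)
        return 0;

    /* Round the request up to a whole number of blocks. */
    int padded = (size + BLOCK_WORDS - 1) / BLOCK_WORDS * BLOCK_WORDS;
    uint32_t *scratch = malloc(sizeof *scratch * (size_t)padded);
    if (scratch == NULL)
        return 0;

    demo_block_generate(scratch, padded);
    memcpy(out, scratch, sizeof *out * (size_t)size);  /* drop the padding */
    free(scratch);
    return 1;
}
```

The cost of this flexibility is one extra allocation and one copy per call, so callers that already have a correctly sized and aligned buffer give up a little speed.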

Modifying the SFMT-params.h File

In this file, I removed only the #ifdef __GNUC__ preprocessor block and its related code. Because I am using the Microsoft Visual C++ compiler to generate the DLL, I don't need the GNU-specific code.

You can see some basic definitions in this file. Their structure and meanings are like this:

/*-----------------
BASIC DEFINITIONS
-----------------*/
/** Mersenne Exponent. The period of the sequence
* is a multiple of 2^MEXP-1.
* #define MEXP 19937 */
/** SFMT generator has an internal state array of 128-bit integers,
* and N is its size. */
#define N (MEXP / 128 + 1)
/** N32 is the size of internal state array when regarded as an array
* of 32-bit integers.*/
#define N32 (N * 4)
/** N64 is the size of internal state array when regarded as an array
* of 64-bit integers.*/
#define N64 (N * 2)
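As a worked example of these definitions, the helpers below mirror the N, N32 and N64 macros for an arbitrary exponent. The function names are hypothetical and exist only for this sketch.

```c
/* Hypothetical helpers mirroring the N, N32 and N64 macros for any MEXP. */

/* Number of 128-bit words in the internal state (N). */
static int state_size_128(int mexp) { return mexp / 128 + 1; }

/* Size of the state viewed as 32-bit integers (N32). */
static int state_size_32(int mexp) { return state_size_128(mexp) * 4; }

/* Size of the state viewed as 64-bit integers (N64). */
static int state_size_64(int mexp) { return state_size_128(mexp) * 2; }
```

For MEXP = 19937, integer division gives 19937 / 128 = 155, so N = 156, N32 = 624 and N64 = 312, and the state occupies 156 * 16 = 2496 bytes. These are also the minimum array sizes that the original fill_array32 and fill_array64 accept.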

Also, some Mersenne Exponent dependent #include preprocessor commands were included. The code structure is listed below:

#if MEXP == 607
  #include "SFMT-params607.h"
#elif MEXP == 1279
  #include "SFMT-params1279.h"
#elif MEXP == 2281
  #include "SFMT-params2281.h"
#elif MEXP == 4253
  #include "SFMT-params4253.h"
#elif MEXP == 11213
  #include "SFMT-params11213.h"
#elif MEXP == 19937
  #include "SFMT-params19937.h"
#elif MEXP == 44497
  #include "SFMT-params44497.h"
#elif MEXP == 86243
  #include "SFMT-params86243.h"
#elif MEXP == 132049
  #include "SFMT-params132049.h"
#elif MEXP == 216091
  #include "SFMT-params216091.h"
#else
#endif

The MEXP value is used as the criterion for selecting and including the correct parameter file, so developers can switch parameter files just by changing the value of MEXP. This is why the original SFMT implementation ships with ten different SFMT-paramsXXXXXX.h header files.

In my project, I used 19937 for MEXP, which is also the default value in the original C implementation.

Changes in the SFMT-paramsXXXXXX.h Files

After modifying the SFMT-params.h file, it’s time to make changes in the associated SFMT-paramsXXXXXX.h files. The MEXP constant can take ten different values, so there are ten such files, each with its own definitions. I use 19937 for MEXP, so the first file to be changed is SFMT-params19937.h.

In the SFMT-params19937.h header file, there are some parameters for AltiVec. They start with an #if defined(__APPLE__) structure in the code. I removed this preprocessor code block. It contains the AltiVec parameters for Mac OS X and for other operating systems, and is listed below:

/* PARAMETERS FOR ALTIVEC */
#if defined(__APPLE__) /* For OSX */
#define ALTI_SL1 (vector unsigned int)(SL1, SL1, SL1, SL1)
#define ALTI_SR1 (vector unsigned int)(SR1, SR1, SR1, SR1)
#define ALTI_MSK (vector unsigned int)(MSK1, MSK2, MSK3, MSK4)
#define ALTI_MSK64 \
(vector unsigned int)(MSK2, MSK1, MSK4, MSK3)
#define ALTI_SL2_PERM \
(vector unsigned char)(1,2,3,23,5,6,7,0,9,10,11,4,13,14,15,8)
#define ALTI_SL2_PERM64 \
(vector unsigned char)(1,2,3,4,5,6,7,31,9,10,11,12,13,14,15,0)
#define ALTI_SR2_PERM \
(vector unsigned char)(7,0,1,2,11,4,5,6,15,8,9,10,17,12,13,14)
#define ALTI_SR2_PERM64 \
(vector unsigned char)(15,0,1,2,3,4,5,6,17,8,9,10,11,12,13,14)
#else /* For OTHER OSs(Linux?) */
#define ALTI_SL1 {SL1, SL1, SL1, SL1}
#define ALTI_SR1 {SR1, SR1, SR1, SR1}
#define ALTI_MSK {MSK1, MSK2, MSK3, MSK4}
#define ALTI_MSK64 {MSK2, MSK1, MSK4, MSK3}
#define ALTI_SL2_PERM {1,2,3,23,5,6,7,0,9,10,11,4,13,14,15,8}
#define ALTI_SL2_PERM64 {1,2,3,4,5,6,7,31,9,10,11,12,13,14,15,0}
#define ALTI_SR2_PERM {7,0,1,2,11,4,5,6,15,8,9,10,17,12,13,14}
#define ALTI_SR2_PERM64 {15,0,1,2,3,4,5,6,17,8,9,10,11,12,13,14}
#endif /* For OSX */

The other SFMT-paramsXXXXXX header files are: SFMT-params607.h, SFMT-params1279.h, SFMT-params2281.h, SFMT-params4253.h, SFMT-params11213.h, SFMT-params44497.h, SFMT-params86243.h, SFMT-params132049.h, and SFMT-params216091.h.

I modified all of these parameter files in the same way. In other words, I cleaned out all the unnecessary AltiVec-specific code in the header files.

Below, you can see other necessary parameters that are defined in the SFMT-params19937.h file:

#define POS1 122 // the pick up position of the array.
#define SL1 18 // the parameter of shift left as four 32-bit registers.
#define SL2 1 // the parameter of shift left as one 128-bit register.
#define SR1 11 // the parameter of shift right as four 32-bit registers.
#define SR2 1 // the parameter of shift right as one 128-bit register.

/* A bitmask, used in the recursion. These parameters are introduced
to break symmetry of SIMD. */
#define MSK1 0xdfffffefU
#define MSK2 0xddfecb7fU

#define MSK3 0xbffaffffU
#define MSK4 0xbffffff6U

// These definitions are part of a 128-bit period certification vector.
#define PARITY1 0x00000001U
#define PARITY2 0x00000000U
#define PARITY3 0x00000000U
#define PARITY4 0x13c9e684U

// String representation of MEXP 19937 parameters.
#define IDSTR "SFMT-19937:122-18-1-11-1:dfffffef-ddfecb7f-bffaffff-bffffff6"
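The IDSTR value appears to be assembled from the parameters above in the order MEXP:POS1-SL1-SL2-SR1-SR2:MSK1-MSK2-MSK3-MSK4. That reading is my inference from the values listed here, and format_idstr is a hypothetical helper that rebuilds the string so the inference can be checked.

```c
#include <stdio.h>

/* Hypothetical helper: formats an ID string from the parameter values.
 * Returns what snprintf returns: the number of characters the full
 * string needs (excluding the terminator), or a negative value on error. */
static int format_idstr(char *buf, size_t len, int mexp, int pos1,
                        int sl1, int sl2, int sr1, int sr2,
                        unsigned msk1, unsigned msk2,
                        unsigned msk3, unsigned msk4) {
    return snprintf(buf, len,
                    "SFMT-%d:%d-%d-%d-%d-%d:%08x-%08x-%08x-%08x",
                    mexp, pos1, sl1, sl2, sr1, sr2,
                    msk1, msk2, msk3, msk4);
}
```

Calling it with 19937, 122, 18, 1, 11, 1 and the four MSK values reproduces the IDSTR literal shown above, which makes the string a convenient fingerprint of the parameter set a given DLL was built with.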

SFMT.h File Modifications

The SFMT.h header file is very important, so I'll add it to my project. Since it’s a header (*.h) file, I add it to the Header Files directory of my project. After making some modifications to it, I'll be able to call the SFMT functions from outside my DLL. Before talking about the changes, let’s look at the functions declared in SFMT.h and what they do:

  1. uint32_t gen_rand32(void): The mission of this function is to generate pseudorandom 32-bit integers. The approach of this function is named the sequential call method.
  2. uint64_t gen_rand64(void): The mission of this function is to generate pseudorandom 64-bit integers. The approach of this function is named the sequential call method.
  3. int fill_array32(uint32_t *array, int size): This function can fill an array with pseudorandom 32-bit integers. The first parameter of the function is an array where pseudorandom 32-bit integers are filled. The second parameter of the function is the size of this array. Also, the second parameter represents the number of generated 32-bit integers. The approach of this function is named the block call method. If the function fails, the return value is 0.
  4. int fill_array64(uint64_t *array, int size): This function can fill an array with pseudorandom 64-bit integers. The first parameter of the function is an array where pseudorandom 64-bit integers are filled. The second parameter of function is the size of this array. Also, the second parameter represents the number of generated 64-bit integers. The approach of this function is named the block call method. If the function fails, the return value is 0.
  5. void init_gen_rand(uint32_t seed): This function initializes the internal state array with a 32-bit integer seed. The parameter seed is a 32-bit integer used as the seed.

To call these SFMT functions outside of my DLL, I need to use a special keyword:

__declspec(dllexport): You can export data, functions, classes, or class member functions from a DLL using the __declspec(dllexport) keyword. __declspec(dllexport) adds the export directive to the object file so you do not need to use a .def file. Many export directives, such as ordinals, NONAME, and PRIVATE, can be made only in a .def file, and there is no way to specify these attributes without a .def file. However, using __declspec(dllexport) in addition to using a .def file does not cause build errors.

To export SFMT functions, the __declspec(dllexport) keyword must appear to the left of the calling-convention keyword, if a keyword is specified. For example:

__declspec(dllexport) int fill_array32(uint32_t *array, int size);

__declspec(dllexport) stores function names in the DLL's export table.

To make our code more readable, I'll define a macro for __declspec(dllexport) at the beginning of the SFMT header file, and will use this macro with each function we are exporting:

#define DllExport __declspec( dllexport )

After these modifications, our SFMT functions become an exportable form. You can see them below:

DllExport uint32_t gen_rand32(void);
DllExport uint64_t gen_rand64(void);
DllExport int fill_array32(uint32_t *array, int size);
DllExport int fill_array64(uint64_t *array, int size);
DllExport void init_gen_rand(uint32_t seed);
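A common variation, sketched below, lets one header serve both the DLL project and the programs that use the DLL by switching between dllexport and dllimport. SFMT_EXPORTS appears in this project's compiler command line; the macro name SFMT_API and the function sfmt_demo_add are hypothetical and only show where the macro goes.

```c
/* SFMT_EXPORTS is defined when building the DLL project; client code
 * that includes the same header does not define it. */
#if defined(_WIN32)
#  if defined(SFMT_EXPORTS)
#    define SFMT_API __declspec(dllexport)  /* building the DLL */
#  else
#    define SFMT_API __declspec(dllimport)  /* using the DLL */
#  endif
#else
#  define SFMT_API                           /* non-Windows: no decoration */
#endif

/* Hypothetical exported function, declared the way a header would. */
SFMT_API int sfmt_demo_add(int a, int b);

/* Its definition, which would normally live in a .c file of the DLL. */
int sfmt_demo_add(int a, int b) {
    return a + b;
}
```

The article's simpler DllExport macro is enough when the header is only compiled as part of the DLL. The import half matters once the same header is shared with client code.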

Real versions of functions: In the SFMT.h file, you can see some real versions of functions. They're due to Isaku Wada, and are used to generate random real numbers. All of the real functions are inline functions. An inline function defined in a header is compiled into each place that calls it (in the main app, for example), so it is not emitted into the DLL as a function with an address that can be exported. If you want to offer these functions from a separate binary library (*.lib, *.dll, etc.), they have to be ordinary, non-inline functions that live in that binary rather than in the caller's executable code. For these reasons, I removed the inline functions from the SFMT.h file, and then added the rSFMT.cpp file to my project under the Source Files directory. This file contains the real versions of the functions as ordinary, non-inline functions. Then, I declared them for export, as seen below:

//Exporting rSFMT.cpp functions:
DllExport double to_real1(uint32_t v);
DllExport double genrand_real1(void);
DllExport double to_real2(uint32_t v);
DllExport double genrand_real2(void);
DllExport double to_real3(uint32_t v);
DllExport double genrand_real3(void);
DllExport double to_res53(uint64_t v);
DllExport double to_res53_mix(uint32_t x, uint32_t y);
DllExport double genrand_res53(void);
DllExport double genrand_res53_mix(void);
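To show what "ordinary, non-inline" versions can look like, here is a sketch of three integer-to-real conversions. The demo_ names are mine, and the scalings are assumed to follow the usual Mersenne Twister conventions for the [0, 1], [0, 1) and (0, 1) intervals; they are not copied from rSFMT.cpp.

```c
#include <stdint.h>

/* Hypothetical non-inline conversions. The scalings are assumptions based
 * on the usual Mersenne Twister conventions, not a copy of rSFMT.cpp. */

/* Maps v to the closed interval [0, 1]: divides by 2^32 - 1. */
double demo_to_real1(uint32_t v) {
    return v / 4294967295.0;
}

/* Maps v to the half-open interval [0, 1): divides by 2^32. */
double demo_to_real2(uint32_t v) {
    return v / 4294967296.0;
}

/* Maps v to the open interval (0, 1): adds 0.5, then divides by 2^32. */
double demo_to_real3(uint32_t v) {
    return (v + 0.5) / 4294967296.0;
}
```

Because these are plain functions with external linkage, each one has an address in the compiled binary and can carry an export declaration such as DllExport.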

Extern C: After these modifications, if you compile the SFMT DLL and call the exported functions, then you'll get an error message at runtime, like this:

Sample Image

This problem occurs because the C++ compiler decorates the function names to support function overloading. Let’s see the exact names of our functions using the powerful Windows utility dumpbin.exe. Our command is dumpbin -exports SFMT.dll. The output of this command is shown below:

Sample Image

As you can see in this output, the exported names are decorated, so when we try to call the functions by their plain names, the lookup fails and an unhandled exception occurs.

There isn't any standard way of decorating function names, so you have to tell the C++ compiler not to decorate them. We'll use the extern "C" structure for this:

At the beginning of the SFMT.h file:

#ifdef __cplusplus
  extern "C" {
#endif

and at the end of the SFMT.h file:

#ifdef __cplusplus
  }
#endif

Now, the functions we declare inside this extern "C" structure are exported under their plain names and can be called easily. Let’s look at the results of the dumpbin -exports SFMT.dll command again:

Sample Image
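The sketch below shows why the undecorated names matter to a caller that loads the DLL at run time: GetProcAddress looks functions up by the exact exported name. The helper demo_draw_from_dll and the two typedefs are hypothetical; the exported names init_gen_rand and gen_rand32 come from the header above. On non-Windows builds the helper simply reports failure.

```c
#include <stdint.h>
#if defined(_WIN32)
#include <windows.h>
#endif

typedef void     (*init_gen_rand_fn)(uint32_t seed);
typedef uint32_t (*gen_rand32_fn)(void);

/* Hypothetical helper: loads the DLL at dll_path, seeds the generator and
 * stores one 32-bit value in *out. Returns 1 on success, 0 on failure
 * (always 0 on non-Windows builds, where there is no DLL to load). */
int demo_draw_from_dll(const char *dll_path, uint32_t seed, uint32_t *out) {
#if defined(_WIN32)
    HMODULE dll = LoadLibraryA(dll_path);
    if (dll == NULL)
        return 0;

    /* These lookups only succeed if the names are exported undecorated. */
    init_gen_rand_fn init =
        (init_gen_rand_fn)GetProcAddress(dll, "init_gen_rand");
    gen_rand32_fn next =
        (gen_rand32_fn)GetProcAddress(dll, "gen_rand32");

    int ok = (init != NULL && next != NULL);
    if (ok) {
        init(seed);
        *out = next();
    }
    FreeLibrary(dll);
    return ok;
#else
    (void)dll_path; (void)seed; (void)out;
    return 0;
#endif
}
```

With decorated exports, both GetProcAddress calls would return NULL for these plain names, and the helper would return 0 instead of calling through a bad pointer.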

The SFMT-sse2.h File

If you look into the SFMT.c file, you'll see this code:

#if defined(HAVE_SSE2)
  #include "SFMT-sse2.h"
#endif

This code means that if you add the HAVE_SSE2 definition to the command line of our project, the project will use the SFMT-sse2.h file. If you examine the SFMT-sse2.h file, you'll see that it is written to use the CPU’s SSE2 instructions, which makes our code faster. The only limitation is that the resulting code runs only on CPUs that support SSE2.

How to enable SSE2 support is described in the next section, “Setting Project Properties and Optimizations”.
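The same compile-time switch can be sketched on a tiny operation. The function demo_xor128 is hypothetical and is not taken from SFMT-sse2.h: with HAVE_SSE2 defined it XORs two 128-bit values with one SSE2 instruction, and otherwise it falls back to four 32-bit operations.

```c
#include <stdint.h>
#if defined(HAVE_SSE2)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical example: dst = a XOR b over 128 bits (four 32-bit words). */
static void demo_xor128(uint32_t dst[4],
                        const uint32_t a[4], const uint32_t b[4]) {
#if defined(HAVE_SSE2)
    /* Unaligned loads/stores keep the sketch free of alignment rules. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_xor_si128(va, vb));
#else
    /* Standard C fallback: one word at a time. */
    for (int i = 0; i < 4; i++)
        dst[i] = a[i] ^ b[i];
#endif
}
```

Both branches produce the same result, so the choice of HAVE_SSE2 affects speed and CPU requirements but not the output.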

Setting Project Properties and Optimizations

In Visual Studio, under the Project menu, click “SFMT properties…”.

A new window with an “SFMT Property Pages” caption will be visible. In this window, on the left side, under the “Configuration Properties” tab, you can see some property categories (General, Debugging, C/C++ etc.) that we'll use.

Sample Image

First of all, at the top of the project properties window, click the “Configuration Manager” button, and the Configuration Manager will be displayed on the screen. In this window, set the Configuration parameter to Release. Also, set “Active solution configuration” to Release. Setting these to Release means our project is built as an optimized release build rather than a debug build.

Sample Image

The most important properties of our SFMT.dll project are the preprocessor definitions.

Under the “Configuration Properties” --> C/C++ --> Preprocessor tab, there are preprocessor definitions. I'll add two definitions here: MEXP and HAVE_SSE2. MEXP has been mentioned before, and it represents the Mersenne Exponent. The HAVE_SSE2 definition is used for taking advantage of the CPU’s SSE2 support.

Sample Image

Changing the MEXP value or dropping SSE2 support is therefore very easy. You can always edit these two preprocessor definitions and then compile another version of SFMT.dll.

Sample Image

Another important property is “Optimization”. Under Configuration Properties --> C/C++ --> Optimization, please make sure Optimization is set to “Maximize Speed (/O2)”. With this setting, the compiler optimizes the generated code for speed when we compile the project. This can increase the size of SFMT.dll, but that can be disregarded, because speed matters more than size here. Faster code isn't necessary when we're generating two or three random numbers, but when generating 10 million numbers, the speed of our code becomes a major factor. In time-critical work such as mathematical or engineering applications, fast code is usually the better choice.

Another option to know about is “Enable Intrinsic Functions (/Oi)”. Programs that use intrinsic functions are faster because they do not have the overhead of function calls, but may be larger because of the additional code created.

Sample Image

In Configuration Properties --> C/C++ --> Code Generation tab, the default value of the Runtime Library option is Multi-threaded DLL (/MD). I'll change this option to Multi-threaded (/MT). This causes your application to use the multithreaded, static version of the run-time library. It defines _MT, and causes the compiler to place the library name LIBCMT.lib into the .obj file so that the linker will use LIBCMT.lib to resolve external symbols.

C/C++ multi-threaded applications on Windows need to be compiled with either the -MT or -MD option. The -MT option links against the static library LIBCMT.LIB, and -MD links against the import library MSVCRT.LIB. The binary linked with -MD will be smaller but dependent on the runtime DLL, while the binary linked with -MT will be larger but self-contained with respect to the runtime. For Visual Studio 2008 projects, the actual working code is contained in MSVCR90.DLL, which must be available at runtime to applications linked with MSVCRT.lib.

If I build my project with the -MD option (dynamic linking), then my SFMT.dll will be approximately 10 KB, which is quite small. If I build the project with the -MT option (static linking), then my SFMT.dll will be 57 KB.

On the other hand, if I try to call and use the dynamically linked SFMT.dll on another computer, I may get an error like this:

"This application has failed to start because the application configuration is incorrect. Reinstalling the application may fix this problem"

This error shows that the computer on which you are trying to run SFMT.dll doesn't have the C/C++ Runtime Libraries installed. In this situation, you must distribute the C/C++ Runtime Libraries with your SFMT.dll. Below, you can see the analysis of SFMT.dll running on an Operating System without the C/C++ Runtime Libraries. As you can see, it needs MSVCR90.dll and related libraries. Also note that it’s quite simple to set up the SFMT project with the necessary C/C++ Runtime Libraries in Visual Studio 2008.

Sample Image

In addition, in the Configuration Properties --> C/C++ --> Code Generation tab, set the Enable Enhanced Instruction Set property to Streaming SIMD Extensions 2 (/arch:SSE2). The arch flag enables the use of instructions found on processors that support enhanced instruction sets, e.g., the SSE and SSE2 extensions of Intel 32-bit processors. Note that this setting prevents the code from running on processors that don't support the SSE2 extensions. In this project, however, our target is CPUs that support SSE2 instructions.

After setting these properties under C/C++ tab, our Command Line is:

/O2 /Oi /GL /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /D "_USRDLL"
/D "SFMT_EXPORTS" /D "MEXP=19937" /D "HAVE_SSE2" /D "_WINDLL" /D
"_UNICODE" /D "UNICODE" /FD /EHsc /MT /Gy /arch:SSE2
/Fo"Release\\" /Fd"Release\vc90.pdb" /W3 /nologo /c /Zi /TP /errorReport:prompt

Sample Image

On the other tab, called “Linker”, make sure the Target Machine property is set to MachineX86. This is the default value for our project, but don't forget to check it. The Linker tab’s command line will be like this:

/OUT:"C:\Users\emre\Documents\Visual Studio 2008\Projects\SFMT\Release\SFMT.dll"
/INCREMENTAL:NO /NOLOGO /DLL /MANIFEST
/MANIFESTFILE:"Release\SFMT.dll.intermediate.manifest"
/MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG
/PDB:"C:\Users\emre\Documents\Visual Studio 2008\Projects\SFMT\Release\SFMT.pdb"
/SUBSYSTEM:WINDOWS /OPT:REF /OPT:ICF /LTCG /DYNAMICBASE /NXCOMPAT
/MACHINE:X86 /ERRORREPORT:PROMPT kernel32.lib user32.lib gdi32.lib
winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib
uuid.lib odbc32.lib odbccp32.lib

Building the Project and Analyzing SFMT.dll

Now, it is time to build the SFMT project. To do this, simply press the F6 key, or focus on the Build menu of Visual Studio and then click “Build Solution”. If all is OK, then you'll get a message “Build succeeded”. After this, Visual Studio will create a folder named “Release” under the SFMT project main directory. In this folder, you'll see SFMT.dll. To analyze SFMT.dll, I use the Dependency Walker tool. You can download it from here. All exportable functions in SFMT.dll can be seen easily via this GUI. You can see a screenshot representing the SFMT.dll below.

In addition, after building my project, I renamed SFMT.dll to SFMTsse2.dll for future compatibility. I'll need this kind of naming later, when choosing the right DLL to load. We'll talk about it later.

Sample Image

Without SSE2 Support

If the machine on which SFMT.dll will run doesn't have SSE2 support, you'll get an error. To avoid this error, you can easily prepare a standard C version of SFMT.dll and rename it to SFMTc.dll. This SFMTc.dll can generate random numbers without needing SSE2 support. It's easy to configure the project properties for SFMTc.dll:

  1. Under the “Configuration Properties” --> C/C++ --> Preprocessor tab, there are preprocessor definitions. Delete "HAVE_SSE2" Preprocessor command from this window.
  2. In the Configuration Properties --> C/C++ --> Code Generation tab, set the Enable Enhanced Instruction Set property to Not Set.
  3. Rebuild your project and then in the release directory of your project, rename your SFMT.dll to SFMTc.dll.

That's it. You can use your SFMTc.dll on the machines that don't have SSE2 support.
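With both DLLs available, a caller can decide at run time which one to load. The sketch below is one possible way to do that, not something taken from the SFMT sources: on Windows it asks IsProcessorFeaturePresent, on GCC-compatible x86 compilers it uses __builtin_cpu_supports, and otherwise it conservatively reports no SSE2. The names cpu_has_sse2 and pick_sfmt_dll are hypothetical.

```c
#if defined(_WIN32)
#include <windows.h>
#endif

/* Returns 1 if the CPU reports SSE2 support, 0 if not or if unknown. */
static int cpu_has_sse2(void) {
#if defined(_WIN32)
    return IsProcessorFeaturePresent(PF_XMMI64_INSTRUCTIONS_AVAILABLE) ? 1 : 0;
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return 0;  /* unknown platform: fall back to the plain C build */
#endif
}

/* Hypothetical helper: picks which of the two DLL names to load. */
static const char *pick_sfmt_dll(void) {
    return cpu_has_sse2() ? "SFMTsse2.dll" : "SFMTc.dll";
}
```

The returned name can then be passed to LoadLibrary, so the same application runs on machines with and without SSE2.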

New Articles

New articles of the “SFMT in Action” series are coming soon.

See you later.


History

  • December 02, 2008: First release
  • April 26, 2009: Version 1.1 is released. In this version:
    • Added some necessary helper functions
    • Filling methods are improved for flexible usage
    • Added SFMTc.dll in the release directory. This DLL doesn't need SSE2 support.
    • Now, both DLLs (SFMTsse2.dll and SFMTc.dll) include version info

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
