Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / programming / optimization

An Efficient String to Unsigned int ID Struct

4.58/5 (5 votes)
22 Jun 2022MIT5 min read 10.4K   66  
C++ optimization for map using string key among others
This article presents the structure used in the Kigs framework to efficiently convert strings (std::string, const char *) to unsigned int IDs with the intention of using them as keys, generally for unordered_map structures.

Table of Contents

Introduction

The base class of the kigs framework (CoreModifiable) provides access to some member variables, called attributes, using a string name:

C++
// set "IntValue" parameter to 16
instance->setValue("IntValue", 16);
// retrieve value using templated "getValue" method 
int v = instance->getValue<int>("IntValue");
// or with reference parameter
int v1;
if(!instance->getValue("IntValue",v1))
{
    // instance does not own "IntValue" parameter
}

The instance factory also uses string names for classes:

C++
myApplicationTimer = KigsCore::GetInstanceOf("ApplicationTimer", "Timer");

And we have a lot of other cases where strings are used to access values stored in maps (ordered or unordered).

But for performance reasons, it's not a good choice to use a std::string as a map key.

The KigsID Struct

We looked for a way to convert string (const char* or std::string) to unsigned int ID in an efficient way. The strings we use are generally one or two alphanumeric words, and the map contains at most a few dozen values. We wanted to avoid collisions (having the same ID for two different strings) for this kind of usage.

Here is the KigsID struct code:

C++
struct KigsID
{
    KigsID() = default;
    KigsID(const KigsID& other) = default;
    KigsID(KigsID&& other) = default;

    KigsID& operator=(const KigsID& other) = default;
    KigsID& operator=(KigsID&& other) = default;

    bool operator==(const KigsID& other) const
    {
        return (_id == other._id);
    }

    bool operator<(const KigsID& other) const
    {
        return _id < other._id;
    }
    unsigned int toUInt() const { return _id; }

#ifdef KEEP_NAME_AS_STRING
    template<size_t _Size> KigsID(const char(&aid)[_Size]) : _id_name(aid), 
                                  _id(CharToID::GetID(aid)) {};
    KigsID(const kstl::string& aid) : _id_name(aid), _id(CharToID::GetID(aid)) {};
    KigsID(const kstl::string_view& aid) : _id_name(aid), _id(CharToID::GetID(aid)) {};

    KigsID(unsigned int aid) : _id_name("*unknown*"), _id(aid) {};

    KigsID& operator=(const kstl::string& aid) 
            { _id_name = aid; _id = CharToID::GetID(aid); return *this; };
    template<size_t _Size> KigsID& operator=(const char(&aid)[_Size]) 
            { _id_name = aid; _id = CharToID::GetID(aid); return *this; };

    KigsID& operator=(unsigned int aid) 
            { _id_name = "*unknown*"; _id = aid; return *this; };
    
    // Extra name
    // Don't set this field manually!
    kstl::string _id_name;

#else
    template<size_t _Size>
    KigsID(const char(&aid)[_Size]) : _id(CharToID::GetID<_Size>(aid)) {};
    KigsID(const kstl::string& aid) : _id(CharToID::GetID(aid)) {};
    KigsID(const kstl::string_view& aid) : _id(CharToID::GetID(aid)) {};
    //KigsID(const char*& aid) : _id(CharToID::GetID(aid)) {};
    KigsID(unsigned int aid) : _id(aid) {};

    template<size_t _Size>
    KigsID& operator=(const char(&aid)[_Size]) 
                     { _id = CharToID::GetID<_Size>(aid); return *this; };
    KigsID& operator=(const kstl::string& aid) 
                     { _id = CharToID::GetID(aid); return *this; };
    KigsID& operator=(const kstl::string_view& aid) 
                     { _id = CharToID::GetID(aid); return *this; };
    //KigsID& operator=(const char*& aid) { _id = CharToID::GetID(aid); return *this; };
    KigsID& operator=(unsigned int aid) { _id = aid; return *this; };

#endif

    // Don't set this field manually!
    unsigned int _id;
};

The define KEEP_NAME_AS_STRING is set when building in debug mode, so that it's easy when debugging to see the original string used to initialize the ID.

Here is a sample code to illustrate this:

C++
void    Sample::KigsIDTest(const KigsID& id)
{
    printf("KigsID unsigned int value = %08x\n", id._id);
    printf("sizeof(KigsID) = %d\n", sizeof(KigsID));
}

The KigsIDTest method is called like this:

C++
KigsIDTest("HelloWorld");

And here is the id variable inspection in debug mode in Visual Studio C++ debugger:

Image 1

If KEEP_NAME_AS_STRING is not defined, only unsigned int _id member is available.

Points of Interest

So the interesting part now is that in release mode, with optimizations activated, a call to KigsIDTest("HelloWorld"); outputs this:

Image 2

So sizeof(KigsID) is 4 bytes (32 bits unsigned int).

Another interesting thing is that here, the "HelloWorld" string conversion to int is evaluated at compile time as we can see if we look at the disassembly code:

Image 3

It is the case for const char* strings up to 19 characters with Visual C++ compiler (x86 build).

For longer strings or std::string, the ID is evaluated at run time using a few logical operations (XOR, ROL) on 4 bytes packets.

Benchmark

To test the performances of the structure, we have set up a little benchmark code. For the use we make of it, the critical part is the access to the map values, so the benchmark focuses on that.

Benchmark Code

C++
template<typename maptype, typename vectorkeys>
double TestKigsIDPerfs::testUMap()
{
    // get start time
    double startTime = mApplicationTimer->GetTime();

    // create empty unordered_map 
    maptype                                    testMap;
    // create a vector containing all the keys we are going to test
    std::vector<vectorkeys>                    testVector;
    // some "similar keys"
    testVector.push_back("testString1");
    testVector.push_back("testString2");
    testVector.push_back("testString3");
    testVector.push_back("testString4");
    testVector.push_back("testString5");
    // some random keys
    testVector.push_back("ajzhgjhsqfd");
    testVector.push_back("xcwvksqjdfh");
    testVector.push_back("nzsbdocdsdf");
    testVector.push_back("zieulkjvsqd");
    testVector.push_back("poizerbhncs");
    // some incremental length keys
    testVector.push_back("az");
    testVector.push_back("aze");
    testVector.push_back("azer");
    testVector.push_back("azert");
    testVector.push_back("azerty");
    testVector.push_back("azertyu");
    testVector.push_back("azertyui");
    testVector.push_back("azertyuio");
    testVector.push_back("azertyuiop");
    testVector.push_back("azertyuiopq");

    // fill map, associating a unsigned int to each key 
    for (unsigned loop = 0; loop < testVector.size(); loop++)
    {
        testMap[testVector[loop]] = (loop + 1) * (loop + 1);
    }

    // test a lot of map access
    int vectorSize = testVector.size();
    unsigned int voidComputation = 0;
    for (unsigned int accessloop = 0; accessloop < (1000 * 1000 * 100); accessloop++)
    {
        // some computation, just to be sure the compiler really access the map
        voidComputation = voidComputation << 2;
        voidComputation ^= testMap[testVector[accessloop % vectorSize]];
    }
    // get end time
    double endTime = mApplicationTimer->GetTime();

    // print computation result, should be the same for all the different tests
    printf("voidComputation = %d\n", voidComputation);

    // return benchmark time
    return endTime - startTime;
}

We tested two kinds of unordered maps: std::unordered_map and robin_hood::unordered_map (https://github.com/martinus/robin-hood-hashing)

We also tested with the default hash method, and with the KigsIDHash method, just returning the ID itself:

C++
struct KigsIDHash
{
    std::size_t operator()(const KigsID& k) const
    {
        return k._id;
    }
};

Here are all the different benchmark calls:

C++
// unordered map
double ti=testUMap<std::unordered_map<std::string,unsigned int>, std::string >();
printf("Unordered map with string map keys and string access: %f s\n", ti);

ti = testUMap<std::unordered_map<KigsID, unsigned int>, std::string >();
printf("Unordered map with KigsID map keys and string access: %f s\n", ti);

ti = testUMap<std::unordered_map<KigsID, unsigned int>, KigsID>();
printf("Unordered map with KigsID map keys and KigsID access: %f s\n", ti);

ti = testUMap<std::unordered_map<KigsID, unsigned int, KigsIDHash>, KigsID>();
printf("Unordered map with KigsID map keys, KigsID access and KigsIDHash: %f s\n", ti);

// robin hood unordered map
ti = testUMap<robin_hood::unordered_map<std::string,unsigned int>, std::string>();
printf("Robin hood unordered map with string map keys and string access: %f s\n", ti);

ti = testUMap<robin_hood::unordered_map<KigsID, unsigned int>, std::string>();
printf("Robin hood unordered map with KigsID map keys and string access: %f s\n", ti);

ti = testUMap<robin_hood::unordered_map<KigsID, unsigned int>, KigsID>();
printf("Robin hood unordered map with KigsID map keys and KigsID access: %f s\n", ti);

ti = testUMap<robin_hood::unordered_map<KigsID, unsigned int, KigsIDHash>, KigsID>();
printf("Robin hood unordered map with KigsID map keys,
        KigsID access and KigsIDHash: %f s\n", ti);

Building the Benchmark

To easily build and test the code (Git, Microsoft Visual Studio and CMake needed, see Kigs framework Getting Started or watch this video tutorial), clone https://github.com/Kigs-framework/kigs. Then download and extract TestKigsIDPerfs.zip in the kigs/projects directory.

Open kigs/projects/CMakeLists.txt with your favorite text editor, and add the following line:

add_subdirectory(TestKigsIDPerfs) 

Save and close CMakeLists.txt.

The solution generation script : kigs/scripts/generateWinCMake.bat works with Visual Studio 2022 now. If you work with Visual Studio 2019, please edit generateWinCMake.bat and change "Visual Studio 17 2022" by "Visual Studio 16 2019" before launching the script.

Go to kigs/scripts directory and launch generateWinCMake.bat script (double click on it).

Then go to Build/solutionWinCMake/kigs/projects/TestKigsIDPerfs directory and double click TestKigsIDPerfs.sln to open the solution in Visual Studio.

In Visual Studio, select TestKigsIDPerfs as your "startup project" and StaticRelease as your build configuration. Generate and launch.

Benchmark Results

The benchmark was built with full optimizations and launched on:

  • a Windows 10 PC (Core i7-4790K 4.00 Ghz), build with Visual Studio 2019 compiler.
  • Firefox (HTML5) on the same Windows 10 PC, build with Emscripten under WSL.
  • an Android Samsung Galaxy Tab S2 (armebiv7a), build with CLANG 5.0 compiler under Visual Studio 2019.
  • Hololens 2 (Arm64), build under Visual Studio 2019.

All times are in seconds.

  Win10 HTML5 Android Hololens 2
std::unordered_map, string key, string access, std::hash 2.765395 100% 5.633000 100% 25.914861 100% 8.705835 100%
std::unordered_map, KigsID key, string access, std::hash 1.785608 155% 2.444000 230% 19.587981 132% 5.171553 168%
std::unordered_map, KigsID key, KigsID access, std::hash 0.722537 383% 1.947000 289% 17.237732 150% 2.843732 306%
std::unordered_map, KigsID key, KigsID access, KigsIDHash 1.002372 276% 1.916000 294% 17.229091 150% 2.866783 304%
robin_hood::unordered_map, string key, string access, robin_hood::hash 3.930436 70% 4.103000 137% 15.034525 172% 7.228063 120%
robin_hood::unordered_map, KigsID key, string access, robin_hood::hash 2.069901 134% 2.766000 204% 10.752449 241% 4.625862 188%
robin_hood::unordered_map, KigsID key, KigsID access, robin_hood::hash 1.138290 243% 1.638000 344% 8.765440 296% 2.956348 294%
robin_hood::unordered_map, KigsID key, KigsID access, KigsIDHash 1.142882 242% 1.647000 342% 8.845147 293% 2.946729 295%

Conclusion

Using the KigsID structure as the access key for unordered maps is always a good choice for our usage in the Kigs framework. On the contrary, using KigsIDHash seems to be a bad idea.

Choosing between std::unordered_map and robin_hood::unordered_map depends a lot on the compiler and targeted platform.

If you want to have a closer look at the KigsID code, you can find all the Kigs framework code here:

The KigsID structure is defined here:

You can also start reading the introduction to the Kigs framework article series on CodeProject:

 

     And follow us on Twitter or YouTube !

History

  • 27th January, 2020: Initial version
  • 19th February, 2020: Added Table of Contents, Benchmark and source code download
  • 21th June, 2022: change links to the Kigs-framework GitHub repo, fixed the code and added link to video tutorial

License

This article, along with any associated source code and files, is licensed under The MIT License