High-Performance Statically Buffered String

Shvetsov Evgeniy

4.66/5 (21 votes)

21 Feb 2016MIT9 min read

57K

199

String class with the auto. managed memory pool and performance tweaks + support modules

Introduction

This class was created in order to replace POD C string (CS) concept, which obviously is a subject to the poor design (personally i consider this as one of the worsests solutions along with the null pointers). Compared to CS the StaticallyBufferedString significantly boosts both speed, safety and functionality.

Background

With the introduction of a C++ there comes an alternatives to the original C-style things: pointers -> references, POD C strings -> std::string, malloc/free -> new/delete, struct -> class, macro -> template, #define MAX 127 -> static const int MAX = 127 etc.

The std::string class is a good, acceptable solution, except the one thing: pooling memory model. It's dynamic by default. Although it benefits with the possibilty of a real-time size control, the cost of using a heap memory sometimes is too high. Both locks (if we don't use a private heap somehow) and fragmentation leads to it. Optimizing memory allocation by prereserving memory is a good way to boost up an overall performance, but it still involves all heap-based allocation problems. Of course you can provide your own allocator to be used by the std::string class, but writing your own allocator is not really an easy problem and that was not my point was originally up to.

Another reason to use a containers with the fixed capacity memory pools is (espically with the strings) that in the most cases the average and maximum size of the object is generatlly well known (due to the predefined limitations) or at least not to hard to be forecasted (usally by a statistics gathering).

There is also a mathematical guarantees against memory allocation failures when using memory pooling (assuming you correctly calculates the minimum required pool size).

Class description

The class methods generally designed to provide std::string like interface, but there are notable things to mention and some specific methods to describe.

The class template was designed to work with the different types of symbols

C++

// 'TElemType' - char / wchar_t / char16_t / char32_t (provides POD C str. if 'TElemType' is a char)
template<typename TElemType, const size_t MaxLen>

(through i tested it precisely with the basic chars only as the other char types i personally does not reallly require and other developers use it rarely too).

The main class traits is:

C++

// NOT thread safe, NO leaks, NO-throw guarantee

The class provides an auto generated hash functor

ADD_HASHER_FOR(TThisType, hasher)

from the HashUtils module

C++

#define ADD_HASHER_FOR(ClassName, HasherName) \
template <typename T = ClassName>\
struct HasherName {\
  auto operator()(const T& obj) const throw() -> decltype(obj.hash()) {\
    return obj.hash();\
  }\
};

to support hash containers.

To provide some compatibility with the STL standart, class defines a mock object type as an internall allocator type (as no allocator really involved here)

C++

typedef AllocatorInterface<TElemType> allocator_type; // mock object type

AllocatorInterface module is pretty simple and self-descriptive

C++

ifndef AllocatorInterfaceH
#define AllocatorInterfaceH

#include <memory>

// C++ allocator concept: http://en.cppreference.com/w/cpp/concept/Allocator
// Fake allocator, which can NOT AND would NOT work!
// Derived classes SHOULD specify the following properties:
//  'propagate_on_container_copy_assignment', 'propagate_on_container_move_assignment',
//   'propagate_on_container_swap' AND 'is_always_equal' (if req.)
template <class T>
class AllocatorInterface : public std::allocator<T> { // better derive from the 'allocator_traits'??
  
public:
  //// MS VS does NOT need this aliases, but GCC does
  typedef typename std::allocator<T> TBaseAllocatorType;
  typedef typename TBaseAllocatorType::pointer pointer;
  typedef typename TBaseAllocatorType::const_pointer const_pointer;
  typedef typename TBaseAllocatorType::reference reference;
  typedef typename TBaseAllocatorType::const_reference const_reference;
  typedef typename TBaseAllocatorType::size_type size_type;

  //// Three constructor versions must be defined (even if they do nothing) 
  ////  to allow for copy-constructions from allocator objects of other types

  AllocatorInterface() = default;

  AllocatorInterface(const AllocatorInterface&) = default;

  template <class U>
  AllocatorInterface(AllocatorInterface<U>&&) throw() {}

  // [!] NO member of the standard default allocator class template shall introduce data races,
  //  calls to member functions that allocate or deallocate storage shall occur in a single total order 
  //   and each such deallocation shall happen before the next allocation (if any) in this order [!]

  virtual ~AllocatorInterface() = default;
  
  virtual pointer address(reference ref) const throw() {
    return std::allocator<T>::address(ref);
  }

  virtual const_pointer address(const_reference ref) const throw() {
    return std::allocator<T>::address(ref);
  }

  //// Pure virtual
  virtual pointer allocate(size_type count, std::allocator<void>::const_pointer hint = nullptr) = 0;
  virtual void deallocate(pointer addr, size_type count) = 0;

  virtual size_type max_size() const throw() {
    return size_type();
  }

  //// 'construct ' AND 'destroy' are both template funcs.
  ////  so can NOT be virtual (they will be used from a 'std::allocator')

  //// Custom allocators may contain state:
  ////  each container or another allocator-aware object stores an instance of the supplied allocator
  ////   and controls allocator replacement through 'std::allocator_traits'

private:
};

template <typename T, class U>
bool operator==(const AllocatorInterface<T>&, const AllocatorInterface<U>&) throw() {
  return false;
}

template <typename T, class U>
bool operator!=(const AllocatorInterface<T>& alloc1, const AllocatorInterface<U>& alloc2) throw() {
  return !(alloc1 == alloc2);
}

#endif // AllocatorInterfaceH

String is iterable as it uses a Generic Random Access Iterator

C++

typedef GenericRAIterator<StaticallyBufferedStringLight,
                          value_type, false, false> iterator;
typedef GenericRAIterator<StaticallyBufferedStringLight,
                          value_type, true, false> reverse_iterator;

typedef GenericRAIterator<StaticallyBufferedStringLight,
                          value_type, false, true> const_iterator;
typedef GenericRAIterator<StaticallyBufferedStringLight,
                          value_type, true, true> const_reverse_iterator;

but the none-constant irterators should be used very carefully as using them to modify an internal character sequence is a clear and easy way to broke the object state (for example, a simple way to do so is to put a string terminator - '\0' in the middle of a sequence altering so an actual string length) and relevance (of the cached hash), those requiring the class to forcefully restore it's state which is does't directly supported currently.

I provide none-const irterators in order to increase overall flexebility and versatality of the class, through i suppose they should't be used unless the user is clearly undestands what he is doing.

(There is a way to fix that behaviour by providing a proxy object instead of the real char reference by the subscript operator, but i am not sure is It really required).

Most functions of the class is templated to provide compatibility with the other possible storrage types (if they satisfys the requirements).

C++

// Compatible with the ANY storage class which has 'TElemType' as an elem. type
//  AND provides the null-terminated str. by the public const member 'c_str'
//   (like std::string, StaticallyBufferedStringLight<ANY SIZE> etc)
template<typename TStorageType>
StaticallyBufferedStringLight& operator=(const TStorageType& str) throw() {

Custom strCmp function from the MemUtils module used to comapre the strings of characters of a multiple (and a different) types

C++

template<const size_t CmpChunkSize = sizeof(std::uintptr_t), // in bytes
         const bool SignedCmpChunk = false, // as 'strcmp' by default [7.21.4/1 (C99)]
         typename TElem1Type, typename TElem2Type>
typename IntegralTypeBySize<CmpChunkSize, true>::Type
// Used to compare strings with the diff. char types
// Works almost as fast as a 'strcmp' (at least for 'char' AND on release)
// [!] Does NOT checks an incoming args. on validity [!]
  strCmp(const TElem1Type* mem1, const TElem2Type* mem2) throw()
{
  typedef typename IntegralTypeBySize<sizeof(*mem1), false>::Type TUE1T; // unsigned both
  typedef typename IntegralTypeBySize<sizeof(*mem2), false>::Type TUE2T;
  typedef typename IntegralTypeBySize<CmpChunkSize, true>::Type TReturnType;
  // See http://stackoverflow.com/questions/1356741/strcmp-and-signed-unsigned-chars
  static_assert(sizeof(TReturnType) > sizeof(*mem1) &&
                sizeof(TReturnType) > sizeof(*mem2), "Incorrect elem. type size");

  if (mem1 == mem2) return TReturnType(); // optimization
  typename IntegralTypeBySize<CmpChunkSize, SignedCmpChunk>::Type elem1, elem2;
  while (true) {
    elem1 = static_cast<decltype(elem1)>(static_cast<TUE1T>(*mem1));
    elem2 = static_cast<decltype(elem2)>(static_cast<TUE2T>(*mem2));
    
    if (!elem1 || elem1 != elem2)
      return static_cast<TReturnType>(elem1) - elem2;
    ++mem1, ++mem2;
  }
}

There are also stream-like operators defined to concat. the numbers to the string:

C++

// Same as the 'append(const TValueType value)'
template<typename TValueType,
         class = typename // remove from the overload resolution to avoid an ambiguity
           std::enable_if<std::is_arithmetic<TValueType>::value &&
                          !std::is_pointer<TValueType>::value && // 'std::nullptr_t' is OK
                          !std::is_same<TValueType, TElemType>::value>::type>
StaticallyBufferedStringLight& operator<<(const TValueType value) throw() {
  append(value);
  return *this;
}

StaticallyBufferedStringLight& operator<<(const TElemType* const str) throw() {
  append(str);
  return *this;
}

as you can see the SFINAE through std::enable_if and other type traits utilities is involved here to ensure that those operators is used with the correct types only.

Standart namespace was elso enhanced to increase STL compatibility

C++

namespace std {

  template<typename TElemType, const size_t MaxLen>
  bool operator==(const std::string& dStr,
                  const StaticallyBufferedStringLight<TElemType, MaxLen>& sStr) throw() {
    return sStr == dStr;
  }

  template<typename TElemType, const size_t MaxLen>
  bool operator==(const TElemType* const cStr,
                  const StaticallyBufferedStringLight<TElemType, MaxLen>& sStr) throw() {
    return sStr == cStr;
  }
  
  template<typename TElemType, const size_t MaxLen>
  std::ostream& operator<<(std::ostream& stream,
                           const StaticallyBufferedStringLight<TElemType, MaxLen>& str) {
    stream << str.c_str(); // OPTIMIZATION: reserve internal buffer first?
    return stream;
  }
  
  template<typename TElemType, const size_t MaxLen>
  std::istream& operator>>(std::istream& stream,
                           StaticallyBufferedStringLight<TElemType, MaxLen>& str) {
    auto symb = stream.get(); // skip first (it is '\n')
    while (true) {
      symb = stream.get(); // on EOF - 'failbit' flag is set
      switch (symb) {
        case '\n': case '\r': return stream; // escape char.
      }
      if (!stream) break; // true if either 'failbit' or 'badbit' flag is set
      if (!str.push_back(symb)) break;
    }
    return stream;
  }
};

As the str. had a very restricted capacity, there is a specific method presented to determine is the last string altering operation leads to the actual data truncation (this method just returns an internall flag, indicating the truncation event)

C++

// Constrcation, assertion OR appending strs can cause data truncation
//  due to the strictly limited size of the internal buffer
//   in case of the incoming data truncation occures appropriated flag will be set
//    AND this func. will return true
//     data truncation indication flag CAN be reseted by some operations
bool truncated() const throw() {
  return truncated_;
}

Performance tweaks

The memory pool size is auto aligned to the 64-bit word size (as most modern processors is x64):

C++

// Based on the the 'MaxLen', auto adjusted, to have a 8 byte alignment
static const auto BUF_SIZE =
  MaxLen + 1U + (((MaxLen + 1U) % 8U) ? (8U - (MaxLen + 1U) % 8U) : 0U);

(through is's may be better to align it to the natural pointer size - sizeof(std::uintptr_t))

One of the optimization methods here is to provide a specific handling of the specific cases, for example a comparison operator used to comapre the class instance with an array of characters (holding a C str.) can benefit on the known array size

C++

  // 'str' SHOULD contain the POD C str.
  // [!] More efficient then 'operator==(POD C str.)' (coze providing array size),
  //  BUT less effective, then 'operator==(const TStorageType& str)' [!]
  template<const size_t ArrayElemCount>
  bool operator==(const TElemType (&str)[ArrayElemCount]) const throw() {
    if (str == data_) return true;
    // Probably points to the substr. of the str. contatining in 'data_'
    if (checkAddrIfInternal(str)) return false;
    
    if (ArrayElemCount < (length_ + 1U)) return false; // optimization
    
    const auto thisStrMemChunkSize = sizeof(*data_) * (length_ + 1U); // in bytes
    return isEqualMemD<>(data_, str, thisStrMemChunkSize);
  }

(but it looks like it is not work as intended, as, at least in my MS VS 2013 Community update 5, it is simlpy doesn't called).

The comparing here is performed by a custom isEqualMemD function from the MemUtils module

C++

// As 'memcmp' by default (http://www.cplusplus.com/reference/cstring/memcmp/)
template<const bool SignedCmpChunk = false>
// 'memSize' is in bytes
// D - Dispatcher
//  (uses either 64 OR 32 bit chunks, depending on the CPU type, to better fit the register size)
// [!] Faster then 'memCmp<>' OR 'memcmp', use instead of 'isEqualMem' [!]
bool isEqualMemD(const void* const mem1, const void* const mem2, const size_t memSize) throw() {
  typedef const typename IntegralTypeBySize<8U, SignedCmpChunk>::Type T64Bit;
  typedef const typename IntegralTypeBySize<4U, SignedCmpChunk>::Type T32Bit;
  typedef const typename IntegralTypeBySize<1U, SignedCmpChunk>::Type T8Bit;
  
  if (memSize < 8U) {
    return !memcmp(mem1, mem2, memSize);
  }
  switch (CPUInfo::INSTANCE.is64BitCPU) {
    case true: // 64 bit
      assert(memSize >= 8U);
      if (!isEqualMem<8U, SignedCmpChunk>(mem1, mem2, memSize / 8U)) return false;
      // Check the remain (NOT fitted) bytes; sizeof(char) == 1
      return *(reinterpret_cast<T64Bit*>(static_cast<T8Bit*>(mem1) + memSize - 8U)) ==
             *(reinterpret_cast<T64Bit*>(static_cast<T8Bit*>(mem2) + memSize - 8U));
      
    default: // 32 bit
      assert(memSize >= 4U);
      if (!isEqualMem<4U, SignedCmpChunk>(mem1, mem2, memSize / 4U)) return false;
      return *(reinterpret_cast<T32Bit*>(static_cast<T8Bit*>(mem1) + memSize - 4U)) ==
             *(reinterpret_cast<T32Bit*>(static_cast<T8Bit*>(mem2) + memSize - 4U));
  }
}

This function is just a dispatcher, which calling an actual comparing function based on the CPU type (to benefit on the CPU register size)

C++

// [!] Does NOT checks an incoming args. on validity [!]
// Use this when the fact itself of a difference of the two memory chunks is meaningfull,
//  NOT the actual difference value
// [!] Works a bit faster then 'memcmp' [!]
bool isEqualMem(const void* const mem1, const void* const mem2, const size_t iterCount) throw() {
  typedef typename IntegralTypeBySize<CmpChunkSize, SignedCmpChunk>::Type TCmpChunkType;

  if (mem1 == mem2) return true; // optimization
  auto mem1re = static_cast<const TCmpChunkType*>(mem1); // reinterpreted
  auto mem2re = static_cast<const TCmpChunkType*>(mem2);
  
  const auto end = mem1re + iterCount;
  while (mem1re < end) {
    if (*mem1re != *mem2re) return false;
    ++mem1re, ++mem2re;
  }
  return true;
}

A detection of where the current CPU is x32 or x64 is performed by the HardwareUtils module

C++

#ifndef HardwareUtilsH
#define HardwareUtilsH

#ifdef _MSC_VER
  #include <cstring>
  #include <intrin.h> // Microsoft Specific
#elif __GNUC__
  #include <cpuid.h> // GCC
#endif

class CPUInfo {

public:
  // Static singleton (static init. is a thread safe in C++11)
  static const CPUInfo INSTANCE;

  const bool is64BitCPU = false;

private:

  CPUInfo() throw()
    : is64BitCPU(findIs64BitCPU())
  {}

  ~CPUInfo() = default;

  CPUInfo(const CPUInfo&) throw() = delete;
  CPUInfo(CPUInfo&&) throw() = delete;
  CPUInfo& operator=(const CPUInfo&) throw() = delete;
  CPUInfo& operator=(CPUInfo&&) throw() = delete;

  /* Reference:
  http://stackoverflow.com/questions/12212385/detect-if-the-processor-is-64-bit-under-32-bit-os
  https://en.wikipedia.org/wiki/CPUID
  https://en.wikipedia.org/wiki/Long_mode
  https://msdn.microsoft.com/en-us/library/hskdteyh(v=vs.120).aspx
  https://msdn.microsoft.com/en-us/library/windows/desktop/ms684139(v=vs.85).aspx
  http://stackoverflow.com/questions/14266772/how-do-i-call-cpuid-in-linux
  */
  // HINT: Better rewrite this using an assembly
  //  (see examples: https://en.wikipedia.org/wiki/CPUID#CPUID_usage_from_high-level_languages)
  static bool findIs64BitCPU() throw() {
    static const auto GET_MAX_CMD_SUPPORTED = 0x80000000U; // 2 147 483 648
    
    unsigned int cpuInfo[4U] = {0}; // from EAX, EBX, ECX, and EDX
    #ifdef _MSC_VER
      auto const cpuInfoRe = reinterpret_cast<int*>(cpuInfo); // reinterpreted
      __cpuid(cpuInfoRe, GET_MAX_CMD_SUPPORTED);
      // Max. value of 'function_id' supported for extended functions
      const auto maxFuncIDSupported = *cpuInfo; // at 'EAX'
    #elif __GNUC__
      // 'ext' SHOULD be 0x8000000 to return highest supported value for extended 'cpuid' information
      // Returns 0 if 'cpuid' is not supported or whatever 'cpuid' returns in EAX register
      const auto maxFuncIDSupported = __get_cpuid_max(GET_MAX_CMD_SUPPORTED, nullptr);
    #else
      static_assert(false, "Unsupported complier"); // __BORLANDC__, __MINGW32__
    #endif
    
    // Get Extended Processor Info and Feature Bits
    static const auto GET_EXTENDED_INFO  = 0x80000001U; // 2 147 483 649
    // If does NOT supports extended flags
    if (maxFuncIDSupported < GET_EXTENDED_INFO) return false;
    #ifdef _MSC_VER
      memset(cpuInfo, 0, sizeof(cpuInfo));
      __cpuid(cpuInfoRe, GET_EXTENDED_INFO);
    #elif __GNUC__
      // Checks if 'cpuid' is supported and returns 1 for valid cpuid information or
      //  0 for unsupported cpuid level
      if (!__get_cpuid(GET_EXTENDED_INFO, cpuInfo, cpuInfo + 1U, cpuInfo + 2U, cpuInfo + 3U))
        return false;
    #endif
    
    //' LM' (Long Mode) flag for AMD / 'EM64T' for Intel
    static const auto LONG_MODE_BIT = 536870912U; // 2 pow 29: 29-th bit
    // Check if bit is signaled
    return 0U != (LONG_MODE_BIT & cpuInfo[3U]); // from EDX (cpuInfo[3U])
  }

};

#endif // HardwareUtilsH

As i mention before, string supports hash code genereation (using low-collison getFNV1aHash function from the HashUtils module) which allows to store the string instances in the efficient search data structures with the const. amortized time of an item lookup. But the class also supports a smart hash caching - if stores the known hash code value and returns it (if still relevant) rather then recalcuating it

C++

//// [!] Hashing algo. SHOULD never be changed at the runtime (must be deterministic) [!]

// Uses FNV-1a algorithm (ONLY for the one byte chars - char / unsigned char!!)
size_t hash() const throw() {
  static_assert(1U == sizeof(TElemType), "'TElemType' SHOULD be 1 byte long!");

  if (modified_) { // recalc. rather then returning cached value
    hash_ = MathUtils::getFNV1aHash(data_);
    modified_ = false;
  }
  return hash_;
}

Comparison operator is also benefits on the hash caching: if comparing the two storage instances of some compatible types, which allows the hash code calculation & caching, assuming the hash code is already calculated (with the same hashing algorithm) and still relevant for both instances - if the hash codes is different then the instances are not equal (those making comparison complexity constant rather then linear)

C++

// Compatible with the ANY storage class which has 'TElemType' as an elem. type
//  AND provides the null-terminated str. by the public const member 'c_str'
//   AND returning actual str. len. with the public const member 'length'
//    (like std::string, StaticallyBufferedStringLight<ANY SIZE> etc)
// [!] The most effective type of comparing strs is
//      comparing a one 'StaticallyBufferedStringLight' to another (coze providing hash) [!]
template<class TStorageType>
bool operator==(const TStorageType& str) const throw() {
  if (str.length() != length()) return false;

  // Using hash code. algo. ID to identify if the same algo. is used by the 'str' instance
  // OPTIMIZATION HINT: use C++14 'constexpr' here
  static const auto SAME_HASHING_ = (hashAlgoID() == hashAlgoID::ExecIfPresent(str));
  if (SAME_HASHING_) {
    const auto thisHash = getHashIfKnown(); // ONLY if already calculated
    if (thisHash) { // if this hash is relevant
      // If 'TStorageType' provides hash code
      //  AND hash code is ALREADY calculated (cached) AND relevant
      //   for the BOTH compared objects - if codes does NOT equal ->
      //    objects is clearly diffirent (return false)
      // Called ONLY if exists (if NOT - called surrogate which is ALWAYS return zero)
      const auto otherHash = getHashIfKnown::ExecIfPresent(str);
      // REMEMBER that hash code equivalence does NOT actually means that object are equal
      //  due to the non-nill collison probabilty
      if (otherHash && otherHash != thisHash) return false; // if other hash is known AND relevant
    }
  }
  static const auto SAME_CHAR_TYPE_ = // remove cv and ref.
    std::is_same<typename std::decay<decltype(*c_str())>::type,
                 typename std::decay<decltype(*str.c_str())>::type>::value;
  switch (SAME_CHAR_TYPE_) {
    case true: return isEqualMemD<>(data_, str.c_str(), sizeof(*data_) * length());
    // Diff. types: call 'operator==(const TOtherElemType* const str)'
    default: return *this == str.c_str();
  }
}

hashAlgoID and getHashIfKnown helper functions used here, they are pretty self-explainable

C++

// Returns unique ID of the hashing algo. used to calculate
//  a value returned by the 'hash' AND 'getHashIfKnown' routines
// [!] SHOULD never return zero [!]
// Define AND use this if you planning to compare this instance with the other instances
//  which using the same hash code calculation algorithm [OPTIMIZATION]
// OPTIMIZATION HINT : use C++14 'constexpr' here
static size_t hashAlgoID() throw() {
  static const size_t ID_ = 'F' + 'N' + 'V' + '-' + '1' + 'a';
  return ID_;
}

// NEVER recalculates hash
// Returns zero if actual hash is unknown OR if str. is empty
size_t getHashIfKnown() const throw() {
  return modified_ ? size_t() : hash_;
}

Both of them is actually called using the ExecIfPresent idiom

C++

EXEC_MEMBER_FUNC_IF_PRESENT(getHashIfKnown, size_t())
EXEC_MEMBER_FUNC_IF_PRESENT(hashAlgoID, size_t())

so storage types which is not support this mechanism (like standart std::string) can still be used with this class, those making it more versatile and flexible.

An equality comparison operator can also benefits on the hash code caching if (and only if) proven that the hash code generating algorithm generates a hash value in such manner that the hash of a larger (in terms of the comparision) string is always larger too

C++

// [!] The most effective type of comparing strs is
//      comparing a one 'StaticallyBufferedStringLight' to another (coze providing hash) [!]
// 'TStorageType' SHOULD provide POD C str. with it's 'data' function
template<class TStorageType>
bool operator<(const TStorageType& str) const throw() {
  // Using hash code. algo. ID to identify if the same algo. is used by the 'str' instance
  // OPTIMIZATION HINT: use C++14 'constexpr' here
  static const auto SAME_HASHING_ = (hashAlgoID() == hashAlgoID::ExecIfPresent(str));
  // If proved that the hash code of a larger str. is larger -
  //  we can just check the hash code here [OPTIMIZATION]
  //   (ONLY if the hash code algo. is ALWAYS generates a lager hash for a larger str., FNV-1a is NOT)
  if (SAME_HASHING_ && HashCodeChecker::INSTANCE.hashOfLargerStrLarger) {
    // Get hash ONLY if already known [OPTIMIZATION]
    const auto thisHashCached = getHashIfKnown(), otherHashCached = str.getHashIfKnown();
    if (thisHashCached && otherHashCached && (thisHashCached != otherHashCached)) {
      // Equal caches does NOT necessary means the strs is actually equal,
      //  due to collison probability
      return thisHashCached < otherHashCached;
    }
  }
  static const auto SAME_CHAR_TYPE_ = // remove cv and ref.
    std::is_same<typename std::decay<decltype(*c_str())>::type,
                 typename std::decay<decltype(*str.c_str())>::type>::value;
  switch (SAME_CHAR_TYPE_) {
    case true:
      if (static_cast<const void*>(data_) == static_cast<const void*>(str.data())) return false;
      return 0 > memcmp(data(), str.data(),
                        sizeof(*data_) * (std::min<>(length(), str.length()) + 1U));
    default:
      return *this < str.data(); // diff. types: call 'operator<(const TOtherElemType* const str)'
  }
}

FNV-1a algorithm, through, does not satisfys this requirement as proven by an automated test module <span style="background-color: rgb(255, 255, 255);">HashCodeChecker</span>

C++

#include "StaticallyBufferedStringLight.h"

struct HashCodeChecker::HashChecker {
  size_t operator()(const std::string& str) const throw() {
    if (!COLLECT_STATS && !NEXT_HASH_LARGER_OR_EQ) return 0U; // skip [OPTIMIZATION]
    STR = str;

    const auto currHash = STR.hash();
    const auto nextLargerOrEq = (currHash >= PREV_HASH);
    nextLargerOrEq ? ++STATS[1U] : ++*STATS;
    NEXT_HASH_LARGER_OR_EQ &= nextLargerOrEq;
    PREV_HASH = currHash;

    return currHash;
  }

  static bool COLLECT_STATS; // will be initially false as a static
  // First - next lower count (NOT OK!), second - next larger count (OK); both zero at the start
  static size_t STATS[2U]; // first: NOT larger, second: next larger count
  static size_t PREV_HASH; // will be initially zero as a static
  static bool NEXT_HASH_LARGER_OR_EQ; // initially true!!
  static StaticallyBufferedStringLight<char, 7U> STR;
};

bool HashCodeChecker::HashChecker::COLLECT_STATS;
size_t HashCodeChecker::HashChecker::STATS[2U];
size_t HashCodeChecker::HashChecker::PREV_HASH;
bool HashCodeChecker::HashChecker::NEXT_HASH_LARGER_OR_EQ = true;
decltype(HashCodeChecker::HashChecker::STR) HashCodeChecker::HashChecker::STR;

template <const bool SkipStatistics>
bool HashCodeChecker::ishashOfLargerStrLarger() throw() {
  //// Test A
  static const auto MAX_LEN_ = 55U;
  static_assert('A' + MAX_LEN_ < 128, "Incorrect max. len.");
  StaticallyBufferedStringLight<char, MAX_LEN_> str;

  decltype(str.hash()) prevHash = str.hash(), nowHash = 0U;
  auto currChar = 'A'; // from 'A' to 'A'+MAX_LEN_
  auto nextHashLargerOrEqCount = 0U, nextHashLowerCount = 0U;

  for (auto charIdx = 0U; charIdx < str.max_size(); ++charIdx) {
    str += currChar;
    ++currChar;
    nowHash = str.hash();

    nowHash >= prevHash ? ++nextHashLargerOrEqCount : ++nextHashLowerCount;
    if (SkipStatistics && nextHashLowerCount) return false;
    prevHash = nowHash;
  }
  //// Test B
  char strBuf[4U] = {0};
  HashTester::Params<std::extent<decltype(strBuf)>::value> params(strBuf);
  const auto result = HashTester::createAndTestCharSeq<HashChecker>(params, false);

  return !nextHashLowerCount && result && HashChecker::NEXT_HASH_LARGER_OR_EQ;
}

#ifdef _MSC_VER
  // GCC might NOT build with this, while MS VS 2013 will NOT build withOUT this
  const HashCodeChecker HashCodeChecker::INSTANCE;
#endif

(This moduIe uses the HashTester module).

There is also a specific append method presented which is used to do an optimized (as it skips double buffered copying / temporary object creation and destroying / heap memory usage, unlike standart std::to_string, for example) concatenation of string representation of a number

C++

// Adds a WHOLE number in a str. representation
//  (be carefull here, do NOT miss with the 'operator+=(const TElemType symb)')
template<typename TValueType,
         class = typename // remove from the overload resolution to avoid an ambiguity
           std::enable_if<std::is_arithmetic<TValueType>::value &&
                          !std::is_pointer<TValueType>::value && // 'std::nullptr_t' is OK
                          !std::is_same<TValueType, TElemType>::value>::type>
StaticallyBufferedStringLight& append(const TValueType value) throw() {
  const auto spaceLeft = std::extent<decltype(data_)>::value - length_;
  if (spaceLeft < 2U) { // NO actual space left (1 for the str. terminator)
    truncated_ = true;
    return *this;
  }
  const char* mask = "";
  auto returnFlag = false; // true if return immediately
  auto getMask = [&]() throw() {
    typedef typename TypeTag<decltype(value)>::TypeTags_ Tags;
    switch (TYPE_TAG(value)) {
      // OPTIMIZATION thoughts: reduce the mask count?
      //  use 'if std::is_floatinng<decltype(value)>' for two mask types - float|integral?

      default: assert(false); // unidentified type
      case Tags::NULLPTR: returnFlag = true; break;

      case Tags::BOOL: mask = value ? "true" : "false"; break;

      case Tags::SIGNED_CHAR: mask = "%hhi"; break;
      case Tags::SIGNED_SHORT_INT: mask = "%hi"; break;
      case Tags::SIGNED_INT:
      case Tags::WCHAR: // signedness of wchar_t is unspecified
        mask = "%i"; break;
      case Tags::SIGNED_LONG_INT: mask = "%li"; break;
      case Tags::SIGNED_LONG_LONG_INT: mask = "%lli"; break;

      case Tags::UNSIGNED_CHAR: mask = "%hhu"; break;
      case Tags::UNSIGNED_SHORT_INT: mask = "%hu"; break;
      case Tags::UNSIGNED_INT: mask = "%u"; break;
      case Tags::UNSIGNED_LONG_INT: mask = "%lu"; break;
      case Tags::UNSIGNED_LONG_LONG_INT: mask = "%llu"; break;

      case Tags::FLOAT:
      case Tags::DOUBLE:
        mask = "%f"; break;
      case Tags::LONG_DOUBLE: mask = "%Lf";
    }
  };
  getMask();
  if (returnFlag) return *this;

  #ifdef _MSC_VER // MS VS specific
    // Number of characters stored in buffer, not counting the terminating null character
    auto count = _snprintf_s(data_ + length_, spaceLeft, _TRUNCATE, mask, value);
    if (count < 0) { // if NOT enough space left
      count = spaceLeft - 1U;
      truncated_ = true;
    }
  #else
    // Number of characters that properly SHOULD have been written (except terminating null character)
    auto count = snprintf(data_ + length_, spaceLeft, mask, value);
    if (count < 0) {
      data_[length_] = '\0'; // ensure NO actual changes
      return *this; // encoding error
    }
    if ((count + 1U) > spaceLeft) { // +1 for the str. terminator
      count = spaceLeft - 1U;
      truncated_ = true;
    }
  #endif

  length_ += count;
  modified_ = true;
  return *this;
}

Speed comparison

And now the true test!

(ALL tests performed in the release mode with the ALL optimizations turned on, built x32 in MS VS Community 2013 update 5, runned on my lenovo IdeaPad S400 x64 laptop under x64 Win 8.1)

1) Concatenation speed test

This test performs many (~400 000) smallest (by the one single char) concatenations to the string instance. It is a good case for the std::string (as ony a small percent of concatenations will lead to the situation where length will became larger then the current capacity requiring so to do a reallocation, and that percent will probably even decrease in time, depends on the actual string realization), worst case for the C str. (as it require to determine [with the linear complexity] the string length before doing an actual addition, which results in a time complexity growth in a an arithmetic progression withing each call) and don't-care case for the static str.

std::string:

- best: all the required space is preallocated before the concat.

- worst: addition of a strings whose length is growth in a geometry progression in such a manner that each addition will require to do a reallocation

C str.:

- best: one single addition to the empty string

- worst: many smallest concatenations to the long initial string

Static str.: all cases are equal

Test code:

C++

static const auto STR_BUF_SIZE_ = 440U * 1024U;
CStr<STR_BUF_SIZE_ - 1U> cstr___1_;
StaticallyBufferedStringLight<char, STR_BUF_SIZE_ - 1U> static_str___1_;
std::string dynamic_str___1_;

cstr___1_.clear();
std::cout << cstr___1_.strBuf; // to force the compiler to generate the actual code
const volatile auto cstr_test_time_ =
  TestConcat(cstr___1_, STR_BUF_SIZE_ - 1U).count();
cstr___1_.clear();
std::cout << cstr___1_.strBuf; // to force the compiler to generate the actual code

static_str___1_.clear();
std::cout << static_str___1_; // to force the compiler to generate the actual code
const volatile auto static_str_test_time_ =
  TestConcat(static_str___1_, STR_BUF_SIZE_ - 1U).count();
cstr___1_.clear();
std::cout << cstr___1_.strBuf; // to force the compiler to generate the actual code

dynamic_str___1_.clear();
std::cout << dynamic_str___1_; // to force the compiler to generate the actual code
const volatile auto dynamic_str_test_time_ =
  TestConcat(dynamic_str___1_, STR_BUF_SIZE_ - 1U).count();
cstr___1_.clear();
std::cout << cstr___1_.strBuf; // to force the compiler to generate the actual code

std::cout << "static str: " << static_str_test_time_ << std::endl;
std::cout << "dynamic str: " << dynamic_str_test_time_ << std::endl;
std::cout << "cstr: " << cstr_test_time_ << std::endl;
std::cout << "\nStatic str.: 1.0, dynamic str.: "
          << static_cast<double>(dynamic_str_test_time_) / static_str_test_time_
          << ", POD C str.: "
          << static_cast<double>(cstr_test_time_) / static_str_test_time_ << std::endl;

C++

template <class TStrType>
auto TestConcat(TStrType& str, const size_t maxLen) throw()
  -> decltype(std::chrono::system_clock::now() - std::chrono::system_clock::now())
{
  static const auto MIN_CHAR_ = 'a';
  static const auto MAX_CHAR_ = 'z';
  static_assert(MIN_CHAR_ < MAX_CHAR_, "Invalid chars");

  std::default_random_engine gen;
  std::uniform_int_distribution<int> distr(MIN_CHAR_, MAX_CHAR_);

  char strToAdd[2U] = {0};
  
  str.clear();
  const auto startTime = std::chrono::high_resolution_clock::now();

  for (size_t charIdx = 0U; charIdx < maxLen; ++charIdx) {
    *strToAdd = distr(gen); // constant complexity
    str += strToAdd; // linear to const complexity
  }
  const auto endTime = std::chrono::high_resolution_clock::now();

  return endTime - startTime;
}

C++

template <const size_t MaxLen>
struct CStr {

  CStr() throw() {
    clear();
  }

  void clear() throw() {
    *strBuf = '\0';
  }

  void operator+=(const char* const str) throw() {
    strcat(strBuf, str);
  }

  char strBuf[MaxLen + 1U];
};

Results:

2) Memory comparison routines speed test

This test involves testing speed of the standart C memcmp / strcmp and my custom comparing functions from the MemUtils module.

Test code:

C++

const auto SIZE_ = 1024U * 1024U * 8U;
auto mem1 = new char[SIZE_];
memset(mem1, 200, SIZE_ - 1U);
auto mem2 = new char[SIZE_];
memset(mem2, 200, SIZE_ - 1U);
mem1[SIZE_ - 1U] = '\0';
mem2[SIZE_ - 1U] = '\0';

// (64 bit OS, 64 bit CPU, 32 bit application)
decltype(std::chrono::system_clock::now()) time1, time2;
static const auto COUNT_001_ = 7U;
static const auto TEST_COUNT_001_ = 40U;
decltype((time2 - time1).count()) counts[COUNT_001_] = {0};
auto memCmp_faster_then_memcmp_count = size_t();
auto memCmp_faster_then_quickCmp = size_t();
unsigned long long avg[COUNT_001_] = {0};
const auto iterCount_1_ = 40U;

for (size_t idx = 0U; idx < iterCount_1_; ++idx) {
  time1 = std::chrono::system_clock::now();
  volatile long long int r0 = 0LL;
  for (size_t testIdx = size_t(); testIdx < TEST_COUNT_001_; ++testIdx) {
    // 'strCmp<>' with 4U works MUCH faster, then with the 8U
    r0 = strCmp<>(mem1, mem2);
  }
  time2 = std::chrono::system_clock::now();
  *counts = (time2 - time1).count();
  *avg += *counts;

  time1 = std::chrono::system_clock::now();
  volatile long long int r1 = 0LL;
  for (size_t testIdx = size_t(); testIdx < TEST_COUNT_001_; ++testIdx) {
    // 'quickCmp' looks like with 8U works faster, then with the 4U
    //  (BUT we can't use 8U AND this func. is NOT used)
    r1 = quickCmp(mem1, mem2, SIZE_);
  }
  time2 = std::chrono::system_clock::now();
  counts[1] = (time2 - time1).count();
  avg[1U] += counts[1];

  time1 = std::chrono::system_clock::now();
  volatile long long int r2 = 0LL;
  for (size_t testIdx = size_t(); testIdx < TEST_COUNT_001_; ++testIdx) {
    r2 = memcmp(mem1, mem2, SIZE_);
  }
  time2 = std::chrono::system_clock::now();
  counts[2] = (time2 - time1).count();
  avg[2U] += counts[2];

  time1 = std::chrono::system_clock::now();
  volatile long long int r3 = 0LL;
  for (size_t testIdx = size_t(); testIdx < TEST_COUNT_001_; ++testIdx) {
    r3 = strcmp(mem1, mem2);
  }
  time2 = std::chrono::system_clock::now();
  counts[3] = (time2 - time1).count();
  avg[3U] += counts[3];

  time1 = std::chrono::system_clock::now();
  volatile long long int r4 = 0LL;
  const auto count_1_ = SIZE_ / sizeof(std::uintptr_t);
  for (size_t testIdx = size_t(); testIdx < TEST_COUNT_001_; ++testIdx) {
    // 'memCmp' with 8U works faster, then with 4U (BUT we can't do that)
    r4 = memCmp<>(mem1, mem2, count_1_);
  }
  time2 = std::chrono::system_clock::now();
  counts[4] = (time2 - time1).count();
  avg[4U] += counts[4];

  time1 = std::chrono::system_clock::now();
  volatile bool r5 = false;
  for (size_t testIdx = size_t(); testIdx < TEST_COUNT_001_; ++testIdx) {
    r5 = isEqualMemD(mem1, mem2, SIZE_);
  }
  time2 = std::chrono::system_clock::now();
  counts[5] = (time2 - time1).count();
  avg[5U] += counts[5];

  time1 = std::chrono::system_clock::now();
  volatile bool r6 = false;
  const auto count_02_ = SIZE_ / 8U;
  for (size_t testIdx = size_t(); testIdx < TEST_COUNT_001_; ++testIdx) {
    r6 = isEqualMem<8U>(mem1, mem2, count_02_);
  }
  time2 = std::chrono::system_clock::now();
  counts[6] = (time2 - time1).count();
  avg[6U] += counts[6];

  std::cout << "\n";
  std::cout << "count6 - isEqualMem<8U>    : " << counts[6] << "\n";
  std::cout << "count5 - isEqualMemD       : " << counts[5] << "\n";
  std::cout << "count4 - memCmp<>          : " << counts[4] << "\n";
  std::cout << "count3 - strcmp            : " << counts[3] << "\n";
  std::cout << "count2 - memcmp            : " << counts[2] << "\n";
  std::cout << "count1 - quickCmp          : " << counts[1] << "\n";
  std::cout << "count0 - strCmp<>          : " << *counts   << "\n";
  std::cout << "\n" << static_cast<double>(counts[3]) / counts[1] << "\n";
  assert(r1 == r2);

  if (counts[4] < counts[2]) ++memCmp_faster_then_memcmp_count;
  if (counts[4] < counts[1]) ++memCmp_faster_then_quickCmp;
}
std::cout << "\nmemCmp_faster_then_memcmp_count: "
          << memCmp_faster_then_memcmp_count  << " / "
          << iterCount_1_ << std::endl;
std::cout << "memCmp_faster_then_quickCmp: "
          << memCmp_faster_then_quickCmp << " / "
          << iterCount_1_ << std::endl << std::endl;

const char* const names_001_[]
  = {"strCmp<>          ",
     "quickCmp          ",
     "memcmp            ",
     "strcmp            ",
     "memCmp<>          ",
     "isEqualMemD       ",
     "isEqualMem<8U>    "};
auto idx_0001_ = 0U;
for (volatile auto& currAvg : avg) {
  currAvg /= iterCount_1_;
  std::cout << "avg " << names_001_[idx_0001_] << " - #" << idx_0001_ << ": " << currAvg << std::endl;
  ++idx_0001_;
}
delete[] mem1, delete[] mem2;

Results:

Static str. uses best suitable (not always fastest) functions: strCmp<>, isEqualMemD<>, memcmp.

Static str. uses isEqualMemD<> rather then memcmp if possible as it is faster, strCmp<> rather then strcmp as it capable of multiple char types comparing and memcmp rather then strCmp<> if possible as it is faster.

Through this page says that standart memcmp function had an intrinsic form on all architectures and i compiled my test cases with the '/Oi' compiler option, test shows that the custom isEqualMemD<> is slightly faster (which possibly should't be).

3) Relational comparison speed test

Testing 'less than' operators speed.

Test code:

C++

const decltype(s_str_01_) s_str_02_ = static_chars_01_;
const decltype(dstr_01_) dstr_02_ = static_chars_01_;

volatile auto result__01_ = false;
time1 = std::chrono::system_clock::now();
for (size_t testIdx = size_t(); testIdx < 100000U; ++testIdx) {
  result__01_ = s_str_01_ < s_str_02_;
}
time2 = std::chrono::system_clock::now();
counts[1] = (time2 - time1).count();

time1 = std::chrono::system_clock::now();
for (size_t testIdx = size_t(); testIdx < 100000U; ++testIdx) {
  result__01_ = dstr_01_ < dstr_02_;
}
time2 = std::chrono::system_clock::now();
counts[2] = (time2 - time1).count();

time1 = std::chrono::system_clock::now();
for (size_t testIdx = size_t(); testIdx < 100000U; ++testIdx) {
  result__01_ = strcmp(s_str_01_.c_str(), s_str_02_.c_str()) == 0;
}
time2 = std::chrono::system_clock::now();
counts[3] = (time2 - time1).count();

volatile auto diff = static_cast<double>(counts[2]) / counts[1];
std::cout << "\nStatic str. 'operator<' " << diff << " times faster then the dynamic one\n";
diff = static_cast<double>(counts[3]) / counts[1];
std::cout << "  and " << diff << " times faster then the 'strcmp'\n";

Results:

4) Mass allocation/deallocation speed test

Shows why the concept actually was born :)

Firstly, the PerformanceTester module which used to do the test:

C++

#ifndef PerformanceTesterH
#define PerformanceTesterH

#include <chrono>
#include <iostream>

class PerformanceTester {

public:
  
  struct TestResults {
    typedef decltype(std::chrono::system_clock::now()) TTimePoint;
    typedef decltype((TTimePoint() - TTimePoint()).count()) TTimeCount;

    void clear() throw() {
      time1 = TTimeCount(), time2 = TTimeCount();
    }

    TTimeCount time1 = TTimeCount(), time2 = TTimeCount();
  };

  // Returns diff. (2/1)
  template <class TFunctor1, class TFunctor2, const bool onScreen = true>
  static double test(TFunctor1& subj1, TFunctor2& subj2,
                     const size_t testCount, TestResults& results)
  {
    results.clear();
    if (!testCount) return 0.0;
    
    auto time1 = TestResults::TTimePoint(), time2 = TestResults::TTimePoint();
    auto testIdx = size_t();
    
    auto testSubj = [&](const bool isSecondSubj) throw() {
      time1 = std::chrono::system_clock::now();
      for (testIdx = size_t(); testIdx < testCount; ++testIdx) {
        switch (isSecondSubj) {
          case true: subj2(); break;
          default: subj1(); // false
        }
      }
      time2 = std::chrono::system_clock::now();

      return (time2 - time1).count();
    };

    results.time1 = testSubj(false);
    results.time2 = testSubj(true);
    const auto diff = static_cast<double>(results.time2) / results.time1;
    
    if (onScreen) {
      auto const potfix = (diff < 1.0) ? "faster" : "slower";
      const auto diffReinterpreted = (diff < 1.0) ? (1.0 / diff) : diff;
      std::cout << "\nSecond is " << diffReinterpreted << " times " << potfix << std::endl;
    }
    return diff;
  }
};

#endif // PerformanceTesterH

Test code:

C++

struct Funct1__ {
  volatile int operator()() throw() {
    volatile StaticallyBufferedStringLight<> str;
    return 0;
  }
} funct1__;

struct Funct2__ {
   volatile int operator()() throw() {
    std::string str;
    str.reserve(127U);
    return 0;
  }
} funct2__;

PerformanceTester::TestResults results1;
PerformanceTester::test(funct1__, funct2__, 9999999U, results1);

Results:

5) Mass equality comparison speed test

Testing 'not equal' operators speed in case of the static str. hash code is already calculated.

Test code (diff. symbol towards the end) :

C++

static auto const STR1
  = "cam834mht8xth,a4xh387th,txh87c347837 g387go4 4w78t g34 3g7rgo bvgq7 tgq3874g478g8g oebgbg8 b"
    "cwmc8htcw,o7845mt754cm8w4gcm8w4gc78w4gcw4cw4yc4c4xn34x63gc7sch74s4h5mc7h7cn34xm7xg78gxm7384x";

static auto const STR2
  = "cam834mht8xth,a4xh387th,txh87c347837 g387go4 4w78t g34 3g7rgo bvgq7 tgq3874g478g8g oebgbg8 b"
    "cwmc8htcw,o7845mt754cm8w4gcm8w4gc78w4cw4cw4yc4c4xn34x63gc7sc_74s4h5mc7h7cn34xm7xg78gxm7384x";

struct Funct1__ {

  Funct1__() throw() : str1(STR1), str2(STR2) {
    //// Prehash
    str1.hash();
    str2.hash();
  }

  volatile int operator()() throw() {
    volatile auto result = (str1 != str2);
    return 0;
  }

  StaticallyBufferedStringLight<char, 255U> str1;
  StaticallyBufferedStringLight<char, 255U> str2;
} funct1__;

struct Funct2__ {

  Funct2__() throw() : str1(STR1), str2(STR2) {}

  volatile int operator()() throw() {
    volatile auto result = (str1 != str2);
    return 0;
  }

  std::string str1;
  std::string str2;
} funct2__;

PerformanceTester::TestResults results1;
PerformanceTester::test(funct1__, funct2__, 9999999U, results1);

Results:

Same test with the very first diff. symbol:

Points of Interest

Class using TypeHelpers, GenericRAIterator + CompareUtils, FuncUtils, and MathUtils + HashUtils modules.

This module is just a small part of the library, which uses C++11 features and which I am working under now, I decided to make it a public property.