The provided C++ classes makes it easy to work with the Extensible Storage Engine (ESE), the NoSQL database engine that is included with Windows.
Introduction
Related articles:
The C API for the Extensible Storage Engine is a bit daunting, and this is probably why this excellent database engine is not used as much as it probably would be if the API were more accessible.
The main reasons for using ESE is that it:
- Is installed with Windows, so it gets updates and bugfixes along with the rest of the OS. Chances are that you are already using it every day since it is used by key Microsoft technologies such as:
- Microsoft Exchange
- Active Directory
- Windows Search
- It is very reliable.
- It is fairly fast – inserting more than 100 000 records per second is not unusual.
In this article, I will demonstrate how to:
- Initialize the database engine and create a session
- Create a database
- Open an existing database
- Create a table
- Open an existing table
- Add columns to table
- Create a primary index for a table
- Insert rows into a table
- Find a row in a table
- Update a row in a table
- Use transactions
- Select a range of rows from a table based on search criteria
- Iterate over a table, or a range of rows, forwards and backwards
The test code, in EseTests.cpp, which is included with the source code, provide examples demonstrating how to store and retrieve all datatypes supported by ESE.
There is also an example, provided with the download, that demonstrates the performance of ESE:
- Inserts 44 640 000 rows in just over 5 minutes: 140 160 rows per second.
- Updates 44 640 000 rows in just over 5 minutes: 138 427 rows per second.
- Searches for, and retrieves, rows at a rate of more than 900 000 rows per second.
While the above are excellent numbers, I have not found a way to make ESE excel at deleting rows from a table.
To see how this works out on our own system, build and execute:
ExampleCoreESE01.exe -c -r -t F:\<Path to database directory>\Database.edb
where
- -c tells the program to create a new database
- -r tells the program to replace any existing database at that location
- -t tells the program to execute the performance tests
This requires about 10GB of free space on your harddrive.
The library allows you to use both Unicode, UTF16, and char based strings by providing overloads that use the Unicode version and the ANSI version of the ESE C API, and selects one or the other based on the string/character type of the arguments were this is applicable.
Several years ago, Artour Bakiev wrote an excellent article: Extensible Storage Engine, that demonstrates how to work with the ESE C API – it also demonstrates the amount of effort required just to get started with ESE using the API that is provided with the Windows SDK.
What I would like is a simple and convenient C++ API that makes it easy to access the features of the engine, without hiding any of the capabilities. Creating an instance of the engine, starting a session, and creating, or opening, a database is a common pattern – it should be simple:
BOOST_AUTO_TEST_CASE( InstanceInitializationTest )
{
Ese::Instance instance( "TestInstance" );
auto session = instance.BeginSession( );
auto database = session.CreateDatabase( DatabasePath,
Ese::CreateDatabaseFlags::OverwriteExisting );
}
If anything goes wrong, the library will throw an exception, so there is no need for additional error checking – and now we have all that is required to work with an ESE database. Those exceptions must be handled somewhere though.
A fair amount of the library is documented using XML documentation tags understood by Visual Studio and Doxygen, making it easy to figure out what a function does:
The boost C++ libraries, https://www.boost.org/, are required to build and run the code. The provided Visual Studio projects expect the environment variable BOOST_ROOT
to point to the directory where you unpacked the distribution, and that the library files can be found in $(BOOST_ROOT)\stage\lib.
Harlinn::Common::Core::Ese
To use the library, just #include <HCCEse.h>
, and link with the Harlinn.Common.Core.lib
library.
It is usually sufficient to add the directory containing Harlinn.Common.Core.lib
library to the Additional Library Directories entry under Properties: Linker->General for your C++ project in Visual Studio.
There are four major classes in the library that wraps the ESE C API handle types. These classes are move assignable and move constructible, but not copy assignable, and not copy constructible. This design is used to ensure that the lifetime of the handles is convenient to manage in an appropriate way, ensuring that the handles are always closed when the owning objects go out of scope.
Instance
Instance is the root object, and all use of the library should start by creating an instance of this type.
The Instance
object holds and manages the lifetime of an ESE instance handle.
Session
A Session
object holds and manages the lifetime of an ESE session handle. Sessions provide the transactional context of ESE, and all ESE database operations are performed through a session.
When an instance is used by more than one thread, each thread must have its own session.
Database
A Database
object holds and manages the lifetime of a database handle. ESE database handles are used to manage the schema of the database and is used to manage tables inside the database. Database handles can only be used with the session that was used to create them, and the library helps to facilitate this.
Table
A Table
object holds and manages the lifetime of a database cursor handle. ESE cursor handles are used to read row data, search rows; or create, update and delete rows. It is also used to define the columns and indexes for a table. Like database handles, the database cursor handles can only be used with the session that was used to create them, and the library helps to facilitate this too.
Concepts
The implementation of the library uses templates, and many of those are constrained using a small number of concepts. I find that using concepts makes the code significantly more readable than std::enable_if<…>, and the rest of the SFINAE based “hacks” that we previously had to deal with.
The DirectType concept is used to help the compiler choose the right template for datatypes that do not require any modification when reading and writing to ESE:
template<typename T>
concept DirectType = ( ( std::is_integral_v<T> && !std::is_same_v<bool, T> ) ||
std::is_floating_point_v<T> ||
std::is_same_v<TimeSpan, T> ||
std::is_same_v<Currency, T> ||
std::is_same_v<Guid, T> );
This concept is used to select templates for datatypes that can be passed directly on to the ESE C API, and using it eliminates the clutter caused by using std::enable_if<…>
template<DirectType T>
void SetColumn( JET_COLUMNID columnid, const T& value, SetFlags flags = SetFlags::None ) const
{
auto rc = SetColumn( columnid, &value, sizeof( std::decay_t<T> ), flags, nullptr );
RequireSuccess( rc );
}
First steps
Normally, an application will have a single Ese::Instance
object, and multiple sessions with one or more open databases in them, but to keep the code for this article simple, I bundle them into a single class called Engine
:
class Engine
{
public:
Ese::Instance instance;
Ese::Session session;
Ese::Database database;
Engine( bool createNewDatabase = true )
{
instance = Ese::Instance( "TestInstance" );
instance.SetCreatePathIfNotExist( );
instance.SetExceptionAction( Ese::ExceptionAction::None );
instance.SetSystemPath( DatabaseSystemPath );
instance.SetLogFilePath( DatabaseLogfilePath );
session = instance.BeginSession( );
if ( createNewDatabase )
{
database =
session.CreateDatabase( DatabasePath,
Ese::CreateDatabaseFlags::OverwriteExisting );
}
else
{
session.AttachDatabase( DatabasePath );
database = session.OpenDatabase( DatabasePath );
}
}
};
If createNewDatabase
is false
, the Engine
class will open an existing database.
The call to instance.SetCreatePathIfNotExist( )
tells the database engine to create any required directory that is missing in a file system silently, without requiring any further action by the developer.
instance.SetExceptionAction( Ese::ExceptionAction::None )
is used to tell the database engine to not display a dialog box in case of error.
instance.SetSystemPath( DatabaseSystemPath )
is used to set the path to the directory that will contain the checkpoint file for the instance.
instance.SetLogFilePath( DatabaseLogfilePath )
is used to set the path to the directory that will contain the transaction logs for the instance.
These are options that would normally come from a configuration file, or a key in the registry.
There are a large set of parameters that can be set for the database engine on a global, instance, and session level. Providing a type-safe API to these, and documenting that API is an ongoing effort. A function that retrieves a parameter is named QueryParameterName
, and the function for setting a parameter is named SetParameterName
.
The Engine
class is not a part of the library; its purpose is to simplify writing the test cases. With this in place, creating a table is as easy as:
BOOST_AUTO_TEST_CASE( CreateTableTest1 )
{
Engine engine;
auto& database = engine.database;
auto table1 = database.CreateTable( "table1" );
BOOST_TEST( table1.IsValid( ) );
}
The implementation of CreateTable(...)
looks deceptively simple, but it is template function that allows you to specify your own implementation of Table
, if needed.
template<TableType T = Table, CharType C>
[[nodiscard]] T CreateTable(const C* tablename,
unsigned long initialNumberOfPages = 1,
unsigned long density = 0) const
{
JET_TABLEID tableId = 0;
auto rc = CreateTable( tablename, initialNumberOfPages, density, &tableId );
RequireSuccess( rc );
T result( sessionId_, tableId );
result.OnTableCreated( );
return result;
}
As you see, we still have full access to everything that JetCreateTableA
offers, but normally, we are happy with the defaults.
Now that we have a table, it is time to add a column:
auto columnId = table1.AddText( "fieldName" );
The above creates a variable sized text column, with max size set to 127 characters. This means that even if we store UTF16 encoded text, it can participate in an index. Inserting a row requires just a few steps:
std::string ValueToInsert( "Text to store" );
table1.Insert( );
table1.SetColumn( columnId, ValueToInsert );
table1.Update( );
table1.Insert( )
prepares the cursor for an insert of a new record, initializing all the columns to the default state. If the table has an auto-increment column, then a new value is assigned to this record regardless of whether the update completes.
table1.SetColumn(…)
has suitable overloads for all the datatypes supported by the library.
To be able to write 0 length strings to the database, SetColumn
adds the flag SetFlags::ZeroLength
to the flags
parameter:
template<StringType T>
void SetColumn( JET_COLUMNID columnId, const T& text, SetFlags flags = SetFlags::None ) const
{
using CharT = typename T::value_type;
DWORD length = static_cast<unsigned long>( text.length( ) * sizeof( CharT ) );
if ( !length )
{
flags |= SetFlags::ZeroLength;
}
auto rc = SetColumn( columnId, text.c_str( ), length, flags, nullptr );
RequireSuccess( rc );
}
Without this flag, specifying a length of 0 tells the database engine to assign NULL
to the column.
The C++ template provides an implementation that works for both std::string
and std::wstring
.
The library provides a separate function:
table1.SetNull( columnId );
to assign NULL
to a column.
Now that we have written a value to the database, we can read it back using:
auto value1 = table1.As<std::string>( columnId );
As<…>(columnId)
offers a convenient way to both read the column value and to determine whether a NULL
value was stored, as it returns a std::optional<T>
.
template<typename T>
std::optional<T> As( JET_COLUMNID columnId, RetrieveFlags flags = RetrieveFlags::None ) const
{
T data;
if ( Read( columnId, data, flags ) )
{
return std::optional<T>( std::move( data ) );
}
else
{
return std::optional<T>( );
}
}
If value1.has_value( )
returns false
, then the column was NULL
. The other convenient alternative is to use table1.Read(…)
which returns false
if the column is NULL
. There are Read(…)
overloads for all the datatypes supported by the library.
template<DirectType T>
bool Read( JET_COLUMNID columnId, T& value, RetrieveFlags retrieveFlags = RetrieveFlags::None ) const
{
constexpr unsigned long DataSize = sizeof( std::decay_t<T> );
unsigned long actualDataSize;
auto rc = RetrieveColumn( columnId, &value, DataSize, &actualDataSize, retrieveFlags, nullptr );
return CheckReadResult( rc );
}
By utilizing a C++ template, the above implementation of Read(…)
allows us to retrieve the data for the following column types: char
, signed char
, unsigned char
, short
, unsigned short
, long
, unsigned long
, long long
, unsigned long long
, float
, double
, TimeSpan
, Currency
and Guid
.
The library extends the number of datatypes, that can be used with ESE, by attempting to map the C++ type to the nearest “sensible” ESE supported type. TimeSpan
is a type that is quite similar to the .Net TimeSpan
value type, where a period of time is measured in ticks. The library maps the TimeSpan
type to a long long
. DateTime
, which also stores data internally in a long long
, gets converted to and from a double since this is the native ESE type for datetime. The library supports the following datatypes for use with ESE:
bool
char
/signed char
/unsigned char
short
/unsigned short
long
/unsigned long
long long
/unsigned long long
float
double
Currency
DateTime
TimeSpan
Guid
std::string
/std::wstring
std::vector<char>
/std::vector<signed char>
/std::vector<unsigned char>
Core::IO::MemoryStream
Whether a column is a text or a long text column, is determined by its max size when creating the column, and it is transparent if it is one, or the other, when reading and writing to the database. Binary and long binary columns are treated similarly. You use the following functions when defining columns for a table:
AddBoolean
AddSByte
AddByte
AddInt16
AddUInt16
AddInt32
AddUInt32
AddInt64
AddUInt64
AddSingle
AddDouble
AddCurrency
AddTimeSpan
AddDateTime
AddGuid
AddText
AddBinary
These are small wrapper functions that let code completion feature in Visual Studio work for you, so you do not have to consult the documentation all the time. Code completion is also supported by providing enum class
types that group together related defines from the ESE C API.
It is time for something a little bit more complex. Let us say we want to store metering data for sensors identified by a Guid
, and that we have the following structure for holding the data:
struct SensorValue
{
Guid Sensor;
DateTime Timestamp;
Int64 Flags = 0;
Double Value = 0.0;
};
In the spirit of keeping things simple, we create a new class SensorEngine
derived from Engine
:
class SensorEngine : public Engine
{
public:
...
};
The full source code for the example is provided in EseTests.cpp, and to run the example code execute:
Harlinn.Common.Core.Tests.exe --run_test=EseTests/InsertSearchAndUpdateSensorValueTableTest1
from the command line.
SessionEngine
has a few variables:
class SensorEngine : public Engine
{
public:
...
JET_COLUMNID SensorColumnId;
JET_COLUMNID TimestampColumnId;
JET_COLUMNID FlagsColumnId;
JET_COLUMNID ValueColumnId;
Ese::Table SensorValues;
...
};
The various JET_COLUMNID
members will hold the column ids for the four columns in the table, and the Ese::Table SensorValues
holds the handle to the database cursor for the table.
Creating the table is simple:
void CreateSensorValueTable( )
{
session.BeginTransaction( );
SensorValues = database.CreateTable( SensorValueTableName );
SensorColumnId = SensorValues.AddGuid( SensorColumnName );
TimestampColumnId = SensorValues.AddDateTime( TimestampColumnName );
FlagsColumnId = SensorValues.AddUInt64( FlagsColumnName );
ValueColumnId = SensorValues.AddDouble( ValueColumnName );
SensorValues.CreateIndex( SensorValueIndexName, Ese::IndexFlags::Primary, L"+S\0+T\0", 6 );
session.CommitTransaction( );
}
The call to SensorValues.CreateIndex(…)
is used to create a primary index for the table. The third parameter specifies the fields and the order of the index. A ‘+’ indicates ascending order, while a ‘-’ would indicate descending order. ‘S’ is the name of the Guid column for the sensor id, and ‘T’ is the name of the DateTime
column for the Timestamp
member of the SensorValue
struct. Each part of the key specification is terminated by a ‘\0’, and the implicit ‘\0’ generated by the compiler at the end of the string provides the two consecutive ‘\0’ values required to terminate the specification. It is also good practice to execute data definition operations in their own transaction.
To open an already existing table, you do something like this:
void OpenSensorValueTable( )
{
SensorValues = database.OpenTable( SensorValueTableName );
SensorColumnId = SensorValues.GetColumnId( SensorColumnName );
TimestampColumnId = SensorValues.GetColumnId( TimestampColumnName );
FlagsColumnId = SensorValues.GetColumnId( FlagsColumnName );
ValueColumnId = SensorValues.GetColumnId( ValueColumnName );
SensorValues.SetCurrentIndex( SensorValueIndexName );
}
Now that we have a table, we need to insert and update rows in that table:
void Insert( const SensorValue& value )
{
SensorValues.Insert( );
SensorValues.SetColumn( SensorColumnId, value.Sensor );
SensorValues.SetColumn( TimestampColumnId, value.Timestamp );
SensorValues.SetColumn( FlagsColumnId, value.Flags );
SensorValues.SetColumn( ValueColumnId, value.Value );
SensorValues.Update( );
}
void Update( const SensorValue& value )
{
if ( MoveTo( value.Sensor, value.Timestamp ) )
{
SensorValues.Replace( );
SensorValues.SetColumn( FlagsColumnId, value.Flags );
SensorValues.SetColumn( ValueColumnId, value.Value );
SensorValues.Update( );
}
}
We have already been through the mechanics of inserting rows into a table in ESE, and updating rows is fairly simple too. SensorValues.Replace( )
prepares the cursor for an update that will not modify the columns that are part of the primary key.
Then we set the column values, before telling the database engine that we are done modifying the record.
This implementation if MoveTo(…)
finds an existing row with a timestamp less or equal to the argument timestamp
for the sensor identified by sensorId
:
bool MoveTo( const Guid& sensorId, const DateTime& timestamp ) const
{
SensorValues.MakeKey( sensorId, Ese::KeyFlags::NewKey );
SensorValues.MakeKey( timestamp );
auto rc = SensorValues.Seek( Ese::SeekFlags::LessOrEqual );
return rc >= Ese::Result::Success;
}
By passing the Ese::KeyFlags::NewKey
flag we tell the engine that we are creating a new search key for the database cursor. Consecutive calls MakeKey(…)
without this flags adds fields to the current search key in the order specified by the key string used to create the current index.
Calling SensorValues.Seek(…)
with the Ese::SeekFlags::LessOrEqual
tells the engine to place the cursor on a row that is required to match for at least one field, while the column data for the remaining fields can compare less or equal. In this case, this we have a function that finds a record with a timestamp matching the argument, or the record with the maximum timestamp less than the argument.
There is also an another overload of MoveTo(…)
:
bool MoveTo( const Guid& sensorId ) const
{
SensorValues.MakeKey( sensorId, Ese::KeyFlags::NewKey );
auto rc = SensorValues.Seek( Ese::SeekFlags::GreaterOrEqual );
return rc >= Ese::Result::Success;
}
that positions the cursor on the first row for the sensor identified by sensorId
. This can be used to iterate over all the rows stored for that sensorId
:
if ( engine.MoveTo( firstSensor ) )
{
size_t rowCount = 0;
double sum = 0.0;
do
{
auto sensorId = engine.Sensor( );
if ( sensorId != firstSensor )
{
break;
}
sum += engine.Value( );
rowCount++;
} while ( sensorValues.MoveNext( ) );
printf( "MoveTo - Count: %zu, sum: %f\n", rowCount, sum );
}
In this case, we must check that the cursor is not moving beyond the data stored for the requested sensor. A better way to achieve this would be to set up a filter on the primary key index:
bool Filter( const Guid& sensorId ) const
{
SensorValues.MakeKey( sensorId, Ese::KeyFlags::NewKey );
auto rc = SensorValues.Seek( Ese::SeekFlags::GreaterOrEqual );
if ( rc >= Ese::Result::Success )
{
SensorValues.MakeKey( sensorId, Ese::KeyFlags::NewKey );
SensorValues.MakeKey( DateTime::MaxValue() );
SensorValues.SetIndexRange( Ese::IndexRengeFlags::UpperLimit );
return true;
}
else
{
return false;
}
}
The implementation starts out identically to the implementation of the previous MoveTo(…)
function, we just need to set up a new search key containing:
- The sensor id
- A value greater than, or equal to, the maximum value that can possible be stored in the
timestamp
column.
and then call SensorValues.SetIndexRange(…)
telling the database engine that the current key now represents the upper limit of the range we want to iterate over:
if ( engine.Filter( firstSensor ) )
{
size_t rowCount = 0;
double sum = 0.0;
do
{
sum += engine.Value( );
rowCount++;
} while ( sensorValues.MoveNext( ) );
printf( "Filter - Count: %zu, sum: %f\n", rowCount, sum );
}
Moving the database cursor for the table to the last entry for a sensor is also pretty easy:
bool MoveToLast( const Guid& sensorId ) const
{
SensorValues.MakeKey( sensorId, Ese::KeyFlags::NewKey );
SensorValues.MakeKey( DateTime::MaxValue() );
auto rc = SensorValues.Seek( Ese::SeekFlags::LessOrEqual );
return rc >= Ese::Result::Success;
}
The logic for this search key is the same as for the second search key created for the Filter(…)
function above. Now we can move backwards through the rows for the sensor:
if ( engine.MoveToLast( secondSensor ) )
{
size_t rowCount = 0;
double sum = 0.0;
do
{
auto sensorId = engine.Sensor( );
if ( sensorId != secondSensor )
{
break;
}
sum += engine.Value( );
rowCount++;
} while ( sensorValues.MovePrevious( ) );
printf( "MoveToLast - Count: %zu, sum: %f\n", rowCount, sum );
}
We still need to check that we do not pass the beginning of the range, and like earlier we can set up a filter:
bool ReverseFilter( const Guid& sensorId ) const
{
SensorValues.MakeKey( sensorId, Ese::KeyFlags::NewKey );
SensorValues.MakeKey( DateTime::MaxValue( ) );
auto rc = SensorValues.Seek( Ese::SeekFlags::LessOrEqual );
if ( rc >= Ese::Result::Success )
{
SensorValues.MakeKey( sensorId, Ese::KeyFlags::NewKey );
SensorValues.SetIndexRange( );
return true;
}
else
{
return false;
}
}
Using this filter simplifies the reverse loop significantly:
if ( engine.ReverseFilter( secondSensor ) )
{
size_t rowCount = 0;
double sum = 0.0;
do
{
sum += engine.Value( );
rowCount++;
} while ( sensorValues.MovePrevious( ) );
printf( "ReverseFilter - Count: %zu, sum: %f\n", rowCount, sum );
}
Now I hope I have convinced you to try out the Extensible Storage Engine for your next C++ project requiring a fast and reliable storage engine. I have used ESE for several projects and found that it is really robust. While debugging, I have killed the process, perhaps thousands of times, in the middle of an update/transaction – and the database has not once been corrupted by this.
In production, I have solutions that have stored more than 10 TB of data in ESE, and the engine is able to handle this efficiently and reliably.
Compared to other NoSQL engines, I really appreciate that ESE allows me to create multiple indexes on a table, as this is something you often have to implement yourself using simple the key/value API provided by other storage engines.
History
- 28th of August, 2020 - Initial posting.
- 31st of August, 2020 - code cleanup, using concepts to eliminate sevaral requires clauses.
- 6th of October, 2020 - Bug fixes, cleaned up most of the unit tests.
- 7th of October, 2020 - More unit tests for the Harlinn.Common.Core library.