Introduction
The need for a flexible but powerful data storage within an application grows quickly with the complexity of the application and the size of the data it has to deal with. The common ways to store the data, as to write it to custom binary or text files, using XML files or external databases all have certain disadvantages.
Plain text files, as well as custom binary formats are difficult to navigate, they provide no support for basic CRUD (create, read, update, delete) operations and require a lot of custom-written and maintained code to function.
XML files can be helpful, especially taking into account a good support for CRUD operations on XML that came with LINQ-to-XML. Anyway, reading and writing XML files requires custom code and the operations are not so easy to parallelize without the danger of deadlocking.
External databases provide the best possible support for data storage and retrieval, but run outside of the application context and have to be installed prior to the application installation. This prerequisite is not always easy to fulfill, especially in case of applications that are supposed to run on numerous desktop machines around the world. We don't mean the client applications with one shared central database here, but applications that require each small single database to store its data.
The need for the small but flexible and feature-rich (like SQL support for unified data access) databases has consequently lead to the appearance of embedded databases.
Embedded Database Engines
Quick Comparison of Embedded vs. Server-side Database Engines
Compared to the transitional RDBMS, embedded DB engines (like MS SQL Compact Edition, SQLite, Oracle Berkeley DB are directly "embedded" into the application, meaning that the complete database engine code is packed into a single (or maximally 2) DLLs that are distributed together with your application. So, they run directly in the context of the application and the resources for data access are shared with other parts of the application: available memory, CPU time, disk throughput etc.
General-purpose server-side relational databases (like MS SQL, Oracle, MySQL, PostgreSQL, Firebird, etc.), on the contrary, are installed as an independent service and run outside of the scope of client application. They can perform much better because of the intrinsic support for multi-core and multi-CPU architectures, exploiting the OS features like pre-caching, VSS, etc. to increase the throughput in case of intensive database operation and can claim as much memory as the OS can provide for a single service/application.
Due to the different areas of application server-side and embedded databases have different palettes of options: server-side DBs provide extensive user and right management, support for views and stored procedures, whereas embedded database normally lack any support for users and rights management and have limited support for views and stored procedures (latter ones lose the majority of their benefits of running on server side). Data throughput is a usual bottlenecks of RDBMS, server versions are usually installed on striped RAID volumes, whereas embedded DB are often memory-oriented (try to keep all the actual data in the memory) and minimize the data storage access operations.
In this article, we focus exclusively on the traditional relational databases and intentionally do not consider other types of DBs. It is worth mentioning that in the latter time especially NoSQL databases became very popular. Today, they can be seen as noble rivals to the traditional relational database storage and thus deserve a special article.
To the Term "Embedded"
Reading about embedded databases, one should not mistakenly think that these database are only supposed to be used in applications targeting the embedded systems, like portable devices running Windows CE or Embedded ARM/Tiny OS. Embedded databases are common in all types of desktops and even some types of server applications: the support for concurrency that both SQLite and SQL CE 4.0 have makes them a popular database of choice for small- and middle-sized web applications. A relatively fresh IDE for web development brought to the market by Microsoft WebMatrix used SQL CE 4.0 as a default database for all its projects and templates.
Same time, the compactness and low performance requirements make the embedded databases also an attractive solution for data storage on mobile devices: the upcoming release of Windows Phone OS "Mango" features SQL CE as the local database. For the previous version of Windows Phone, there is an adapted SQLite version available.
Support for .NET
A quick look to the Wikipedia page about embedded databases will reveal many names that do not appear in this article. This is due to the fact that many database engines still lack good support for .NET managed code or have a very limited usage in the .NET world [1].
Lacking direct support for .NET can also be observed in case of SQLite. Fortunately, due to the high popularity of this library in the unmanaged world, there exist several implementations of ADO.NET bindings for this database engine. We have just arbitrarily taken the System.Data.SQLite data provider in this article that provides the implementation of DbConnection class from ADO.NET and enables us to provide all benchmarking operation in a data engine abstract way. At the same time, we are aware that the test results obtained using this data provider cannot be directly blamed (or acclaimed) upon the database engine only and should definitely include the data provider as well.
Benchmarking the Performance of Database Engines
Key Operations to Benchmark
Performance benchmarking is a very common task in database world. Several independent ranking systems were developed for this purpose: TPC, SPEC, PolePosition, etc. You will find a longer list with some commentaries on this wiki page from Oracle). These benchmarks target primarily the server-side databases and include scenarios that are common for large processing systems: POS systems of large shopping malls, stock and commodity exchange houses and others. These scenarios are very untypical for embedded databases and can hardly be used for benchmarking those.
For the applications that are not directly database centered, e.g. that are not thought to be just thick clients for the stored data, but have other primarily goals and use database just to store application state and user specific data [2], the following would be most common database operations:
- Inserting single data entries into the table
- Random reading access to the table rows
- Random writing (updating) access to the table rows
- Random row deletions
We implemented every type of access as a Action<dbconnection>
generic delegate to abstract from the concrete database connection. See the Listing.
Performance Counters
In case of the server-side databases running as separate instances performance indicators are more or less independent from the client application, but largely depend upon your hardware. Embedded engines share all resources with the host application and having computation-intensive threads running in parallel with the data access thread(s) might lead to dramatic decrease of your database performance.
Anyway, for desktop applications, the primary goal is usually to provide interactivity with the user. This implies that the application should respond to the user input within a certain amount of time and the maximal possible delay here is usually the most critical point. Even though modern user application architecture requires the data operations to be executed in a worker thread, long execution time for these types of threads impedes the user to continue his or her actions, despite the seeming responsiveness of GUI.
Thus, time needed for every database operation and series of operation is the most important and crucial parameter for embedded database systems. In our simple approach, this is the only metric we are going to use [3].
Testbench Application
The testbench application is written in C# (project type: console application) and provides certain abstraction above the database engine used for benchmarking. All database operations are implemented as delegates taking DbConnection
object as the input parameter. This object implements the database connectivity in the most abstract way.
Due to subtle syntax deviations between SQL CE and SQLite, we had to use two different SQL strings for table creation. The sample size -- the number of queries run against the database on every step -- is a fixed constant value NUMBER_OF_TESTS
. Guid
s are used to generate pseudo-random string data that is inserted or updated.
Since the SQLite.DLL is a mixed DLL containing both the unmanaged binary code of the original SQLite.dll library written in C and the managed bindings for this library, we have to enable the support for mixed assemblies before we can load this assembly in the application domain of .NET 4.0. For this purpose, just add an application configuration file to your console application and add some options into the configuration
section. The complete content of the App.config
file is shown below.
="1.0"="utf-8"
<configuration>
<startup useLegacyV2RuntimeActivationPolicy="true">
<supportedRuntime version="v4.0"/>
</startup>
</configuration>
Here is the source code of the application:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data.SQLite;
using System.Data.SqlServerCe;
using System.Data.Common;
namespace TestSQL
{
class Program
{
const int NUMBER_OF_TESTS = 200;
private static string create_table;
private static string create_table_sqlce = "CREATE TABLE Test
( id integer not null identity primary key, textdata nvarchar(500));";
private static string create_table_sqlite = "CREATE TABLE Test
( id integer not null primary key, textdata nvarchar(500));";
private static string drop_table = "DROP TABLE Test";
private static string insert_data = "INSERT INTO Test
(textdata) VALUES ('{0}');";
private static string read_data = "SELECT textdata FROM Test WHERE id = {0}";
private static string update_data = "UPDATE Test SET textdata =
'{1}' WHERE id = {0}";
private static string delete_data = "DELETE FROM Test WHERE id = {0}";
static Action<dbconnection> ACreateTable = (a) => CreateTable(a);
static Action<dbconnection> ATestWrite = (a) => TestWrite(a, NUMBER_OF_TESTS);
static Action<dbconnection> ATestRead = (a) => TestRead(a, NUMBER_OF_TESTS);
static Action<dbconnection> ATestUpdate = (a) => TestUpdate(a, NUMBER_OF_TESTS);
static Action<dbconnection> ATestDelete = (a) => TestDelete(a, NUMBER_OF_TESTS);
static Action<dbconnection> ADropTable = (a) => DropTable(a);
static Func<action<dbconnection>,DbConnection, TimeSpan>
MeasureExecTime = (a,b) => { var start = DateTime.Now; a(b);
var finish = DateTime.Now; return finish - start; };
static Action<string,> AMeasureAndOutput = (a, b) =>
Console.WriteLine(a, b.TotalMilliseconds);
static void Main(string[] args)
{
SQLiteConnection.CreateFile("sqlite.db");
SQLiteConnection sqliteconnect =
new SQLiteConnection("Data Source=sqlite.db");
SqlCeConnection sqlceconnect =
new SqlCeConnection("Data Source=sqlce.sdf");
sqlceconnect.Open();
sqliteconnect.Open();
Console.WriteLine("=Testing CRUD performance of embedded DBs=");
Console.WriteLine(" => Samplesize: {0}", NUMBER_OF_TESTS);
create_table = create_table_sqlite;
Console.WriteLine("==Testing SQLite==");
DoMeasures(sqliteconnect);
create_table = create_table_sqlce;
Console.WriteLine("==Testing SQL CE 4.0==");
DoMeasures(sqlceconnect);
sqlceconnect.Close();
sqliteconnect.Close();
Console.ReadKey();
}
static void DoMeasures(DbConnection con)
{
AMeasureAndOutput("Creating table: {0} ms",
MeasureExecTime(ACreateTable, con));
AMeasureAndOutput("Writing data: {0} ms", MeasureExecTime(ATestWrite, con));
AMeasureAndOutput("Updating data: {0} ms",
MeasureExecTime(ATestUpdate, con));
AMeasureAndOutput("Reading data: {0} ms", MeasureExecTime(ATestRead, con));
AMeasureAndOutput("Deleting data: {0} ms",
MeasureExecTime(ATestDelete, con));
AMeasureAndOutput("Dropping table: {0} ms",
MeasureExecTime(ADropTable, con));
}
static void CreateTable(DbConnection con)
{
var sqlcmd = con.CreateCommand();
sqlcmd.CommandText = create_table;
sqlcmd.ExecuteNonQuery();
}
static void TestWrite(DbConnection con, int num)
{
for (; num-- > 0; )
{
var sqlcmd = con.CreateCommand();
sqlcmd.CommandText =
string.Format(insert_data,Guid.NewGuid().ToString());
sqlcmd.ExecuteNonQuery();
}
}
static void TestRead(DbConnection con, int num)
{
Random rnd = new Random(DateTime.Now.Millisecond);
for (var max = num; max-- > 0; )
{
var sqlcmd = con.CreateCommand();
sqlcmd.CommandText = string.Format(read_data, rnd.Next(1,num-1));
sqlcmd.ExecuteNonQuery();
}
}
static void TestUpdate(DbConnection con, int num)
{
Random rnd = new Random(DateTime.Now.Millisecond);
for (var max = num; max-- > 0; )
{
var sqlcmd = con.CreateCommand();
sqlcmd.CommandText = string.Format
(update_data, rnd.Next(1, num - 1), Guid.NewGuid().ToString());
sqlcmd.ExecuteNonQuery();
}
}
static void TestDelete(DbConnection con, int num)
{
Random rnd = new Random(DateTime.Now.Millisecond);
var order = Enumerable.Range(1, num).ToArray<int>();
Action<int[],> swap = (arr, a, b) => { int c = arr[a];
arr[a] = arr[b]; arr[b] = c; };
for (var max=num; max-- > 0; ) swap(order, rnd.Next(0, num - 1),
rnd.Next(0, num - 1));
foreach(int index in order)
{
var sqlcmd = con.CreateCommand();
sqlcmd.CommandText = string.Format(delete_data, index);
sqlcmd.ExecuteNonQuery();
}
}
static void DropTable(DbConnection con)
{
var sqlcmd = con.CreateCommand();
sqlcmd.CommandText = drop_table;
sqlcmd.ExecuteNonQuery();
}
}
}
Performing Benchmarking
Some Information about the Tests
In order to have any reference for time values, here is some information about the benchmarking environment:
- The following hardware was used for tests: Dell Precision WorkStation T7400 equipped with 2 Intel Xeon E5420 CPUs and 8GB of RAM, running 64bit Win7 Enterprise.
- Both engines were used with their default settings, connection string looked like "Data Source=database_file_name".
- The latest versions of both SQL CE 4.0 and SQLite/System.Data.SQLite (from June 3rd, 2011) were used.
Here are the Results for Two Different Samples
=Testing CRUD performance of embedded DBs=
=> Samplesize: 200
==Testing SQLite==
Creating table: 396.0396 ms
Writing data: 22189.2187 ms
Updating data: 23591.3589 ms
Reading data: 21.0021 ms
Deleting data: 20963.0961 ms
Dropping table: 85.0085 ms
==Testing SQL CE 4.0==
Creating table: 16.0016 ms
Writing data: 25.0025 ms
Updating data: 56.0056 ms
Reading data: 28.0028 ms
Deleting data: 53.0053 ms
Dropping table: 11.0011 ms
... and a bigger sample:
=Testing CRUD performance of embedded DBs=
=> Samplesize: 1000
==Testing SQLite==
Creating table: 93.0093 ms
Writing data: 116632.6621 ms
Updating data: 104967.4957 ms
Reading data: 134.0134 ms
Deleting data: 107666.7656 ms
Dropping table: 83.0083 ms
==Testing SQL CE 4.0==
Creating table: 16.0016 ms
Writing data: 128.0128 ms
Updating data: 307.0307 ms
Reading data: 164.0164 ms
Deleting data: 306.0306 ms
Dropping table: 13.0013 ms
Results
Having run the benchmarking application 4 times, we changed the order of the databases and ran it 4 more times. The accumulated statistics are shown in the table below:
Databases SQLite 3.6.23.1 SQL CE 4.0
Values (ms) Mean Stderr Mean Stderr
Creating table 109.4 9.579144012 18.4 0.92736185
Writing data 22549.8 218.5306386 25.8 0.583095189
Updating data 21290.8 638.9963537 56.8 0.489897949
Reading data 20.6 0.244948974 28.4 0.678232998
Deleting data 20032.8 323.019566 52.8 0.860232527
Dropping table 127.4 52.21168452 11.0 0.316227766
Now we can use Excel to produce a nice column diagram for our data. Please, note that we have to use logarithmic scale to represent the data that fluctuates over several orders of magnitude on one diagram.
Click here for the full size image.
Discussion
So, as you can see, any writing operations (create
, update
, delete
) require almost 1000x more time in SQLite compared to SQLCE, whereas reading the data seems to be at the same performance level for both embedded databas.
This dramatic difference is not so difficult to explain and it does not necessarily reflect the general bad performance of SQLite as a database engine. One should consider the following details that might have affected the benchmarking:
- The data provider we used for SQLite is the
System.Data.SQLite
, that is a mixed assembly containing both managed and unmanaged code. SQLite is originally written completely in C and the DLL only provides bindings. This means it has to make platform invokes to call the respective procedures from the native SQLite.DLL and also convert the data here and back. Probably these P/Invokes and data marshaling eats up a good piece of the operation time. - Both data engines were tested with their default settings! Most likely, SQLCE 4.0 caches all the data in memory by default, whereas SQLite flushes most of the data changes directly to the disk storage every time the change happens. One can supply hundreds of parameters for both databases via connection string and tune them appropriately and force data engine to behave the way that probably delivers better results [4].
- We used a series of single queries to test the DB. At least SQLCE supports bulk operations via special .NET classes that would be better suited here. If SQLite supports them too, it would be nice to compare them as well.
Generally, we observed certain problems with SQLite
/System.Data.SQLite
bundle on x64 machines (using the same .NET adapter): from data connection being closed unexpectedly to database file corruption. We presume there is some stability problems either with the data adapter or with the library itself.
Conclusion
If you need a performant embedded database engine that is performant with default settings, provides concurrent access and works just "out-of-the-box" for your .NET application both on x86 and x64 platforms, you should probably have a look at Microsoft SQL CE 4.0.
Endnotes
[1] Should you disagree and think that certain database engine is worth including into this article, please don't hesitate to contact me with your suggestion, the author will be glad to add some more diversity into the article!
[2] As an example of a non datacentric application, you can consider Microsoft Word. As a text editor, it requires some storage capabilities, but it's primary goal is not just to edit the stored data, but to provide you with specific tools for document editing.
[3] In the future version of the article, we might consider using memory consumption as another very important metrics. For the sake of simplicity, this metric is skipped for now.
[4] The author will be glad to accept any suggestions for the engine tuning and promises to update the article if the results look much different.
History
- July 3rd, 2011 -- Initial version