NetBase - A Minimal .NET Database with a Small SQL Parser, Part 1

buttonpusher

4.21/5 (7 votes)

3 Nov 2009LGPL36 min read

44.8K

1.3K

This is a small database system that reads and writes to DBF-like files, with a SQL front-end.

Download source code - 66.94 KB

Introduction

NetBase is a C# project that creates and reads DBF-like files. It supports a tiny subset of SQL to create, insert, and select data from a table. This small project has been an interesting crutch for writing a basic parser. The project also includes a small utility that provides a front-end.

The whole thing is a learning exercise. It's a very slim chance indeed that this will meet any requirements in a production system. However, it might be useful for creating or reading old DBF files, or learning the basics of hand writing a recursive, descent parser. And, it has helped build a little insight into how the query engine of a real RDBMS works.

I have made some attempt to split the project up into some key areas. The lowest layer is in the storage namespace. Classes in this namespace deal with reading and writing data to disk.The format is based on some old Lotus Approach files.

Classes in the executive namespace execute queries.The sql namespace builds queries from SQL statement strings, and the api namespace wraps everything in the System.Data.Common interfaces.

This two part series will take a look at the key parts of the code, and it is aimed at beginners:

Introduction and a look at the DBF file format (the storage namespace) - this article
The SQL parser and query builder (the sql namespace)

This code is also hosted on github at http://github.com/buttonpusher/NetBase/.

The NetBase File Format

The format of files used by NetBase is based on the DBF file format. In fact, you should be able to read many types of DBF files by using the classes in the storage namespace. To create a sample table, you can start up the NetBase.Viewer application and create a new table in the current working directory by entering the following SQL into the textbox at the top of the form:

SQL

CREATE TABLE testtable (first,second,third)

Open up the folder and you should see a file "testtable.dbf". This directly corresponds to the table. To take a look at the file, you should use a hex editor, I recommend frhed.

You can find a description of the DBF format here. NetBase very closely follows this; however, the only two entries in the header structure that are used are offset 0x8, which contains the offset of the first data record, and offset 0xA, which contains the length of one data record.

The Table Header and Column Definitions

The table header described above is the first 32 bytes of the file, and is read in by the TableHeader.BinRead() function. Since we know that each column definition will also be 32 bytes, a simple calculation in the TableHeader class allows us to work out how many columns are in this table:

public int ColumnCount { get { return (this.Size / 32)- 1;} }

The column class contains methods for reading in the columns from the data file, and so the whole section for reading in the header looks like this:

private void Open(string filename)
{
    this._fs = new FileStream(filename, FileMode.Open);
    this._br = new BinaryReader(this._fs); 
    
    this.Header = new TableHeader();
    this.Header.BinRead(this._br); 
    
    this.Columns = new List<Column>();
    for (Int16 i = 0; i < this.Header.ColumnCount; i++)
    {
        Column c = new Column();
        c.BinRead(_br);
        this.Columns.Add(c);
    } 
    // Read mysterious extra byte; found in original test
    // table from a Lotus Approach 
    database._br.ReadByte();
    // Sanity check row size
    if (this.CalculateTotalRowSize() != this.Header.RowLength)
    {
        throw new BadDataException("Total size of columns 
            " +"does not match that defined in the header");
    }
}

You can check the DBF file in the hex editor and see the correspondence. The first column starts at offset 0x20 (32), and you will see ASCII characters for "first" (the name of the column) here. The column names have a maximum length of 8 bytes, and these are padded out by hex 0x00 bytes.

Twelve bytes in to each column definition, you will see a char 'C',which indicates that the format of this column is a character field.This is the only data type supported by NetBase. Although there is a byte reserved to describe the maximum length of a character field, NetBase just assumes a maximum length of 10. So you cannot enter more than 10 characters into each field of your row.

At this point, it is hopefully clear why fixed length column names(in this case, a maximum of eight characters) are common in older database formats - they are much simpler to implement. Modern RDBMSs like SQL Server or MySQL have considerably more sophisticated table headers that allow for variable length column names.

Reading and Writing Rows to the Table

This job is performed by the Row class. Each instance of a Row knows to which Table it belongs; and therefore it is able to obtain information about the layout of the table from the Table.Columns property.

Because of this, we can validate that any field added to the row has a corresponding column, maintaining some degree of integrity which will be vital when we come to write the row to disk later on:

public object this[string name]
{
    get
    {
        int index = this.Table.Columns.FindIndex(c => c.Name == name);
        if (index == -1)throw new ArgumentException("Invalid column name
        	\"" + name + "\"");
            return this[index];
    }
    set
    {
        int index = this.Table.Columns.FindIndex(c => c.Name == name);
        if (index == -1)throw new ArgumentException("Invalid column name
        	\"" + name + "\"");
        _data[index] = value;
    }
}

We can derive how many rows are in the table from the header size, the total size of all column definitions, and the total length of the file. So writing a row is a matter of adding data with the above property, seeking to the right place (for a new row, the end of the file), and calling Row.BinWrite() which actually puts the row bytes on the disk.

public void AddRow(Row r)
{
    if (r.Table != this)
    {
        throw new InvalidOperationException
    	("Can't add row - doesn't belong to this table");
    } 

    _fs.Seek(_fs.Length, 0);
    BinaryWriter bw = new BinaryWriter(_fs);
    r.BinWrite(bw); 

    this.Header.RowCount += 1;
    _fs.Seek(0, 0);
    this.Header.BinWrite(bw); 

    bw.Flush();
}

You can use the NetBase.Viewer application to add a new row, with the following SQL statement:

SQL

INSERT INTO testtable (first,second,third) VALUES (1,2,3)

Once you have added the row, reload the file in your hex editor to see the changes.

Reading a row is also relatively simple, using the Row.BinRead method to read the row in. Note that since NetBase does not use any indexes, searching rows involves reading every single row and checking whether it matches or not. So, at the current stage of development, there is no need to seek to a particular row "number", and the Table class only allows seeking to the beginning of the set of rows (the Reset method) and reading in the next row (the NextRow method). Perhaps, a later project might be able to replace this inefficient means of selecting rows with an alternative that makes use of b-trees to build indexes.

Of course, the SQL syntax for reading a row is:

SQL

SELECT * FROM testtable

and you could also specify particular column names like so:

SQL

SELECT first,second FROM testtable

Note that FileTable contains the "Find" method, which allows you to retrieve only rows that match a certain criteria. This will be discussed in part 2 when we look at how the SQL parser implements the <codelang="sql"><code>WHERE clause.

Other Points of Interest

One slightly out of place class in the storage namespace is MemoryTable. This class is used internally in the database engine when copying data from place to place - for example, when passing the results of a query out into the world.

The storage namespace is supposed to be relatively modular. So in theory, it should be possible to use alternative on-disk formats, in the same way that MySQL allows both MyISAM and InnoDB formats to be used. However, implementing an alternative storage engine would more than likely reveal some flaws in the architecture of NetBase that would need to be corrected first.

Part two will discuss the SQL parser, and how SQL queries in the form of strings are understood and processed, to create tables, add rows, and return results.

License

This article, along with any associated source code and files, is licensed under The GNU Lesser General Public License (LGPLv3)