Introduction
NetBase
is a C# project that creates and reads DBF-like files. It supports a tiny subset of SQL to create, insert, and select data from a table. This small project has been an interesting crutch for writing a basic parser. The project also includes a small utility that provides a front-end.
The whole thing is a learning exercise. It's a very slim chance indeed that this will meet any requirements in a production system. However, it might be useful for creating or reading old DBF files, or learning the basics of hand writing a recursive, descent parser. And, it has helped build a little insight into how the query engine of a real RDBMS works.
I have made some attempt to split the project up into some key areas. The lowest layer is in the storage
namespace. Classes in this namespace deal with reading and writing data to disk.The format is based on some old Lotus Approach files.
Classes in the executive
namespace execute queries.The sql
namespace builds queries from SQL statement strings, and the api
namespace wraps everything in the System.Data.Common
interfaces.
This two part series will take a look at the key parts of the code, and it is aimed at beginners:
- Introduction and a look at the DBF file format (the
storage
namespace) - this article - The SQL parser and query builder (the
sql
namespace)
This code is also hosted on github at http://github.com/buttonpusher/NetBase/.
The NetBase File Format
The format of files used by NetBase
is based on the DBF file format. In fact, you should be able to read many types of DBF files by using the classes in the storage
namespace. To create a sample table, you can start up the NetBase.Viewer
application and create a new table in the current working directory by entering the following SQL into the textbox at the top of the form:
CREATE TABLE testtable (first,second,third)
Open up the folder and you should see a file "testtable.dbf". This directly corresponds to the table. To take a look at the file, you should use a hex editor, I recommend frhed.
You can find a description of the DBF format here. NetBase
very closely follows this; however, the only two entries in the header structure that are used are offset 0x8, which contains the offset of the first data record, and offset 0xA, which contains the length of one data record.
The Table Header and Column Definitions
The table header described above is the first 32 bytes of the file, and is read in by the TableHeader.BinRead()
function. Since we know that each column definition will also be 32 bytes, a simple calculation in the TableHeader
class allows us to work out how many columns are in this table:
public int ColumnCount { get { return (this.Size / 32)- 1;} }
The column class contains methods for reading in the columns from the data file, and so the whole section for reading in the header looks like this:
private void Open(string filename)
{
this._fs = new FileStream(filename, FileMode.Open);
this._br = new BinaryReader(this._fs);
this.Header = new TableHeader();
this.Header.BinRead(this._br);
this.Columns = new List<Column>();
for (Int16 i = 0; i < this.Header.ColumnCount; i++)
{
Column c = new Column();
c.BinRead(_br);
this.Columns.Add(c);
}
database._br.ReadByte();
if (this.CalculateTotalRowSize() != this.Header.RowLength)
{
throw new BadDataException("Total size of columns
" +"does not match that defined in the header");
}
}
You can check the DBF file in the hex editor and see the correspondence. The first column starts at offset 0x20 (32), and you will see ASCII characters for "first" (the name of the column) here. The column names have a maximum length of 8 bytes, and these are padded out by hex 0x00 bytes.
Twelve bytes in to each column definition, you will see a char 'C',which indicates that the format of this column is a character field.This is the only data type supported by NetBase
. Although there is a byte reserved to describe the maximum length of a character field, NetBase just assumes a maximum length of 10. So you cannot enter more than 10 characters into each field of your row.
At this point, it is hopefully clear why fixed length column names(in this case, a maximum of eight characters) are common in older database formats - they are much simpler to implement. Modern RDBMSs like SQL Server or MySQL have considerably more sophisticated table headers that allow for variable length column names.
Reading and Writing Rows to the Table
This job is performed by the Row
class. Each instance of a Row
knows to which Table
it belongs; and therefore it is able to obtain information about the layout of the table from the Table.Columns
property.
Because of this, we can validate that any field added to the row has a corresponding column, maintaining some degree of integrity which will be vital when we come to write the row to disk later on:
public object this[string name]
{
get
{
int index = this.Table.Columns.FindIndex(c => c.Name == name);
if (index == -1)throw new ArgumentException("Invalid column name
\"" + name + "\"");
return this[index];
}
set
{
int index = this.Table.Columns.FindIndex(c => c.Name == name);
if (index == -1)throw new ArgumentException("Invalid column name
\"" + name + "\"");
_data[index] = value;
}
}
We can derive how many rows are in the table from the header size, the total size of all column definitions, and the total length of the file. So writing a row is a matter of adding data with the above property, seeking to the right place (for a new row, the end of the file), and calling Row.BinWrite()
which actually puts the row bytes on the disk.
public void AddRow(Row r)
{
if (r.Table != this)
{
throw new InvalidOperationException
("Can't add row - doesn't belong to this table");
}
_fs.Seek(_fs.Length, 0);
BinaryWriter bw = new BinaryWriter(_fs);
r.BinWrite(bw);
this.Header.RowCount += 1;
_fs.Seek(0, 0);
this.Header.BinWrite(bw);
bw.Flush();
}
You can use the NetBase.Viewer
application to add a new row, with the following SQL statement:
INSERT INTO testtable (first,second,third) VALUES (1,2,3)
Once you have added the row, reload the file in your hex editor to see the changes.
Reading a row is also relatively simple, using the Row.BinRead
method to read the row in. Note that since NetBase does not use any indexes, searching rows involves reading every single row and checking whether it matches or not. So, at the current stage of development, there is no need to seek to a particular row "number", and the Table
class only allows seeking to the beginning of the set of rows (the Reset
method) and reading in the next row (the NextRow
method). Perhaps, a later project might be able to replace this inefficient means of selecting rows with an alternative that makes use of b-trees to build indexes.
Of course, the SQL syntax for reading a row is:
SELECT * FROM testtable
and you could also specify particular column names like so:
SELECT first,second FROM testtable
Note that FileTable
contains the "Find
" method, which allows you to retrieve only rows that match a certain criteria. This will be discussed in part 2 when we look at how the SQL parser implements the <codelang="sql"><code>WHERE clause.
Other Points of Interest
One slightly out of place class in the storage
namespace is MemoryTable
. This class is used internally in the database engine when copying data from place to place - for example, when passing the results of a query out into the world.
The storage
namespace is supposed to be relatively modular. So in theory, it should be possible to use alternative on-disk formats, in the same way that MySQL allows both MyISAM and InnoDB formats to be used. However, implementing an alternative storage engine would more than likely reveal some flaws in the architecture of NetBase
that would need to be corrected first.
Next
Part two will discuss the SQL parser, and how SQL queries in the form of strings are understood and processed, to create tables, add rows, and return results.