In this post, you will find a simple class to read an entire DBF into a DataTable. From there, you can use .NET to manipulate the data. You will not find any method to query the data; this loads the entire file and that's it. You probably don't want to use this if you have a DBF with tens of thousands of records or more.
Introduction
The main focus of this article is to load the contents of a DBF file into a DataTable
. Sure, you can search the Internet and find a thousand examples of how to load a DBF in .NET. However, try to find one that does not use the MS Jet driver or perhaps the MS FoxPro driver. I tried myself and couldn't find one, so I decided to write a class to do it.
Of course, you may be asking yourself, "why would I want to do that?" In my particular case, I tried using the Jet and FoxPro drivers, and found that they stumbled when trying to load a date field from a DBF. It could be that the DBF I was using was not formatted to spec. I really couldn't say, and it doesn't matter since I don't have any control over the format; I simply had to find a way to read it.
What I have provided here is a simple class to read an entire DBF into a DataTable
. From there, you can use .NET to manipulate the data. What I have not provided is any method to query the data; this loads the entire file and that's it. You probably don't want to use this if you have a DBF with tens of thousands of records or more. Mine has about 2500 records, and I know it would never be much more than that.
Background
There are two concepts I present in this article. The first is, obviously, the structure of a DBF file and how to load it. The second is a method to load the headers in a DBF directly into a structure.
This class also doesn't verify that the file is in fact a DBF. Where I am using this class, I am certain that the input is a DBF file. If you are not certain, you'll need to find a way to verify the file is a DBF.
I should also mention that I do not profess to be an expert on dBase DBF files. There are also some field types that I have probably left out of my reader. If you feel I am not taking something into account here that I should, or want to send me some info on missing field types, I will update my code.
Using the Code
A dBase DBF file is a fairly simple file format that is one of the oldest PC file formats around for storing record based information. Because of its simplicity, it has become somewhat of a generic file format that many applications can read and write.
A normal DBF consists of a main header, followed by column headers, followed by one or more records. To keep the code simple, we are using some tricks that allow us to read the headers directly into .NET structures. By tricks, I simply mean using some less common attributes and methods.
Let's take a look at the structures we'll use to represent the headers in the DBF.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi, Pack = 1)]
private struct DBFHeader
{
public byte version;
public byte updateYear;
public byte updateMonth;
public byte updateDay;
public Int32 numRecords;
public Int16 headerLen;
public Int16 recordLen;
public Int16 reserved1;
public byte incompleteTrans;
public byte encryptionFlag;
public Int32 reserved2;
public Int64 reserved3;
public byte MDX;
public byte language;
public Int16 reserved4;
}
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi, Pack = 1)]
private struct FieldDescriptor
{
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 11)]
public string fieldName;
public char fieldType;
public Int32 address;
public byte fieldLen;
public byte count;
public Int16 reserved1;
public byte workArea;
public Int16 reserved2;
public byte flag;
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 7)]
public byte[] reserved3;
public byte indexFlag;
}
When you define a structure type in C#, the CLR will organize it in memory in whatever way it thinks is most efficient. This is not very conducive to reading in the data straight from disk. To get around this, we add the StructLayout
attribute with a parameter of Pack = 1
. This tells the CLR that this structure should be aligned in memory exactly as we specify it such that an Int32
only uses 4 bytes, an Int16
only uses 2 bytes, etc.
If you did not specify this, you would end up with things like an Int16
using 4 bytes, and then the data in your file would not line up with the structure in memory.
The other attribute you'll notice is the MarshalAs
statement. This is because .NET does not normally support fixed length strings, but this is exactly what we need. This header has 11 bytes reserved for the fieldname. We need to make sure that 11 bytes in our structure is also reserved for the fieldname. The same for the 7 bytes of reserved data.
Once you have the structures defined exactly as they appear in the file, you can read them in using something like this:
br = new BinaryReader(File.OpenRead(dbfFile));
byte[] buffer = br.ReadBytes(Marshal.SizeOf(typeof(DBFHeader)));
GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
DBFHeader header = (DBFHeader) Marshal.PtrToStructure(
handle.AddrOfPinnedObject(), typeof(DBFHeader));
handle.Free();
First, we read the header into a buffer. .NET doesn't really want us using pointers, but there is some support for something similar that allows us to load the structure directly from a buffer. We have to get a handle to the buffer that is marked as Pinned
so the garbage collector won't move it around on us. Once we do that, we can use the PtrToStructure
method to copy the data in the buffer directly into our structure. Don't forget to free/unpin the buffer when you're done!
Now, we want to read in all the column descriptors. Each record represents one column in the DBF table, and a 13 marks the end of the headers. We'll use the same method to read them directly into a structure.
ArrayList fields = new ArrayList();
while ((13 != br.PeekChar()))
{
buffer = br.ReadBytes(Marshal.SizeOf(typeof(FieldDescriptor)));
handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
fields.Add((FieldDescriptor) Marshal.PtrToStructure(
handle.AddrOfPinnedObject(), typeof(FieldDescriptor)));
handle.Free();
}
Once we've read all the headers in, then we'll create our DataTable
columns. My DBF only has these four types in it. If you have other types, you'll need to add them here.
DataColumn col = null;
foreach (FieldDescriptor field in fields)
{
switch (field.fieldType)
{
case 'N':
col = new DataColumn(field.fieldName, typeof(Int32));
break;
case 'C':
col = new DataColumn(field.fieldName, typeof(string));
break;
case 'D':
col = new DataColumn(field.fieldName, typeof(DateTime));
break;
case 'L':
col = new DataColumn(field.fieldName, typeof(bool));
break;
}
dt.Columns.Add(col);
}
Finally, we will read in each of the records using the column format information. The main header tells us how many records there are and how long each record is. It appears that the length of a record may not exactly match what you'd expect by summing the column definitions. Therefore, we use the record length field of the header to read the entire record into a buffer, then we read each component out of the buffer into each column of a new DataRow
.
The date parsing is the reason I wrote this class in the first place. I'm not sure if other DBFs all work this way, but it seems the one I had to load stores the date in a text format of YYYYMMDD, with a null date represented as 1900 1 1. I default the field to DBNull
and check the year to see if I need to parse the date. I wrap the parse in a try
block so that if it can't parse, it doesn't break the entire load. I wrote this in .NET 1.1 so anyone could load it, but in .NET 2.0 and later, there is a TryParse
member of Int32
that you could use to test the string
s.
for (int counter = 0; counter <= header.numRecords - 1; counter++)
{
buffer = br.ReadBytes(header.recordLen);
recReader = new BinaryReader(new MemoryStream(buffer));
row = dt.NewRow();
foreach (FieldDescriptor field in fields)
{
switch (field.fieldType)
{
case 'D':
year = Encoding.ASCII.GetString(recReader.ReadBytes(4));
month = Encoding.ASCII.GetString(recReader.ReadBytes(2));
day = Encoding.ASCII.GetString(recReader.ReadBytes(2));
row[field.fieldName] = System.DBNull.Value;
try
{
if ((Int32.Parse(year) > 1900))
{
row[field.fieldName] = new DateTime(Int32.Parse(year),
Int32.Parse(month), Int32.Parse(day));
}
}
catch
{}
break;
...
}
}
recReader.Close();
dt.Rows.Add(row);
}
Conclusion
I hope that this proves useful for anyone else that may have had a problem with the Jet driver or perhaps just doesn't want to distribute MDAC with their application (MDAC 2.6 is required by .NET to use the OleDB Jet driver, but isn't included with Windows 2000). If you find any mistakes, please let me know and I will update my sample code.
History
- 10th March, 2008: Initial version