(untagged)

Load a DBF into a DataTable

Brian Duke

0.00/5 (No votes)

11 Mar 2008

Load a DBF into a DataTable without using the Jet or other OLE Db drivers

In this post, you will find a simple class to read an entire DBF into a DataTable. From there, you can use .NET to manipulate the data. You will not find any method to query the data; this loads the entire file and that's it. You probably don't want to use this if you have a DBF with tens of thousands of records or more.

Download source files - 8.8 KB

Introduction

The main focus of this article is to load the contents of a DBF file into a DataTable. Sure, you can search the Internet and find a thousand examples of how to load a DBF in .NET. However, try to find one that does not use the MS Jet driver or perhaps the MS FoxPro driver. I tried myself and couldn't find one, so I decided to write a class to do it.

Of course, you may be asking yourself, "why would I want to do that?" In my particular case, I tried using the Jet and FoxPro drivers, and found that they stumbled when trying to load a date field from a DBF. It could be that the DBF I was using was not formatted to spec. I really couldn't say, and it doesn't matter since I don't have any control over the format; I simply had to find a way to read it.

What I have provided here is a simple class to read an entire DBF into a DataTable. From there, you can use .NET to manipulate the data. What I have not provided is any method to query the data; this loads the entire file and that's it. You probably don't want to use this if you have a DBF with tens of thousands of records or more. Mine has about 2500 records, and I know it would never be much more than that.

Background

There are two concepts I present in this article. The first is, obviously, the structure of a DBF file and how to load it. The second is a method to load the headers in a DBF directly into a structure.

This class also doesn't verify that the file is in fact a DBF. Where I am using this class, I am certain that the input is a DBF file. If you are not certain, you'll need to find a way to verify the file is a DBF.

I should also mention that I do not profess to be an expert on dBase DBF files. There are also some field types that I have probably left out of my reader. If you feel I am not taking something into account here that I should, or want to send me some info on missing field types, I will update my code.

Using the Code

A dBase DBF file is a fairly simple file format that is one of the oldest PC file formats around for storing record based information. Because of its simplicity, it has become somewhat of a generic file format that many applications can read and write.

A normal DBF consists of a main header, followed by column headers, followed by one or more records. To keep the code simple, we are using some tricks that allow us to read the headers directly into .NET structures. By tricks, I simply mean using some less common attributes and methods.

Let's take a look at the structures we'll use to represent the headers in the DBF.

// This is the file header for a DBF. We do this special layout with everything
// packed so we can read straight from disk into the structure to populate it
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi, Pack = 1)]
private struct DBFHeader
{
    public byte version;
    public byte updateYear;
    public byte updateMonth;
    public byte updateDay;
    public Int32 numRecords;
    public Int16 headerLen;
    public Int16 recordLen;
    public Int16 reserved1;
    public byte incompleteTrans;
    public byte encryptionFlag;
    public Int32 reserved2;
    public Int64 reserved3;
    public byte MDX;
    public byte language;
    public Int16 reserved4;
}

// This is the field descriptor structure. 
// There will be one of these for each column in the table.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi, Pack = 1)]
private struct FieldDescriptor
{
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 11)]
    public string fieldName;
    public char fieldType;
    public Int32 address;
    public byte fieldLen;
    public byte count;
    public Int16 reserved1;
    public byte workArea;
    public Int16 reserved2;
    public byte flag;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 7)]
    public byte[] reserved3;
    public byte indexFlag;
}

When you define a structure type in C#, the CLR will organize it in memory in whatever way it thinks is most efficient. This is not very conducive to reading in the data straight from disk. To get around this, we add the StructLayout attribute with a parameter of Pack = 1. This tells the CLR that this structure should be aligned in memory exactly as we specify it such that an Int32 only uses 4 bytes, an Int16 only uses 2 bytes, etc.

If you did not specify this, you would end up with things like an Int16 using 4 bytes, and then the data in your file would not line up with the structure in memory.

The other attribute you'll notice is the MarshalAs statement. This is because .NET does not normally support fixed length strings, but this is exactly what we need. This header has 11 bytes reserved for the fieldname. We need to make sure that 11 bytes in our structure is also reserved for the fieldname. The same for the 7 bytes of reserved data.

Once you have the structures defined exactly as they appear in the file, you can read them in using something like this:

// Read the header into a buffer
br = new BinaryReader(File.OpenRead(dbfFile));
byte[] buffer = br.ReadBytes(Marshal.SizeOf(typeof(DBFHeader)));

// Marshall the header into a DBFHeader structure
GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
DBFHeader header = (DBFHeader) Marshal.PtrToStructure(
                    handle.AddrOfPinnedObject(), typeof(DBFHeader));
handle.Free();

First, we read the header into a buffer. .NET doesn't really want us using pointers, but there is some support for something similar that allows us to load the structure directly from a buffer. We have to get a handle to the buffer that is marked as Pinned so the garbage collector won't move it around on us. Once we do that, we can use the PtrToStructure method to copy the data in the buffer directly into our structure. Don't forget to free/unpin the buffer when you're done!

Now, we want to read in all the column descriptors. Each record represents one column in the DBF table, and a 13 marks the end of the headers. We'll use the same method to read them directly into a structure.

// Read in all the field descriptors. 
// Per the spec, 13 (0D) marks the end of the field descriptors
ArrayList fields = new ArrayList();
while ((13 != br.PeekChar()))
{
    buffer = br.ReadBytes(Marshal.SizeOf(typeof(FieldDescriptor)));
    handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
    fields.Add((FieldDescriptor) Marshal.PtrToStructure(
                handle.AddrOfPinnedObject(), typeof(FieldDescriptor)));
    handle.Free();
}

Once we've read all the headers in, then we'll create our DataTable columns. My DBF only has these four types in it. If you have other types, you'll need to add them here.

DataColumn col = null;
foreach (FieldDescriptor field in fields)
{
    switch (field.fieldType)
    {
    case 'N':
        col = new DataColumn(field.fieldName, typeof(Int32));
        break;
    case 'C':
        col = new DataColumn(field.fieldName, typeof(string));
        break;
    case 'D':
        col = new DataColumn(field.fieldName, typeof(DateTime));
        break;
    case 'L':
        col = new DataColumn(field.fieldName, typeof(bool));
        break;
    }
    dt.Columns.Add(col);
}

Finally, we will read in each of the records using the column format information. The main header tells us how many records there are and how long each record is. It appears that the length of a record may not exactly match what you'd expect by summing the column definitions. Therefore, we use the record length field of the header to read the entire record into a buffer, then we read each component out of the buffer into each column of a new DataRow.

The date parsing is the reason I wrote this class in the first place. I'm not sure if other DBFs all work this way, but it seems the one I had to load stores the date in a text format of YYYYMMDD, with a null date represented as 1900 1 1. I default the field to DBNull and check the year to see if I need to parse the date. I wrap the parse in a try block so that if it can't parse, it doesn't break the entire load. I wrote this in .NET 1.1 so anyone could load it, but in .NET 2.0 and later, there is a TryParse member of Int32 that you could use to test the strings.

// Read in all the records
for (int counter = 0; counter <= header.numRecords - 1; counter++)
{
    // First, we'll read the entire record into a buffer and then read each 
    // field from the buffer. This helps account for any extra space at the 
    // end of each record and probably performs better.
    buffer = br.ReadBytes(header.recordLen);
    recReader = new BinaryReader(new MemoryStream(buffer));

    // Loop through each field in a record
    row = dt.NewRow();
    foreach (FieldDescriptor field in fields)
    {
        switch (field.fieldType)
        {
            case 'D': // Date (YYYYMMDD)
                year = Encoding.ASCII.GetString(recReader.ReadBytes(4));
                month = Encoding.ASCII.GetString(recReader.ReadBytes(2));
                day = Encoding.ASCII.GetString(recReader.ReadBytes(2));
                row[field.fieldName] = System.DBNull.Value;
                try
                {
                    if ((Int32.Parse(year) > 1900))
                    {
                        row[field.fieldName] = new DateTime(Int32.Parse(year), 
                                                   Int32.Parse(month), Int32.Parse(day));
                    }
                }
                catch
                {}

                break;

            ...

        }
    }

    recReader.Close();
    dt.Rows.Add(row);
}

Conclusion

I hope that this proves useful for anyone else that may have had a problem with the Jet driver or perhaps just doesn't want to distribute MDAC with their application (MDAC 2.6 is required by .NET to use the OleDB Jet driver, but isn't included with Windows 2000). If you find any mistakes, please let me know and I will update my sample code.

History

10^th March, 2008: Initial version

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here