
Wrapper Class for Parsing Fixed-Width or Delimited Text Files

15 Feb 2005
This is a utility class that will allow you to parse fixed-width or delimited text files in an event-driven model. Simply specify the file location, the format type (fixed vs. delimited) and add TextField members to the strongly-typed TextFieldCollection.

Introduction

XML files are now common, but many systems still require flat-file interfaces, especially in the financial industry. Countless developers have written, over and over, the same code to read a line from a file, pick out the individual fields, and act on the retrieved data. Faced with doing this yet again, I decided to find a generic, reusable way to do it.

Summary

TextFieldParser is a utility class that parses fixed-width or delimited text files in an event-driven model. Specify the file location and the format type (fixed vs. delimited), then add TextField members to the strongly-typed TextFieldCollection. The TextFieldParser class extracts the desired fields from each line in the target file. When you process a delimited file and mark a field as quoted, the parser ignores any delimiters it finds inside that quoted field.
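To make the quoted-field behavior concrete, here is a minimal sketch of one way to split a line while treating delimiters inside quotes as data. It is an illustration only and not the library's source; SplitDelimitedLine and its parameters are hypothetical names, and the sketch does not handle escaped (doubled) quote characters.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module QuotedSplitSketch

    ' Hypothetical helper, not part of the TextFieldParser library.
    ' Splits a line on the delimiter, keeping delimiters that sit inside quotes.
    Function SplitDelimitedLine(ByVal line As String, _
                                ByVal delimiter As Char, _
                                ByVal quoteChar As Char) As List(Of String)
        Dim fields As New List(Of String)
        Dim current As New StringBuilder()
        Dim insideQuotes As Boolean = False

        For Each ch As Char In line
            If ch = quoteChar Then
                ' Toggle the quoted state; the quote character itself is dropped.
                insideQuotes = Not insideQuotes
            ElseIf ch = delimiter AndAlso Not insideQuotes Then
                ' An unquoted delimiter ends the current field.
                fields.Add(current.ToString())
                current.Length = 0
            Else
                current.Append(ch)
            End If
        Next

        ' The last field has no trailing delimiter, so add it explicitly.
        fields.Add(current.ToString())
        Return fields
    End Function

    Sub Main()
        Dim fields As List(Of String) = _
            SplitDelimitedLine("LineOne,""two, with comma"",three", ","c, """"c)
        For Each field As String In fields
            Console.WriteLine(field)
        Next
    End Sub

End Module
```

Run as a console application, this prints three fields, and the second one keeps its embedded comma: LineOne, then "two, with comma" without the quotes, then three.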

Each line in the text file raises either a RecordFound or a RecordFailed event, with details appropriate to the event. If the record matches, the Value property of each TextField in the TextFieldCollection is updated with the information parsed from the line before the RecordFound event is raised. If the record does not match the requested pattern, the RecordFailed event provides the line number and text of the offending record, an error message, and a ByRef Boolean that you can set to abort or continue processing. In addition to verifying that each TextField object has a corresponding field in the record, the TextFieldParser explicitly converts each value to the .NET data type specified in the TextField object. A conversion error is trapped and reported to the caller through the RecordFailed event.
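The conversion step can be illustrated with the framework's Convert.ChangeType method, which accepts a TypeCode. The sketch below is an assumption about how such a check might look, not the library's actual code; TryConvertField is a hypothetical helper.

```vbnet
Imports System

Module ConversionSketch

    ' Hypothetical helper: converts raw text to the requested TypeCode and
    ' reports failure through a message instead of letting the exception escape.
    Function TryConvertField(ByVal rawText As String, _
                             ByVal targetType As TypeCode, _
                             ByRef result As Object, _
                             ByRef errorMessage As String) As Boolean
        Try
            result = Convert.ChangeType(rawText, targetType)
            errorMessage = Nothing
            Return True
        Catch ex As Exception When TypeOf ex Is FormatException _
                              OrElse TypeOf ex Is OverflowException _
                              OrElse TypeOf ex Is InvalidCastException
            ' These are the failures Convert.ChangeType raises for bad input.
            result = Nothing
            errorMessage = "Cannot convert '" & rawText & "' to " & _
                           targetType.ToString() & ": " & ex.Message
            Return False
        End Try
    End Function

    Sub Main()
        Dim value As Object = Nothing
        Dim message As String = Nothing

        ' "10" is a valid Int32, so this branch runs.
        If TryConvertField("10", TypeCode.Int32, value, message) Then
            Console.WriteLine("Converted: " & value.ToString())
        End If

        ' "ten" is not numeric, so the helper returns False with a message.
        If Not TryConvertField("ten", TypeCode.Int32, value, message) Then
            Console.WriteLine("Failed: " & message)
        End If
    End Sub

End Module
```

A parser built this way could pass the message straight to its failure event, which matches the behavior described above.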

The Interesting Bits

Nothing in this class library is new or revolutionary. I simply took the time to put it together in a way that would be useful to more than one project and more than one person. The project does touch on several areas, however, and they are worth pointing out so that anyone looking for simple examples of these techniques will know what to expect:

  • Enumerations
  • Delegates & Events
  • Overloaded Constructors and Methods
  • File Stream I/O
  • Strongly-Typed Collections
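Of the items listed, delegates and events shape how callers interact with the parser. The following is a minimal, self-contained sketch of that pattern, including a ByRef flag that lets a handler decide whether processing continues. LineScanner, LineAccepted, LineRejected, and keepGoing are hypothetical names chosen for illustration and do not reproduce the TextFieldParser source.

```vbnet
Imports System

' Hypothetical miniature of an event-raising reader.
Public Class LineScanner

    Public Event LineAccepted(ByVal lineNumber As Integer, ByVal text As String)
    Public Event LineRejected(ByVal lineNumber As Integer, _
                              ByVal text As String, _
                              ByRef keepGoing As Boolean)

    ' Raises one event per line; blank lines count as failures here.
    Public Sub Scan(ByVal lines As String())
        For i As Integer = 0 To lines.Length - 1
            If lines(i).Trim().Length > 0 Then
                RaiseEvent LineAccepted(i + 1, lines(i))
            Else
                ' Default to stopping unless a handler asks to continue.
                Dim keepGoing As Boolean = False
                RaiseEvent LineRejected(i + 1, lines(i), keepGoing)
                If Not keepGoing Then Exit For
            End If
        Next
    End Sub

End Class

Module EventSketch

    ' WithEvents allows the Handles clauses below to subscribe.
    Dim WithEvents Scanner As New LineScanner

    Sub Main()
        Scanner.Scan(New String() {"alpha", "", "beta"})
    End Sub

    Sub OnAccepted(ByVal lineNumber As Integer, ByVal text As String) _
        Handles Scanner.LineAccepted
        Console.WriteLine("OK   " & lineNumber.ToString() & ": " & text)
    End Sub

    Sub OnRejected(ByVal lineNumber As Integer, ByVal text As String, _
                   ByRef keepGoing As Boolean) Handles Scanner.LineRejected
        Console.WriteLine("FAIL " & lineNumber.ToString() & ": blank line")
        ' Setting the ByRef flag tells the scanner to move on to the next line.
        keepGoing = True
    End Sub

End Module
```

Because the rejection handler sets keepGoing to True, the scan reports the blank second line and still reaches the third.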

Usage Example

I haven't reproduced the library code itself because, quite frankly, little of it needs explanation and all of it is well commented. The example below shows how to instantiate, configure, and call the TextFieldParser and how to handle its callbacks. The demo project includes a text file that this code will process.

One thing to note is that the TextField object has properties for Length and Quoted. The Length property has meaning only if your FileFormat is FixedWidth and the Quoted property has meaning only if the FileFormat is Delimited. Otherwise, these values are ignored.

Imports System
Imports Utilities.Text.Parsing

Module Module1

    ' WithEvents lets the handlers below subscribe through their Handles clauses
    Dim WithEvents DataFile As New Utilities.Text.Parsing.TextFieldParser

    Sub Main()

        ' **********************
        ' Delimited File Parsing
        ' **********************

        ' Configure the base object properties
        DataFile.FileType = TextFieldParser.FileFormat.Delimited
        DataFile.FileName = "D:\DelimTestFile.txt"

        ' Add the TextField objects to the collection
        ' Sample record: LineOne,"two",three,"four",five,"six" ...
        Dim quote As Boolean
        For i As Int32 = 1 To 80
            ' Even-numbered fields are quoted in the sample record
            quote = (i Mod 2 = 0)
            DataFile.TextFields.Add(New TextField("Field" & _
                 i.ToString(), TypeCode.String, quote))
        Next i

        ' Parse the file
        DataFile.ParseFile()

        ' ************************
        ' Fixed Width File Parsing
        ' ************************

        ' Configure the base object properties
        DataFile.FileType = TextFieldParser.FileFormat.FixedWidth
        DataFile.FileName = "D:\FixedTestFile.txt"

        ' Get rid of the old field definitions
        DataFile.TextFields.Clear()

        ' Add the TextField objects to the collection
        ' Sample record: LineOne  onetwothreefourfivesixseveneightnine10 ...
        DataFile.TextFields.Add(New TextField("LineNumber", TypeCode.String, 9))

        ' Each pass adds the same ten names and widths, giving 80 fields in total
        For i As Int32 = 1 To 8
            DataFile.TextFields.Add(New TextField("Field1", TypeCode.String, 3))
            DataFile.TextFields.Add(New TextField("Field2", TypeCode.String, 3))
            DataFile.TextFields.Add(New TextField("Field3", TypeCode.String, 5))
            DataFile.TextFields.Add(New TextField("Field4", TypeCode.String, 4))
            DataFile.TextFields.Add(New TextField("Field5", TypeCode.String, 4))
            DataFile.TextFields.Add(New TextField("Field6", TypeCode.String, 3))
            DataFile.TextFields.Add(New TextField("Field7", TypeCode.String, 5))
            DataFile.TextFields.Add(New TextField("Field8", TypeCode.String, 5))
            DataFile.TextFields.Add(New TextField("Field9", TypeCode.String, 4))
            DataFile.TextFields.Add(New TextField("Field10", TypeCode.Int32, 2))
        Next i

        ' Parse the file
        DataFile.ParseFile()
    End Sub

    Public Sub RecordFoundHandler(ByRef CurrentLineNumber As Int32, _
                                  ByVal TextFields As TextFieldCollection) _
                                  Handles DataFile.RecordFound
        ' Do something with the field data for each record successfully matched
        For Each field As TextField In TextFields
            Console.WriteLine(field.Name & " = " & CType(field.Value, String))
        Next

        ' Only process every other line in the file
        CurrentLineNumber += 2
    End Sub

    Public Sub RecordFailedHandler(ByRef CurrentLineNumber As Int32, _
                                   ByVal LineText As String, _
                                   ByVal ErrorMessage As String, _
                                   ByRef Continue As Boolean) _
                                   Handles DataFile.RecordFailed
        ' Report each record that fails to match, then keep processing
        Console.WriteLine("Num = " & CType(CurrentLineNumber, String) & _
                " : Text = " & LineText & " : Msg = " & ErrorMessage)
        Continue = True
    End Sub

End Module

Areas for Expansion

The most obvious and significant area to improve this class would be to incorporate the ability to pass in an XML stream or file that would populate and configure the TextFieldParser and TextFieldCollection automatically. This would allow you to store your configuration information in an external file that is more easily maintained. You can do this yourself, of course, but adding it into the TextFieldParser class would be ideal.
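As a rough idea of what such a configuration loader might involve, the sketch below reads field definitions from an XML string with XmlDocument. The XML layout (a parser element with field children and name, type, and quoted attributes) is entirely hypothetical, since no schema is defined here. The sketch only prints what it reads; in real use each definition would presumably be turned into a TextField.

```vbnet
Imports System
Imports System.Xml

Module XmlConfigSketch

    Sub Main()
        ' Hypothetical layout, invented for this sketch.
        Dim configXml As String = _
            "<parser format=""Delimited"">" & _
            "<field name=""Id"" type=""Int32"" quoted=""false"" />" & _
            "<field name=""Title"" type=""String"" quoted=""true"" />" & _
            "</parser>"

        Dim doc As New XmlDocument()
        doc.LoadXml(configXml)

        Console.WriteLine("Format: " & doc.DocumentElement.GetAttribute("format"))

        For Each node As XmlElement In doc.DocumentElement.SelectNodes("field")
            ' Map the type attribute onto the System.TypeCode enumeration.
            Dim fieldType As TypeCode = _
                CType([Enum].Parse(GetType(TypeCode), node.GetAttribute("type")), TypeCode)
            Dim quoted As Boolean = Boolean.Parse(node.GetAttribute("quoted"))

            ' A real loader might call New TextField(name, fieldType, quoted) here.
            Console.WriteLine(node.GetAttribute("name") & " : " & _
                              fieldType.ToString() & " : quoted=" & quoted.ToString())
        Next
    End Sub

End Module
```

Keeping type names identical to the TypeCode member names means no separate lookup table is needed, at the cost of an exception if the file contains an unknown type name.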

There may be other areas for improvement as well, but for now it meets my needs. If you happen to make use of this utility and have any improvements to offer, please forward them back to me. I'll be happy to incorporate those that make sense into the solution and update the article.

Update - Feb 15th, 2005

The more I looked at the issues and spoke to other developers, the clearer it became that the regular-expression approach to parsing was never going to perform well on wide text files. I have rewritten the methods that do the actual parsing to use more conventional techniques: String.Split() for delimited files and String.Substring() for fixed-width files. The external interfaces remain unchanged, apart from some defensive argument checking and better error handling. The library still handles quoted strings in delimited files and now lets you specify the quote character.
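The Substring approach for fixed-width records can be sketched as follows. This is an illustration under assumed names, not the library's source; SliceFixedWidth and its widths parameter are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic

Module FixedWidthSketch

    ' Hypothetical helper: cuts a record into consecutive fields of the given widths.
    Function SliceFixedWidth(ByVal line As String, _
                             ByVal widths As Integer()) As List(Of String)
        Dim fields As New List(Of String)
        Dim offset As Integer = 0

        For Each width As Integer In widths
            ' Guard against records shorter than the declared layout.
            If offset + width > line.Length Then
                Throw New ArgumentException( _
                    "Record is shorter than the declared field widths.")
            End If
            fields.Add(line.Substring(offset, width))
            offset += width
        Next

        Return fields
    End Function

    Sub Main()
        ' Widths follow the start of the article's sample record: 9, 3, 3, 5.
        Dim fields As List(Of String) = _
            SliceFixedWidth("LineOne  onetwothree", New Integer() {9, 3, 3, 5})
        For Each field As String In fields
            Console.WriteLine("[" & field & "]")
        Next
    End Sub

End Module
```

The brackets in the output make the padding visible: the first field keeps its two trailing spaces, followed by one, two, and three.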

The new demo project and source code include two test files that are 80 fields wide. One is fixed-width and the other is delimited with a couple of delimiters embedded inside of quotes for good measure.

As always, please let me know what you think!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
