Introduction
XML files are common these days, but many systems, especially in the financial industry, still require flat-file interfaces. Countless developers have written, over and over, code that reads a line from a file, picks out the individual fields, and acts on the retrieved data. Faced with the prospect of doing this yet again, I set out to find a generic, reusable way to do it.
Summary
TextFieldParser is a utility class that lets you parse fixed-width or delimited text files using an event-driven model. Specify the file location and the format type (fixed-width vs. delimited), then add TextField members to the strongly-typed TextFieldCollection. The TextFieldParser class extracts the desired fields from each line in the target file. If you are processing a delimited file and mark a field as quoted, it ignores any delimiters it finds within that quoted field.
Each line in the text file raises either a RecordFound or a RecordFailed event, with details appropriate to the event. If the record matches, the Value property of each TextField member of the TextFieldCollection is updated with the information parsed from the line before the RecordFound event is raised. If the record does not match the requested pattern, the RecordFailed event provides the line number and text of the offending record, an error message, and a Boolean passed by reference that you can set to abort or continue processing. In addition to verifying that each TextField object has a corresponding field in the text file's record, the TextFieldParser explicitly converts the text value to the .NET CLR data type specified in the TextField object. Any conversion error is trapped and returned to the caller through the RecordFailed event.
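To make the conversion step concrete, here is a minimal sketch of how a text value could be converted to a requested TypeCode with the error trapped rather than thrown. It is not the library's own code: TryConvertField is a hypothetical helper written for this illustration, and it relies only on the framework's Convert.ChangeType method.

```vbnet
Imports System

Module ConversionSketch

    ' Hypothetical helper (not part of the library): converts raw text to the
    ' requested TypeCode. Returns True on success; on failure, ErrorMessage
    ' carries the text of the exception that was trapped.
    Function TryConvertField(ByVal RawText As String, _
                             ByVal TargetType As TypeCode, _
                             ByRef Result As Object, _
                             ByRef ErrorMessage As String) As Boolean
        Try
            Result = Convert.ChangeType(RawText.Trim(), TargetType)
            ErrorMessage = Nothing
            Return True
        Catch ex As Exception
            ' Format, overflow and invalid-cast errors all end up here.
            Result = Nothing
            ErrorMessage = ex.Message
            Return False
        End Try
    End Function

    Sub Main()
        Dim value As Object = Nothing
        Dim message As String = Nothing

        ' A well-formed integer converts cleanly.
        If TryConvertField(" 42", TypeCode.Int32, value, message) Then
            Console.WriteLine("Converted: " & value.ToString())
        End If

        ' A malformed integer is reported instead of crashing the loop.
        If Not TryConvertField("4x", TypeCode.Int32, value, message) Then
            Console.WriteLine("Failed: " & message)
        End If
    End Sub

End Module
```

A parser built this way can hand the message to a failure event and move on to the next line, which is the behavior described above.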
The Interesting Bits
There is really nothing in this class library that is new or revolutionary. I just took some time to put it together in a way that would be useful to more than one project and more than one person. The project does touch on a number of areas, however, and they are worth pointing out so that anyone looking for simple examples of such implementations will know what to expect:
- Enumerations
- Delegates & Events
- Overloaded Constructors and Methods
- File Stream I/O
- Strongly-Typed Collections
Usage Example
I haven't included the library's source here because, quite frankly, little of it needs explanation and all of it is well commented. Below, however, is an example of how you would instantiate, configure, and call the TextFieldParser and handle its callbacks (the demo project includes a text file that this code will process).
One thing to note is that the TextField object has Length and Quoted properties. The Length property has meaning only if your FileFormat is FixedWidth, and the Quoted property has meaning only if the FileFormat is Delimited. Otherwise, these values are ignored.
Imports System
Imports Utilities.Text.Parsing

Module Module1

    ' WithEvents lets the handlers below subscribe through their Handles clauses.
    Dim WithEvents DataFile As New Utilities.Text.Parsing.TextFieldParser

    Sub Main()
        ' Delimited file: 80 string fields, every second one flagged as quoted.
        DataFile.FileType = TextFieldParser.FileFormat.Delimited
        DataFile.FileName = "D:\DelimTestFile.txt"
        Dim quote As Boolean
        For i As Int32 = 1 To 80
            quote = (i Mod 2 = 0)
            DataFile.TextFields.Add(New TextField("Field" & _
                i.ToString(), TypeCode.String, quote))
        Next i
        DataFile.ParseFile()

        ' Fixed-width file: a 9-character line number followed by the same
        ' 10-field layout repeated 8 times (80 fields). The field names
        ' repeat in each group.
        DataFile.FileType = TextFieldParser.FileFormat.FixedWidth
        DataFile.FileName = "D:\FixedTestFile.txt"
        DataFile.TextFields.Clear()
        DataFile.TextFields.Add(New TextField("LineNumber", TypeCode.String, 9))
        For i As Int32 = 1 To 8
            DataFile.TextFields.Add(New TextField("Field1", TypeCode.String, 3))
            DataFile.TextFields.Add(New TextField("Field2", TypeCode.String, 3))
            DataFile.TextFields.Add(New TextField("Field3", TypeCode.String, 5))
            DataFile.TextFields.Add(New TextField("Field4", TypeCode.String, 4))
            DataFile.TextFields.Add(New TextField("Field5", TypeCode.String, 4))
            DataFile.TextFields.Add(New TextField("Field6", TypeCode.String, 3))
            DataFile.TextFields.Add(New TextField("Field7", TypeCode.String, 5))
            DataFile.TextFields.Add(New TextField("Field8", TypeCode.String, 5))
            DataFile.TextFields.Add(New TextField("Field9", TypeCode.String, 4))
            DataFile.TextFields.Add(New TextField("Field10", TypeCode.Int32, 2))
        Next i
        DataFile.ParseFile()
    End Sub

    ' Raised for each line that matches the configured fields.
    Public Sub RecordFoundHandler(ByRef CurrentLineNumber As Int32, _
                                  ByVal TextFields As TextFieldCollection) _
                                  Handles DataFile.RecordFound
        For Each field As TextField In TextFields
            Console.WriteLine(field.Name & " = " & CType(field.Value, String))
        Next
        ' CurrentLineNumber is ByRef, so the parser sees this change.
        CurrentLineNumber += 2
    End Sub

    ' Raised for each line that fails to match or fails type conversion.
    Public Sub RecordFailedHandler(ByRef CurrentLineNumber As Int32, _
                                   ByVal LineText As String, _
                                   ByVal ErrorMessage As String, _
                                   ByRef Continue As Boolean) _
                                   Handles DataFile.RecordFailed
        Console.WriteLine("Num = " & CType(CurrentLineNumber, String) & _
            " : Text = " & LineText & " : Msg = " & ErrorMessage)
        ' Keep processing after a failed record.
        Continue = True
    End Sub

End Module
Areas for Expansion
The most obvious and significant improvement to this class would be the ability to pass in an XML stream or file that populates and configures the TextFieldParser and TextFieldCollection automatically. This would allow you to store your configuration information in an external file that is more easily maintained. You can do this yourself, of course, but building it into the TextFieldParser class would be ideal.
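If you want to try the do-it-yourself route, the following sketch shows one way it might look. Everything about the XML layout is an assumption made for illustration: the parser and field elements and the fileType, fileName, name, type and quoted attributes are invented here and are not defined by the library. The sketch only reads the configuration and prints it, and a comment marks the point where the values would be passed to the library.

```vbnet
Imports System
Imports System.Xml

Module XmlConfigSketch

    Sub Main()
        ' Hypothetical configuration; in practice this would live in a file
        ' and be read with XmlDocument.Load instead of LoadXml.
        Dim configXml As String = _
            "<parser fileType='Delimited' fileName='D:\DelimTestFile.txt'>" & _
            "<field name='Field1' type='String' quoted='false' />" & _
            "<field name='Field2' type='Int32' quoted='true' />" & _
            "</parser>"

        Dim doc As New XmlDocument()
        doc.LoadXml(configXml)

        Dim root As XmlElement = doc.DocumentElement
        Console.WriteLine("FileType = " & root.GetAttribute("fileType"))
        Console.WriteLine("FileName = " & root.GetAttribute("fileName"))

        For Each node As XmlNode In root.SelectNodes("field")
            Dim fieldElement As XmlElement = CType(node, XmlElement)
            Dim fieldName As String = fieldElement.GetAttribute("name")

            ' Map the attribute text onto the TypeCode enumeration.
            Dim fieldType As TypeCode = CType([Enum].Parse( _
                GetType(TypeCode), fieldElement.GetAttribute("type")), TypeCode)
            Dim quoted As Boolean = _
                Boolean.Parse(fieldElement.GetAttribute("quoted"))

            ' With the library referenced, this is where you would call:
            ' DataFile.TextFields.Add(New TextField(fieldName, fieldType, quoted))
            Console.WriteLine(fieldName & " : " & fieldType.ToString() & _
                " : quoted=" & quoted.ToString())
        Next
    End Sub

End Module
```

A fixed-width layout would need a length attribute in place of quoted, which maps onto the TextField constructor that takes a length.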
There may be other areas for improvement as well, but for now it meets my needs. If you make use of this utility and have any improvements to offer, please send them to me. I'll be happy to incorporate those that make sense into the solution and update the article.
Update - Feb 15th, 2005
Okay, the more I looked at the issues at hand and spoke to other developers, the more convinced I became that the regular-expression approach to text file parsing was never going to perform well on wide text files. I have rewritten the methods that handle the actual parsing to use more conventional techniques: String.Split() for delimited files and String.Substring() for fixed-width files. The external interfaces remain unchanged (with the exception of some defensive argument checking and better error handling). The library still handles quoted strings in delimited files and now lets you specify the character that you use for your quotes.
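For readers curious what these two techniques look like in practice, here is a small self-contained sketch. It is not the library's code. SplitDelimited and SliceFixedWidth are hypothetical helpers, and the delimited version uses a simple character scan rather than String.Split() so that delimiters inside quotes are treated as data. It also assumes quotes are never escaped by doubling.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module ParsingSketch

    ' Splits a delimited line, treating delimiters inside quotes as data.
    ' Quote characters themselves are dropped from the output.
    Function SplitDelimited(ByVal LineText As String, _
                            ByVal Delimiter As Char, _
                            ByVal QuoteChar As Char) As List(Of String)
        Dim fields As New List(Of String)
        Dim current As New StringBuilder()
        Dim inQuotes As Boolean = False

        For Each c As Char In LineText
            If c = QuoteChar Then
                inQuotes = Not inQuotes
            ElseIf c = Delimiter AndAlso Not inQuotes Then
                fields.Add(current.ToString())
                current.Length = 0
            Else
                current.Append(c)
            End If
        Next
        fields.Add(current.ToString())
        Return fields
    End Function

    ' Cuts a fixed-width line into consecutive fields of the given lengths.
    Function SliceFixedWidth(ByVal LineText As String, _
                             ByVal FieldLengths As Integer()) As List(Of String)
        Dim fields As New List(Of String)
        Dim position As Integer = 0

        For Each fieldLength As Integer In FieldLengths
            If position + fieldLength > LineText.Length Then
                Throw New ArgumentException( _
                    "Line is shorter than the field layout.")
            End If
            fields.Add(LineText.Substring(position, fieldLength))
            position += fieldLength
        Next
        Return fields
    End Function

    Sub Main()
        ' Prints three fields: 1 / Smith, John / NY
        For Each f As String In _
            SplitDelimited("1,""Smith, John"",NY", ","c, """"c)
            Console.WriteLine(f)
        Next

        ' Prints three fields: 000000001 / ABC / DEF
        For Each f As String In _
            SliceFixedWidth("000000001ABCDEF", New Integer() {9, 3, 3})
            Console.WriteLine(f)
        Next
    End Sub

End Module
```

Both helpers make a single pass over the line, which is the kind of predictable cost that matters when a record is 80 fields wide.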
The new demo project and source code include two test files that are 80 fields wide. One is fixed-width and the other is delimited with a couple of delimiters embedded inside of quotes for good measure.
As always, please let me know what you think!