Introduction
I was writing a small database application that lets users import records from plain text files into an Access database. One problem I faced was that different users have different formats: some files are tab-delimited, some are comma-delimited; some have fixed-width fields, while others don't. You can use a switch statement to deal with them, but as the number of formats grows, so does the ugliness of your code. To make matters worse, more often than not you don't know what the format will be at coding time.
So I needed to support customized formats: the format description had to be flexible, yet easy for the application to understand. It looked like a daunting task until I came across the idea of using Regular Expressions.
Using the Code
It's very simple to use, since there isn't much in it other than the idea of using Regular Expressions. The demo project contains two formats described in formats.xml, and two sample input files. You need to:
- Add flex_format.cs to your project.
- Load the formats information stored in the XML file during initialization.
    XmlSerializer s = new XmlSerializer(typeof(ArrayList),
        new Type[] { typeof(flex_format) });
    // The using block closes the reader even if deserialization throws.
    using (TextReader r = new StreamReader("formats.xml"))
    {
        formats_supported = (ArrayList)s.Deserialize(r);
    }
- Stuff the file filter with formats information when the user opens an OpenFileDialog.
    OpenFileDialog dlg = new OpenFileDialog();
    foreach (flex_format format in formats_supported)
    {
        dlg.Filter += (dlg.Filter.Length > 0 ? "|" : "") +
            format.description + "|*" + format.suffix;
    }
    if (dlg.ShowDialog() == DialogResult.OK)
    {
        load_list(dlg.FileName);
    }
- Parse the text file using the Regular Expression specified in the format description.
    private void load_list(string file_name)
    {
        // Pick the first format whose suffix matches the file name.
        flex_format format = null;
        foreach (flex_format fmt in formats_supported)
        {
            if (file_name.EndsWith(fmt.suffix))
            {
                format = fmt;
                break;
            }
        }
        if (format == null) return; // no known format for this file

        listView1.Clear();
        foreach (string field_name in format.entries)
        {
            listView1.Columns.Add(field_name);
        }
        textBox1.Text = "";
        Regex regex = new Regex(format.pattern); // build once, not per line
        using (StreamReader reader = new StreamReader(file_name))
        {
            string line;
            // Reading stops at the end of the file or at the first empty line.
            while ((line = reader.ReadLine()) != null && line.Length > 0)
            {
                textBox1.Text += line + "\r\n";
                Match match = regex.Match(line);
                if (!match.Success) continue; // skip lines that don't fit the format
                // Group 0 is the whole match; groups 1..n are the fields.
                ListViewItem item = new ListViewItem(match.Groups[1].Value);
                for (int i = 2; i < match.Groups.Count; i++)
                {
                    item.SubItems.Add(match.Groups[i].Value);
                }
                listView1.Items.Add(item);
            }
        }
    }
- To add a new format, you can either edit the XML file manually or use XML serialization programmatically. Below is part of the XML file used in the demo project. This format lets the user use different speed units, which would be much more difficult to implement without the help of Regular Expressions.
    <anyType xsi:type="flex_format"
             description="Car Speed Record Database"
             suffix=".spd"
             pattern="([^\t]+)\t([^\t]+)\t([\d\.]+)\s*(mph|km/h|m/s)">
      <entries>
        <entry>Make</entry>
        <entry>Model</entry>
        <entry>TopSpeed</entry>
        <entry>Unit</entry>
      </entries>
    </anyType>
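The programmatic route mentioned in the last step could look roughly like the sketch below. The article does not list flex_format.cs, so the class shown here is a guess reconstructed from the XML fragment above (three attributes plus an entries array), and `FormatStore`, `Save`, and `Load` are hypothetical helper names, not part of the demo project.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml.Serialization;

// Assumed shape of flex_format, inferred from the XML fragment:
// description, suffix and pattern are attributes; entries is a nested array.
public class flex_format
{
    [XmlAttribute] public string description;
    [XmlAttribute] public string suffix;
    [XmlAttribute] public string pattern;

    [XmlArray("entries"), XmlArrayItem("entry")]
    public string[] entries;
}

// Hypothetical helper that wraps the serializer set-up used during loading.
public static class FormatStore
{
    private static XmlSerializer CreateSerializer()
    {
        // Same serializer configuration as in the loading step.
        return new XmlSerializer(typeof(ArrayList),
            new Type[] { typeof(flex_format) });
    }

    // Writes the whole list of formats as XML to the given writer.
    public static void Save(ArrayList formats, TextWriter writer)
    {
        CreateSerializer().Serialize(writer, formats);
    }

    // Reads a list of formats back from XML.
    public static ArrayList Load(TextReader reader)
    {
        return (ArrayList)CreateSerializer().Deserialize(reader);
    }
}
```

Under these assumptions, adding a format means creating a flex_format, filling in its four members, adding it to the ArrayList, and calling `FormatStore.Save` with a StreamWriter opened on formats.xml. The application code that loads and parses files stays untouched.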
Why Regular Expressions
The class flex_format holds a description, a file suffix, a Regular Expression pattern, and an array of field names, all stored in an XML file. How flexible can the file format be? Just as flexible as Regular Expressions. Combined with XML serialization, you have a very concise solution for supporting flexible text formats. More importantly, you can easily add support for a new format without changing your application.
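To make the flexibility concrete, here is a small sketch that applies the .spd pattern from formats.xml to a single line and returns the captured fields. `SpeedPatternDemo` and `ParseLine` are names made up for this illustration, and the sample lines in the usage note are invented rather than taken from the demo's input files.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpeedPatternDemo
{
    // Pattern copied from the .spd entry in formats.xml.
    public const string Pattern =
        @"([^\t]+)\t([^\t]+)\t([\d\.]+)\s*(mph|km/h|m/s)";

    // Returns the captured fields (groups 1..n) of a line,
    // or null when the line does not fit the format.
    public static string[] ParseLine(string line)
    {
        Match match = Regex.Match(line, Pattern);
        if (!match.Success) return null;

        // Group 0 is the whole match, so the fields start at group 1.
        string[] fields = new string[match.Groups.Count - 1];
        for (int i = 1; i < match.Groups.Count; i++)
        {
            fields[i - 1] = match.Groups[i].Value;
        }
        return fields;
    }
}
```

With this sketch, `ParseLine("Acme\tRoadster\t155 mph")` yields the four fields Acme, Roadster, 155, and mph, and `ParseLine("Acme\tCity\t160.5km/h")` yields Acme, City, 160.5, and km/h, because the `\s*` makes the space before the unit optional. A line without tabs returns null and would simply be skipped.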
Revision History
- [July 02 2009] Minor changes to the demo project.
- [June 20 2009] Initial version.