LINQ recipe: Split a fixed-width row into a string array

16 Oct 2009, CPOL
Need to process a fixed-width file? LINQ makes it easy!

Introduction

Using LINQ, we can easily split rows in a fixed-width flat file into a string array. This code is very easy to adapt to return strongly-typed results, but for this example, we'll just return a string array.

Background

One of my projects imports files in a variety of formats. Delimited records are easy, but some vendors still love their flat files.

Using the Code

We are going to write a very simple class with a single method and the unit tests to exercise this method. We're going to assume a level of comfort with C# and unit testing, and we're going to use the Microsoft Team Test testing framework.

First, let's just create a new C# class library project called LINQToFixedWidth. It'll contain a single class called Class1. Rename it to FixedWidth, and add our method with two arguments (the string we are parsing, and the widths of each field):

C#
public string[] SplitByWidth(string s, int[] widths)
{
}

Naturally, this won't build yet, because the method doesn't return anything. Let's just give it something to return so we can build and create our first unit test.

C#
public string[] SplitByWidth(string s, int[] widths)
{
    return new string[0];
}

Now our method returns an empty string array and our project will compile. Before we write our method, let's make sure we know what we're looking for and create our test. In VS2008 Team System, right-click the method, choose Create Unit Tests..., and create a new C# test project called LINQToFixedWidth_Test. Visual Studio will create the new project, a test class, and a template test method. Let's change that method so it puts the behavior we want into code.

We'll start out with a string that's easy to parse visually, so we can tell quickly in the debugger whether something is wrong. Five fields, each two characters:

C#
string s = "1122334455";
int[] widths = { 2, 2, 2, 2, 2 };

The array we expect back is:

C#
string[] expected = { "11", "22", "33", "44", "55" };

We'll write up a little loop so we can easily tell which element is incorrect rather than compare the entire array at once. The completed test method looks like this:

C#
/// <summary> 
///A test for SplitByWidth
///</summary>
[TestMethod()]
public void SplitByWidthTest()
{
    FixedWidth target = new FixedWidth();
    string s = "1122334455";
    int[] widths = { 2, 2, 2, 2, 2 };
    string[] expected = { "11", "22", 
                          "33", "44", "55" };
    string[] actual;
    actual = target.SplitByWidth(s, widths);
    Assert.AreEqual(expected.Length, actual.Length);
    for (int i = 0; i < expected.Length; i++)
    {
        Assert.AreEqual(expected[i], actual[i], expected[i]);
    }
}

Now we have a reasonable test case: to properly split this 10-character string into five 2-character fields. We run this, and naturally, it fails. Our expected array's length is 5, and the actual array comes back empty. Now that we know what we want, and that we won't get it by accident, let's write some code!

We don't need the empty array any longer -- we only needed it to build the class. We'll start with the array of fields we're going to return. Let's set it to the same size as the number of fields specified in the widths array argument.

C#
public string[] SplitByWidth(string s, int[] widths)
{
    string[] ret = new string[widths.Length];
    return ret;
}

If we run our test again, now we see that it fails at a different line! The array lengths are the same. The elements don't match, but we have progressed. Now, let's get to the business of populating that array.

C#
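// Requires "using System.Linq;" at the top of the file for Skip, Take and ToArray.
public string[] SplitByWidth(string s, int[] widths)
{
    string[] ret = new string[widths.Length];
    char[] c = s.ToCharArray();
    int startPos = 0;   // index in the row where the current field begins
    for (int i = 0; i < widths.Length; i++)
    {
        int width = widths[i];
        // Skip everything before the field, take the field's characters, build a string.
        ret[i] = new string(c.Skip(startPos).Take(width).ToArray<char>());
        startPos += width;   // the next field starts where this one ended
    }
    return ret;
}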

We'll turn the string into a character array. This is where the magic happens. LINQ allows us fabulous, fascinating, fantastical functions with arrays. We're focusing on two LINQ methods: Skip() and Take(). Skip() does exactly what it says: it skips the given number of elements at the start of an array. Take() then returns the given number of elements from what is left. On our first pass through the loop, we skip 0 elements and then take our field width's worth of elements from the array: the first field starts at 0 and is 2 characters long.

Once we have our characters, we'll specify that they are to be returned as an array (the ToArray<char>() call gives us a character array) and create a new string containing that result. The first field is populated. All we have left to do is move our start position to where the field ended, and loop.
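To see Skip() and Take() on their own, here is a minimal sketch that traces the loop for the test row. The class name SkipTakeDemo and the helper Field are made up for this illustration; they are not part of the FixedWidth class. It also calls Skip() directly on the string, which works because a string is a sequence of characters.

```csharp
using System;
using System.Linq;

public static class SkipTakeDemo
{
    // Hypothetical helper: returns the field that starts at startPos
    // and is width characters long.
    public static string Field(string row, int startPos, int width)
    {
        // string implements IEnumerable<char>, so Skip/Take apply directly.
        return new string(row.Skip(startPos).Take(width).ToArray());
    }

    public static void Main()
    {
        string row = "1122334455";
        int[] widths = { 2, 2, 2, 2, 2 };
        int startPos = 0;
        foreach (int width in widths)
        {
            // First pass prints: skip 0, take 2 -> 11
            Console.WriteLine("skip {0}, take {1} -> {2}",
                startPos, width, Field(row, startPos, width));
            startPos += width;
        }
    }
}
```

One thing worth noticing: Take() does not complain when fewer elements are available than requested. Field("1", 0, 5) simply returns "1", so a row that is too short will not raise an error by itself.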

Now we run the test and it passes! Hooray! Let's write a test for a more complicated row and make sure our logic works.

C#
/// <summary>
///A test for SplitByWidth
///</summary>
[TestMethod()]
public void SplitByWidthTest2()
{
    FixedWidth target = new FixedWidth();
    string s = "111222222222344444444444445555";
    int[] widths = { 3, 9, 1, 13, 4 };
    string[] expected = { "111", "222222222", 
      "3", "4444444444444", "5555" };
    string[] actual;
    actual = target.SplitByWidth(s, widths);
    Assert.AreEqual(expected.Length, actual.Length);
    for (int i = 0; i < expected.Length; i++)
    {
        Assert.AreEqual(expected[i], actual[i], expected[i]);
    }
}

Run it and we see it passes, too! Good job.

Our test cases work. Now, let's make our method a little more solid and add some negative test cases. What happens when we pass in a null value? Or an empty string? Or a string that isn't long enough to support all of the field widths specified? Let's add some tests and find out.

C#
[TestMethod(), ExpectedException(typeof(ArgumentException), "No field sizes specified.")]
public void SplitByWidthNoFieldsTest()
{
    FixedWidth target = new FixedWidth();
    string s = null;
    int[] widths = { };
    // The call should throw, so there is no result to inspect.
    target.SplitByWidth(s, widths);
}

Our test method expects an ArgumentException. Run this. You'll see it fails: calling ToCharArray() on a null string throws a NullReferenceException instead. Now, let's add a check at the top of the method.

C#
if (widths.Length==0)
    throw new ArgumentException("No field sizes specified.");

Our test passes! Let's add another check.

C#
[TestMethod(), ExpectedException(typeof(ArgumentException), 
  "String does not contain enough characters for this format.")]
public void SplitByWidthsTooLongTest()
{
    FixedWidth target = new FixedWidth();
    string s = "1";
    int[] widths = { 5 };
    string[] actual;
    actual = target.SplitByWidth(s, widths);
}

And we fail, just like we expected! Take() simply returns the one character that is available, so no exception is thrown. Let's add the code to make the test pass.

C#
if (s.Length < widths.Sum())
    throw new ArgumentException("String does not contain enough " + 
                                "characters for this format.");

We get to use another LINQ method! Without any looping, Sum() gives us the total of all of the elements in the widths array. If our string is shorter than that total, we throw the exception. Now the test passes. Next condition: a null or empty string. If we pass in a null or empty string together with a non-empty width array, we throw another ArgumentException.

C#
[TestMethod(), ExpectedException(typeof(ArgumentException), "No data provided.")]
public void SplitByWidthsTooLongNullTest()
{
    FixedWidth target = new FixedWidth();
    string s = null;
    int[] widths = { 5 };
    // SplitByWidth should throw here, so nothing after this call would run.
    target.SplitByWidth(s, widths);
}

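And we code that, too. This check has to come before the length check, because reading s.Length on a null string would throw a NullReferenceException: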

C#
if (string.IsNullOrEmpty(s))
    throw new ArgumentException("No data provided.");
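Putting the three checks side by side makes it easier to see which input triggers which message. The sketch below is only an illustration: GuardOrderDemo and FirstFailure are hypothetical names, and the helper returns the message instead of throwing so the results can be printed.

```csharp
using System;
using System.Linq;

public static class GuardOrderDemo
{
    // Hypothetical helper: runs the three checks in order and returns the
    // message of the first one that fires, or null when the input is fine.
    public static string FirstFailure(string s, int[] widths)
    {
        if (widths.Length == 0)
            return "No field sizes specified.";

        if (string.IsNullOrEmpty(s))
            return "No data provided.";

        // Safe to read s.Length here: s is known to be non-null.
        if (s.Length < widths.Sum())
            return "String does not contain enough characters for this format.";

        return null;
    }

    public static void Main()
    {
        Console.WriteLine(FirstFailure(null, new int[] { }) ?? "ok");
        Console.WriteLine(FirstFailure(null, new int[] { 5 }) ?? "ok");
        Console.WriteLine(FirstFailure("1", new int[] { 5 }) ?? "ok");
        Console.WriteLine(FirstFailure("1122334455", new int[] { 2, 2, 2, 2, 2 }) ?? "ok");
    }
}
```

Because the empty-widths check runs first, a null string with an empty width array reports the missing field sizes rather than the missing data.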

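To wrap up what we've done, here are our test cases. The listing also includes a test that splits a two-row string on line breaks and parses each row (it needs using System.Text.RegularExpressions; for Regex.Split), and a second test of the null-string, empty-widths case: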

C#
/// <summary>
///A test for SplitByWidth
///</summary>
[TestMethod()]
public void SplitByWidthTest()
{
    FixedWidth target = new FixedWidth();
    string s = "1122334455";
    int[] widths = { 2, 2, 2, 2, 2 };
    string[] expected = { "11", "22", 
      "33", "44", "55" };
    string[] actual;
    actual = target.SplitByWidth(s, widths);
    Assert.AreEqual(expected.Length, actual.Length);
    for (int i = 0; i < expected.Length; i++)
    {
        Assert.AreEqual(expected[i], actual[i], expected[i]);
    }
}

/// <summary>
///A test for SplitByWidth
///</summary>
[TestMethod()]
public void SplitByWidthTest2()
{
    FixedWidth target = new FixedWidth();
    string s = "111222222222344444444444445555";
    int[] widths = { 3, 9, 1, 13, 4 };
    string[] expected = { "111", "222222222", 
      "3", "4444444444444", "5555" };
    string[] actual;
    actual = target.SplitByWidth(s, widths);
    Assert.AreEqual(expected.Length, actual.Length);
    for (int i = 0; i < expected.Length; i++)
    {
        Assert.AreEqual(expected[i], actual[i], expected[i]);
    }
}

/// <summary>
///A test for SplitByWidth
///</summary>
[TestMethod()]
public void SplitByWidthTwoLinesTest()
{
    FixedWidth target = new FixedWidth();
    string s = "1122334455\r\n5544332211";
    int[] widths = { 2, 2, 2, 2, 2 };
    string[] expected1 = { "11", "22", "33", "44", "55" };
    string[] expected2 = { "55", "44", "33", "22", "11" };
    string[] lines = Regex.Split(s, "\r\n");
    string[] actual1 = target.SplitByWidth(lines[0], widths);
    string[] actual2 = target.SplitByWidth(lines[1], widths);
    Assert.AreEqual(expected1.Length, actual1.Length);
    for (int i = 0; i < expected1.Length; i++)
    {
        Assert.AreEqual(expected1[i], actual1[i], expected1[i]);
    }
    Assert.AreEqual(expected2.Length, actual2.Length);
    for (int i = 0; i < expected2.Length; i++)
    {
        Assert.AreEqual(expected2[i], actual2[i], expected2[i]);
    }
}

/// <summary>
///A test for SplitByWidth
///</summary>
[TestMethod(), ExpectedException(typeof(ArgumentException), 
    "No field sizes specified.")]
public void SplitByWidthNoFieldsTest()
{
    FixedWidth target = new FixedWidth();
    string s = null;
    int[] widths = { };
    // The call should throw, so there is no result to inspect.
    target.SplitByWidth(s, widths);
}

/// <summary>
///A test for SplitByWidth
///</summary>
[TestMethod(), ExpectedException(typeof(ArgumentException), 
  "String does not contain enough characters for this format.")]
public void SplitByWidthsTooLongTest()
{
    FixedWidth target = new FixedWidth();
    string s = "1";
    int[] widths = { 5 };
    string[] actual;
    actual = target.SplitByWidth(s, widths);
}

/// <summary>
///A test for SplitByWidth
///</summary>
[TestMethod(), ExpectedException(typeof(ArgumentException), 
  "No field sizes specified.")]
public void SplitByWidthsNullTest()
{
    FixedWidth target = new FixedWidth();
    string s = null;
    int[] widths = { };
    // SplitByWidth should throw here, so nothing after this call would run.
    target.SplitByWidth(s, widths);
}

/// <summary>
///A test for SplitByWidth
///</summary>
[TestMethod(), ExpectedException(typeof(ArgumentException), "No data provided.")]
public void SplitByWidthsTooLongNullTest()
{
    FixedWidth target = new FixedWidth();
    string s = null;
    int[] widths = { 5 };
    // SplitByWidth should throw here, so nothing after this call would run.
    target.SplitByWidth(s, widths);
}
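One caveat about the ExpectedException tests: as far as I know, the string passed as the second argument is only the text reported when the expected exception is not thrown. It is not compared against the exception's Message, so these tests check the exception type only. If the message matters, a try/catch along the following lines can verify it. This is a sketch under assumptions: MessageCheckDemo and ThrowsWithMessage are hypothetical names, the method is a static copy of the finished SplitByWidth, and plain code stands in for the test framework so the example runs on its own.

```csharp
using System;
using System.Linq;

public static class MessageCheckDemo
{
    // Static copy of the finished method, repeated so the sketch is self-contained.
    public static string[] SplitByWidth(string s, int[] widths)
    {
        if (widths.Length == 0)
            throw new ArgumentException("No field sizes specified.");
        if (string.IsNullOrEmpty(s))
            throw new ArgumentException("No data provided.");
        if (s.Length < widths.Sum())
            throw new ArgumentException(
                "String does not contain enough characters for this format.");

        string[] ret = new string[widths.Length];
        int startPos = 0;
        for (int i = 0; i < widths.Length; i++)
        {
            ret[i] = new string(s.Skip(startPos).Take(widths[i]).ToArray());
            startPos += widths[i];
        }
        return ret;
    }

    // Hypothetical helper: true only when action throws an ArgumentException
    // whose Message equals expectedMessage exactly.
    public static bool ThrowsWithMessage(Action action, string expectedMessage)
    {
        try
        {
            action();
        }
        catch (ArgumentException ex)
        {
            return ex.Message == expectedMessage;
        }
        return false; // nothing was thrown
    }

    public static void Main()
    {
        bool ok = ThrowsWithMessage(
            () => SplitByWidth(null, new int[] { 5 }),
            "No data provided.");
        Console.WriteLine(ok ? "message matches" : "message differs");
    }
}
```

Inside a real test method, the same try/catch shape works with Assert.AreEqual on ex.Message and Assert.Fail after the call.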

And our method:

C#
// Requires "using System;" and "using System.Linq;" at the top of the file.
public string[] SplitByWidth(string s, int[] widths)
{
    if (widths.Length == 0)
        throw new ArgumentException("No field sizes specified.");

    if (string.IsNullOrEmpty(s))
        throw new ArgumentException("No data provided.");

    if (s.Length < widths.Sum())
        throw new ArgumentException("String does not contain " + 
              "enough characters for this format.");

    string[] ret = new string[widths.Length];
    char[] c = s.ToCharArray();
    int startPos = 0;
    for (int i = 0; i < widths.Length; i++)
    {
        int width = widths[i];
        ret[i] = new string(c.Skip(startPos).Take(width).ToArray<char>());
        startPos += width;
    }
    return ret;
}

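We've used LINQ to parse a fixed-width flat file row and we have full code coverage. I would say our method is pretty solid, with one caveat: a null widths array would still surface as a NullReferenceException, so add a check if your callers might pass one. Congratulations!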
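The introduction mentioned that this code is easy to adapt to return strongly-typed results. Here is one way that might look. Everything specific in it is an assumption made up for the illustration: the Customer class, its three fields, and the 5/10/8 column layout. The SplitByWidth copy is a compact static version of the method above.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical record type for the illustration.
public class Customer
{
    public int Id;
    public string Name;
    public decimal Balance;
}

public static class TypedDemo
{
    // Hypothetical layout: Id (5 chars), Name (10 chars), Balance (8 chars).
    static readonly int[] Widths = { 5, 10, 8 };

    // Compact static copy of the article's method.
    public static string[] SplitByWidth(string s, int[] widths)
    {
        if (string.IsNullOrEmpty(s) || s.Length < widths.Sum())
            throw new ArgumentException("Row does not match the format.");
        string[] ret = new string[widths.Length];
        int startPos = 0;
        for (int i = 0; i < widths.Length; i++)
        {
            ret[i] = new string(s.Skip(startPos).Take(widths[i]).ToArray());
            startPos += widths[i];
        }
        return ret;
    }

    // Splits the row, then trims the padding and converts each field.
    public static Customer Parse(string row)
    {
        string[] f = SplitByWidth(row, Widths);
        Customer c = new Customer();
        c.Id = int.Parse(f[0].Trim(), CultureInfo.InvariantCulture);
        c.Name = f[1].Trim();
        c.Balance = decimal.Parse(f[2].Trim(), CultureInfo.InvariantCulture);
        return c;
    }

    public static void Main()
    {
        // Built from three padded pieces so the column widths are easy to see.
        string row = "00042" + "Ada Smith " + "  123.45";
        Customer c = Parse(row);
        Console.WriteLine("{0} | {1} | {2}", c.Id, c.Name, c.Balance);
    }
}
```

Parsing with the invariant culture keeps the decimal point from being misread on machines with a different regional setting. Real vendor files may pad or format numbers differently, so the conversions would need to follow the actual file specification.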

Points of Interest

Naturally, we've been able to process fixed-width records for generations. There are hieroglyphics in the Pyramids that detail the Ancients' solutions to parsing fixed-width files. This little recipe shows us how to use a couple of simple LINQ functions to easily and quickly achieve our desired results.

History

  • Submitted on October 15, 2009.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)