Introduction
CSV files are still found everywhere, and developers are often faced with parsing and manipulating their data. Frequently, we want to take the CSV data and use it to initialize objects. In this article, we'll take a look at one approach to mapping incoming CSV data to our own objects. For brevity, I will assume that you already have a way to parse a given CSV input line into an array of strings.
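For readers who don't have that splitting step yet, here is a minimal sketch of one way to do it. It is my own assumption, not part of the approach described in this article: a hypothetical CsvLineSplitter.Split that honors commas inside double-quoted fields and treats a doubled quote ("") as a literal quote character. It does not handle quoted fields that span multiple lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: splits a single CSV line into its fields.
// Supports quoted fields containing commas and "" as an escaped quote.
// Does NOT support quoted fields that continue onto the next line.
public static class CsvLineSplitter
{
    public static string[] Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    // A doubled quote inside a quoted field is a literal quote.
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"');
                        i++;
                    }
                    else
                    {
                        inQuotes = false; // closing quote
                    }
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote
            }
            else if (c == ',')
            {
                // Field separator outside quotes: finish the current field.
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // the last field has no trailing comma
        return fields.ToArray();
    }
}
```

For example, splitting the line Manager,38,"Doe, John",4/1/68 yields four fields, with Doe, John kept together as one field.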
Background
I was first prompted to look at this problem when a customer asked me whether there was an easy way to map incoming CSV data to objects. He had already figured out how to use regular expressions to parse each line of text his application read into an array containing all the fields from the data file. What remained was creating objects from that array.
The obvious, brute-force method would be something like this:
    Customer customerObj = new Customer();
    customerObj.Name = datafields[0];
    customerObj.DateOfBirth = DateTime.Parse(datafields[1]);
    customerObj.Age = int.Parse(datafields[2]);
That would be fairly straightforward, but with more than a few classes or properties, it would get pretty tedious. And there is no provision for any custom processing of the input data before it is assigned to properties. You could also write a special constructor for each class that takes an array of data and sets the object up correctly, which would probably be a marginally better approach.
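For comparison, a minimal sketch of that special-constructor alternative might look like the following. The field order (Name, DateOfBirth, Age) comes from the brute-force snippet above; the length check and the use of CultureInfo.InvariantCulture are my own assumptions. Every class would need its own constructor like this, which is exactly the per-class boilerplate the rest of this article sets out to avoid.

```csharp
using System;
using System.Globalization;

// Illustrative only: each class knows how to build itself from a parsed CSV row.
public class Customer
{
    public string Name { get; set; }
    public DateTime DateOfBirth { get; set; }
    public int Age { get; set; }

    // Assumed field order: Name, DateOfBirth, Age (as in the brute-force example).
    public Customer(string[] datafields)
    {
        if (datafields == null)
            throw new ArgumentNullException(nameof(datafields));
        if (datafields.Length < 3)
            throw new ArgumentException("Expected at least 3 fields.", nameof(datafields));

        Name = datafields[0];
        // Parse with a fixed culture so the result doesn't depend on machine settings.
        DateOfBirth = DateTime.Parse(datafields[1], CultureInfo.InvariantCulture);
        Age = int.Parse(datafields[2], CultureInfo.InvariantCulture);
    }
}
```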
The Approach
My initial two thoughts when faced with this problem were:
- it made a lot of sense to have some kind of loader that could process incoming array data and instantiate an arbitrary class with data, and
- some way to make the mapping of the array data to class data easy.
With those two thoughts in mind (and thereby limiting my other remaining thoughts to one, since I can only manage three things at a time), I set out to free up my thought queue as fast as possible.
From those two thoughts, I picked three key things that drove my thinking:
- I would need a Loader class of some kind. For my first shot at this, I wanted a couple of static methods: one that could be given an existing object to populate from an array of string data, and one that could be told the type of object to create and would return a new, fully populated object of that type.
- Because the Loader needed to work with any kind of class, I would need to use .NET Reflection to interrogate the class for the information that needed to be updated.
- Since I needed a mapping feature and would be using Reflection anyway, a custom .NET attribute could "mark up" the properties on a class so the Loader would know how to map the array data to each property.
Creating the Custom Attribute
I started tackling this idea by working backwards through my list. First, I needed a .NET attribute I could use. If you have never worked with custom attributes, you should know that they are pretty cool, though they almost always lead to using Reflection. I think many developers get scared off by Reflection for whatever reason (you don't see it used in many scenarios where it would make life a ton easier), and that is a shame. Reflection really is straightforward, so make sure it is part of your toolbox. To create a custom attribute, you just need to define a class that inherits from System.Attribute, add some public fields to it and at least one constructor, and you are rocking and rolling. Here is the attribute I declared for my project:
    [AttributeUsage(AttributeTargets.Property)]
    public class CSVPositionAttribute : System.Attribute
    {
        public int Position;

        public CSVPositionAttribute(int position)
        {
            Position = position;
        }

        public CSVPositionAttribute()
        {
        }
    }
In this case, the user supplies a Position value as part of the attribute, either through the constructor or as a named argument such as [CSVPosition(Position = 2)]; with the parameterless constructor and no named argument, Position is left at its default of 0. The other thing to notice about this attribute is the [AttributeUsage(AttributeTargets.Property)] attribute above the class declaration. It declares that my custom attribute can only be applied to properties of a class, and cannot be used on the class itself, methods, fields, etc.
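One way to confirm what [AttributeUsage(AttributeTargets.Property)] actually recorded is to read it back with Reflection. The sketch below (UsageInspector is a hypothetical helper name) shows that ValidOn is Property and that the two settings left unspecified take their defaults: AllowMultiple is false and Inherited is true. Because AllowMultiple is false, the compiler reports an error if the same property is marked with CSVPosition twice.

```csharp
using System;

[AttributeUsage(AttributeTargets.Property)]
public class CSVPositionAttribute : Attribute
{
    public int Position;
    public CSVPositionAttribute(int position) { Position = position; }
    public CSVPositionAttribute() { }
}

public static class UsageInspector
{
    // Reads the [AttributeUsage] applied directly to CSVPositionAttribute.
    // inherit: false avoids also picking up System.Attribute's own AttributeUsage.
    public static AttributeUsageAttribute GetUsage()
    {
        return (AttributeUsageAttribute)Attribute.GetCustomAttribute(
            typeof(CSVPositionAttribute), typeof(AttributeUsageAttribute), false);
    }
}
```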
To use this custom attribute, all I have to do is the following:
    public class SomeClass
    {
        private int _age;

        [CSVPosition(2)]
        public int Age
        {
            get { return _age; }
            set { _age = value; }
        }
    }
The [CSVPosition(2)] attribute sets the Position field to 2. Note that even though our custom attribute class is named CSVPositionAttribute, I can shorten that to CSVPosition (dropping the Attribute suffix) when using it to mark up a property. This gives me a simple way to mark up my objects to be loaded with information contained in an array derived from a line in a CSV file.
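To see that the short and long names are interchangeable, this sketch marks one property with [CSVPosition(2)] and another with [CSVPositionAttribute(3)], then reads the Position values back through the CustomAttributeExtensions.GetCustomAttribute<T> extension method (available since .NET Framework 4.5). The class uses auto-implemented properties rather than explicit backing fields, and PositionOf is a hypothetical helper that returns -1 for an unmarked property.

```csharp
using System;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property)]
public class CSVPositionAttribute : Attribute
{
    public int Position;
    public CSVPositionAttribute(int position) { Position = position; }
    public CSVPositionAttribute() { }
}

public class SomeClass
{
    // Short form: the compiler adds the "Attribute" suffix when resolving the name.
    [CSVPosition(2)]
    public int Age { get; set; }

    // Long form: the full class name means exactly the same thing.
    [CSVPositionAttribute(3)]
    public string Name { get; set; }

    // Not marked, so a loader would skip it.
    public string Notes { get; set; }
}

public static class AttributeReader
{
    // Returns the Position stored on a property's [CSVPosition], or -1 if there is none.
    public static int PositionOf(Type type, string propertyName)
    {
        PropertyInfo property = type.GetProperty(propertyName);
        CSVPositionAttribute attr = property.GetCustomAttribute<CSVPositionAttribute>();
        return attr == null ? -1 : attr.Position;
    }
}
```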
Creating the Loader with Reflection
The next step is to take some arbitrary class, figure out which of its properties are to be populated with data from a CSV line, and update the object with that data. To do that, I will use .NET Reflection. I start by creating a new class called ClassLoader that will have a single method (for now):
    public class ClassLoader
    {
        public static void Load(object target,
            string[] fields, bool supressErrors)
        {
        }
    }
The Load method is a static method that takes the target object to be loaded from the CSV data, an array of strings (the data parsed from a single line in the CSV file), and a flag indicating whether errors encountered while processing the data should be suppressed. One quick point: I am using a very simple approach to error handling for this demo. There are certainly richer and more robust ways to handle errors, but I leave that to you, dear reader, to implement as needed.
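As one hedged example of a richer strategy than a single suppress flag, the sketch below keeps loading after a bad field but returns a list describing every failure, so the caller can log or reject the row. CollectingLoader, the Order class, and the "PropertyName: ExceptionType" message format are my own illustrative choices, not part of the article's ClassLoader; otherwise it follows the same attribute-driven loop described below.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property)]
public class CSVPositionAttribute : Attribute
{
    public int Position;
    public CSVPositionAttribute(int position) { Position = position; }
}

// Hypothetical alternative to the bool flag: never throw for a bad field,
// but report each failure back to the caller.
public static class CollectingLoader
{
    public static List<string> Load(object target, string[] fields)
    {
        var errors = new List<string>();
        foreach (PropertyInfo property in target.GetType().GetProperties())
        {
            CSVPositionAttribute attr = property.GetCustomAttribute<CSVPositionAttribute>();
            if (attr == null || !property.CanWrite) continue;

            try
            {
                object value = Convert.ChangeType(fields[attr.Position], property.PropertyType);
                property.SetValue(target, value, null);
            }
            catch (Exception ex)
            {
                // Record which property failed and why, then keep going.
                errors.Add(property.Name + ": " + ex.GetType().Name);
            }
        }
        return errors;
    }
}

public class Order
{
    [CSVPosition(0)] public string Sku { get; set; }
    [CSVPosition(1)] public int Quantity { get; set; }
}
```

Loading the row { "A-1", "lots" } into an Order sets Sku and returns a single entry, "Quantity: FormatException", because "lots" cannot be converted to an int.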
The first thing I need to do is examine the incoming object for all of its available properties and check those properties for the CSVPosition attribute. Getting a list of an object's properties is very easy using Reflection (PropertyInfo lives in the System.Reflection namespace):
    Type targetType = target.GetType();
    PropertyInfo[] properties = targetType.GetProperties();
I can then iterate over the properties array and use each PropertyInfo object to determine whether a given property needs to be loaded with data from the CSV field array.
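    foreach (PropertyInfo property in properties)
    {
        // Only properties with a setter can be loaded.
        if (property.CanWrite)
        {
            // Look for our [CSVPosition] marker on this property.
            object[] attributes =
                property.GetCustomAttributes(typeof(CSVPositionAttribute), false);
            if (attributes.Length > 0)
            {
                CSVPositionAttribute positionAttr =
                    (CSVPositionAttribute)attributes[0];
                int position = positionAttr.Position;
                try
                {
                    // Pull the raw string for this property from the CSV fields,
                    // convert it to the property's type, and assign it.
                    object data = fields[position];
                    property.SetValue(target,
                        Convert.ChangeType(data, property.PropertyType), null);
                }
                catch
                {
                    // Bad index or failed conversion: rethrow unless suppressed.
                    if (!supressErrors)
                        throw;
                }
            }
        }
    }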
You should be able to follow what is going on by reading the code above. Basically, we check each property to see whether we can write to it, and if we can, we see whether it has a CSVPosition attribute. If it does, we retrieve the position value, pull the corresponding string from the fields array, convert it to the property's type with Convert.ChangeType, and set the value on that property. It's all pretty straightforward. You might wonder what happens if someone assigns more than one CSVPosition attribute to a property. Because the AllowMultiple setting of AttributeUsage defaults to false, the compiler rejects duplicate CSVPosition attributes on the same property, so reading only attributes[0] is safe.
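Pulling the pieces above together, here is a minimal, self-contained sketch of the Load method as it stands at this point (before data transforms), along with a small hypothetical Person class to exercise it. It assumes the first version of CSVPositionAttribute; Person and its deliberately out-of-range Missing property are illustrative only.

```csharp
using System;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property)]
public class CSVPositionAttribute : Attribute
{
    public int Position;
    public CSVPositionAttribute(int position) { Position = position; }
    public CSVPositionAttribute() { }
}

public class ClassLoader
{
    public static void Load(object target, string[] fields, bool suppressErrors)
    {
        Type targetType = target.GetType();
        foreach (PropertyInfo property in targetType.GetProperties())
        {
            if (!property.CanWrite) continue; // read-only: nothing to set

            object[] attributes =
                property.GetCustomAttributes(typeof(CSVPositionAttribute), false);
            if (attributes.Length == 0) continue; // not mapped to a CSV column

            var positionAttr = (CSVPositionAttribute)attributes[0];
            try
            {
                object data = fields[positionAttr.Position];
                property.SetValue(target,
                    Convert.ChangeType(data, property.PropertyType), null);
            }
            catch
            {
                // Index out of range or conversion failure.
                if (!suppressErrors) throw;
            }
        }
    }
}

public class Person
{
    [CSVPosition(0)] public string Name { get; set; }
    [CSVPosition(1)] public int Age { get; set; }
    [CSVPosition(5)] public string Missing { get; set; } // no such column in short rows
}
```

With error suppression on, loading { "Ada", "36" } fills Name and Age and leaves Missing null; with it off, the out-of-range index surfaces as an IndexOutOfRangeException.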
Implementing Data Transforms
You may also wonder why the following line of code was used in our Load routine:
    object data = fields[position];
Couldn't we just as easily use the fields[position] data element directly in the SetValue call? We certainly could. That line, however, leads us to the next problem I wanted to solve: what happens if the incoming string value needs to be processed or formatted because it can't be used as is? Examples might include getting the value "One" that we want to assign to an integer property, or wanting to format a particular string in a certain way before assigning it to the target property. What we would like is to be able to point the Load routine at a special data transformation routine that is most likely different for each property. How can we do that?
Once again, .NET Reflection rides to the rescue. Using Reflection, we can call methods on a given object dynamically, even if we don't know the names of those methods at design time. So the question quickly becomes: how do we tell our processing routine
- that a particular property needs a special data transform applied before the CSV data is assigned, and
- the name of that transformation method?
We will solve both problems by extending our CSVPosition attribute and modifying our Load method.
Our new CSVPositionAttribute class will now look like this:
    [AttributeUsage(AttributeTargets.Property)]
    public class CSVPositionAttribute : System.Attribute
    {
        public int Position;
        public string DataTransform = string.Empty;

        public CSVPositionAttribute(int position,
            string dataTransform)
        {
            Position = position;
            DataTransform = dataTransform;
        }

        public CSVPositionAttribute(int position)
        {
            Position = position;
        }

        public CSVPositionAttribute()
        {
        }
    }
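As you can see, all we have done is add a new public field named DataTransform, along with a constructor that sets it. This field holds the name of another method on the same class that will be used as a data transformation routine. There may be a way to do this with delegates as well, but attribute arguments must be compile-time constants, typeof expressions, or arrays of those, so a delegate cannot be passed to the attribute directly. So, with my brute-force method of naming the transform as a string, we can now modify the try block in our Load routine to look like this: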
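    try
    {
        object data = fields[position];
        // If the attribute names a transform method, call it on the target
        // object and use its return value instead of the raw string.
        if (!string.IsNullOrEmpty(positionAttr.DataTransform))
        {
            MethodInfo method = targetType.GetMethod(positionAttr.DataTransform);
            if (method == null)
                throw new MissingMethodException(targetType.Name, positionAttr.DataTransform);
            data = method.Invoke(target, new object[] { data });
        }
        property.SetValue(target, Convert.ChangeType(data,
            property.PropertyType), null);
    }
    catch
    {
        if (!supressErrors)
            throw;
    }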
The code now checks for a DataTransform value and, if one is present, invokes that method via Reflection and passes the returned data on to the target property. I've assumed that any transformation routine is a public method on the same object whose properties are being updated (Type.GetMethod(string) only finds public methods). This seems to make sense, since the object should be responsible for controlling how its data is formatted.
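To show the "One" case mentioned earlier, here is a hedged sketch of the transform step in isolation: a hypothetical Ticket class whose integer Seats property is filled through a WordToNumber transform, next to a TitleFormat-style transform like the one used for Customer. TransformingLoader is a stripped-down loader without error suppression, and WordToNumber only knows a few words before falling back to int.Parse; both are assumptions for illustration. Note that a transform may return a non-string (here a boxed int), and Convert.ChangeType still works because the value already has the target type.

```csharp
using System;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property)]
public class CSVPositionAttribute : Attribute
{
    public int Position;
    public string DataTransform = string.Empty;
    public CSVPositionAttribute(int position) { Position = position; }
    public CSVPositionAttribute(int position, string dataTransform)
    {
        Position = position;
        DataTransform = dataTransform;
    }
}

public static class TransformingLoader
{
    public static void Load(object target, string[] fields)
    {
        Type targetType = target.GetType();
        foreach (PropertyInfo property in targetType.GetProperties())
        {
            CSVPositionAttribute attr = property.GetCustomAttribute<CSVPositionAttribute>();
            if (attr == null || !property.CanWrite) continue;

            object data = fields[attr.Position];
            if (!string.IsNullOrEmpty(attr.DataTransform))
            {
                // Find the named public instance method on the target and call it.
                MethodInfo method = targetType.GetMethod(attr.DataTransform);
                if (method == null)
                    throw new MissingMethodException(targetType.Name, attr.DataTransform);
                data = method.Invoke(target, new object[] { data });
            }
            property.SetValue(target, Convert.ChangeType(data, property.PropertyType), null);
        }
    }
}

public class Ticket
{
    [CSVPosition(0, "WordToNumber")]
    public int Seats { get; set; }

    [CSVPosition(1, "TitleFormat")]
    public string Title { get; set; }

    // Hypothetical transform: a few spelled-out numbers, else a normal int parse.
    public int WordToNumber(string data)
    {
        switch (data.Trim().ToLowerInvariant())
        {
            case "one": return 1;
            case "two": return 2;
            case "three": return 3;
            default: return int.Parse(data);
        }
    }

    public string TitleFormat(string data)
    {
        return data.Trim().ToUpperInvariant();
    }
}
```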
The last thing I did was add another method to my ClassLoader class:
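    public static X LoadNew<X>(string[] fields, bool supressErrors)
        where X : class, new()
    {
        // new() guarantees a public parameterless constructor; the class
        // constraint keeps Load from populating a boxed copy of a struct.
        X tempObj = new X();
        Load(tempObj, fields, supressErrors);
        return tempObj;
    }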
Using the Code
Here is a brief example of how to use this code.
I have a Customer class that I would like to populate from some CSV data. The Customer class has been marked up as shown below:
    class Customer
    {
        private string _name;
        private string _title;
        private int _age;
        private DateTime _birthDay;

        [CSVPosition(2)]
        public string Name
        {
            get { return _name; }
            set { _name = value; }
        }

        [CSVPosition(0, "TitleFormat")]
        public string Title
        {
            get { return _title; }
            set { _title = value; }
        }

        [CSVPosition(1)]
        public int Age
        {
            get { return _age; }
            set { _age = value; }
        }

        [CSVPosition(3)]
        public DateTime BirthDay
        {
            get { return _birthDay; }
            set { _birthDay = value; }
        }

        public Customer()
        {
        }

        public string TitleFormat(string data)
        {
            return data.Trim().ToUpper();
        }

        public override string ToString()
        {
            return "Customer object [" + _name + " - " +
                _title + " - " + _age + " - " + _birthDay + "]";
        }
    }
Populating this class with data using the ClassLoader class can be done in one of two ways. First, we can instantiate our class ourselves and pass the instance to ClassLoader.Load to be populated. Or, we can use the LoadNew method and have it pass back a populated object on its own. Both examples are shown below:
    static void Main(string[] args)
    {
        // Field order: Title (0), Age (1), Name (2), BirthDay (3).
        string[] fields = { " Manager", "38", "John Doe", "4/1/68" };

        Customer customer1 = new Customer();
        ClassLoader.Load(customer1, fields, true);
        Console.WriteLine(customer1.ToString());

        Customer customer2 = ClassLoader.LoadNew<Customer>(fields, false);
        Console.WriteLine(customer2.ToString());

        Console.ReadLine();
    }
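One caveat with the sample data: Convert.ChangeType(data, type) parses with the machine's current culture, so "4/1/68" is April 1 on a US-configured machine but January 4 on a UK-configured one. A hedged sketch of a conversion helper that pins the culture, and that also maps empty strings to null for nullable properties (Convert.ChangeType cannot convert a string to a Nullable<T> type directly), could look like this. FieldConverter.ConvertField is a hypothetical name; you would call it in place of Convert.ChangeType inside Load.

```csharp
using System;
using System.Globalization;

public static class FieldConverter
{
    // Converts a raw CSV string to targetType using an explicit culture.
    // Assumption (not from the article): for Nullable<T> targets, an empty
    // or whitespace string becomes null; otherwise the value is converted to T.
    public static object ConvertField(string raw, Type targetType, IFormatProvider culture)
    {
        Type underlying = Nullable.GetUnderlyingType(targetType);
        if (underlying != null)
        {
            if (string.IsNullOrWhiteSpace(raw)) return null;
            targetType = underlying; // a boxed T can be assigned to a T? property
        }
        return Convert.ChangeType(raw, targetType, culture);
    }
}
```

For example, ConvertField("4/1/68", typeof(DateTime), CultureInfo.InvariantCulture) reads the date month-first regardless of the machine's regional settings.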
That is all there is to it. Hope it helps, and happy coding.