Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Mobile

Simple and Fast CSV Library in C#

4.80/5 (60 votes)
21 Mar 2014CPOL2 min read 205.7K   7.3K  
This library allow serialization and deserialization of objects to CSV with very little code.

Introduction

This library allows to save a collection of objects to a CSV file.
It also allows loading from CSV files with linq expressions.

It is:

  • Simple to add to your application (single C# file)
  • Simple to use (loading and saving can be done with a single line of code)
  • Fast (the demo creates a million client records in 2 seconds on my laptop)
  • Small memory footprint (tested with over 10,0000,000 records).

Image 1

Using the Code: Saving to CSV

Using the code is very easy.

If you have a list of clients in an array, saving it to CSV takes a single line:

C#
clients.ToCsv("client.csv");     

The CSV generated will list the public properties of the objects in the collection (in this example, the client class):

ClientId;FirstName;MiddleName;LastName;Occupation;City
100;Michael;Son;Ray;Actuary;Fort Worth
101;Emmanuel;Nick;Williams;Earth Moving Equipment Mechanic;Ontario
102;Phyllis;Melanie;Fox;Knitter;Fresno
103;Ricardo;Percy;Beasley;Labourer;Columbus
104;Kerri;Jewell;Bennett;Production Manager;Jacksonville
105;Carlos;Luis;Hart;Locksmith;Buffalo
106;Heath;Herbert;Glass;Aeronautical Engineer;Seattle
107;Randall;Hans;Tanner;Air Traffic Controller;Corona
108;Damien;;Washington;Chartered Secretary;Allentown
109;Charlene;Alice;Thompson;Committee Clerk;Salem
110;Ann;Alicia;Potts;Actor;Raleigh
111;Richard;Gerardo;Riddle;Assayer Sampler;Akron
112;Pierre;Adolfo;Mendez;Oceanographer;Birmingham      

You can also save to CSV using this alternative syntax, if you don't have all the objects at once in memory:

C#
using (var csvFile = new CsvFile<Client>("clients.csv"))
{
    for (int i = 0 ; i < 1000000 ; i++)
    {
        var client = new Client() { id = i, name = "Client " + i };
        csvFile.Append(user);
    }
}    

Using the Code: Loading from CSV

To load a CSV file, you use the method CsvFile.Read.

This command takes a class type (the client class in this example) and a filename.

C#
foreach (var c in from CsvFile.Read<Client>("clients.csv"))
{
    // client can be used here
}  

The method CsvFile.Read reads the records one by one directly from disk without keeping them in memory.
This allows you to load gigantic files with no memory trouble.

Note that you can use Linq expression when loading CSV files.

C#
foreach (var c in from client in CsvFile.Read<Client>("clients.csv")
 where client.FirstName.StartsWith("J")
 && client.MiddleName.StartsWith("F")
 && client.LastName.StartsWith("K")
 select client)
{
    // client can be used here 
} 

Going Further

The library is able to work with streams and stream reader/writers rather than file names.
This can be useful if you want to parse data not stored in files or coming from the web for example.

I use the ';' character as field separator to be compatible by default with Excel. You can change separators or the list of columns, but the extensibility is quite limited.

In version 1.1, I plan to add the ability to define calculated columns, this would allow you to do all the formatting you need without having to add too much code on the library side.

If you feel that the library is not doing exactly what you want, I don't blame you, it is designed for what I need.
I encourage you to change it and if you want, give me some feedback in a message below or through the code project email system.
There is no support for appending to CSV file yet.

Full Source

The entire source of the library is here:

C++
using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;
using System.Text;

#if !NET_3_5
using System.Threading;
using System.Collections.Concurrent;
using System.Threading.Tasks;
#endif

namespace CsvFiles
{
    public class CsvFile : IDisposable
    {
        #region Static Members
        public static CsvDefinition DefaultCsvDefinition { get; set; }
        public static bool UseLambdas { get; set; }
        public static bool UseTasks { get; set; }
        public static bool FastIndexOfAny { get; set; }

        static CsvFile()
        {
            // Choosing default Field Separator is a hard decision
            // In theory the C of CSV means Comma separated 
            // In Windows though many people will try to open the csv with Excel 
            // which is horrid with real CSV.
            // As the target for this library is Windows we go for Semi Colon.
            DefaultCsvDefinition = new CsvDefinition
                {
                    EndOfLine = "\r\n",
                    FieldSeparator = ';',
                    TextQualifier = '"'
                };
            UseLambdas = true;
            UseTasks = true;
            FastIndexOfAny = true;
        }

        #endregion

        internal protected Stream BaseStream;
        protected static DateTime DateTimeZero = new DateTime();

        public static IEnumerable<T> Read<T>(CsvSource csvSource) where T : new()
        {
            var csvFileReader = new CsvFileReader<T>(csvSource);
            return (IEnumerable<T>)csvFileReader;
        }

        public char FieldSeparator { get; private set; }
        public char TextQualifier { get; private set; }
        public IEnumerable<String> Columns { get; private set; }

        public void Dispose()
        {
            Dispose(true);
        }

        protected virtual void Dispose(bool disposing)
        {
            // overridden in derived classes
        }
    }

    public class CsvFile<T> : CsvFile
    {
        private readonly char fieldSeparator;
        private readonly string fieldSeparatorAsString;
        private readonly char[] invalidCharsInFields;
        private readonly StreamWriter streamWriter;
        private readonly char textQualifier;
        private readonly String[] columns;
        private Func<T, object>[] getters;
        readonly bool[] isInvalidCharInFields;
#if !NET_3_5
        private int linesToWrite;
        private readonly BlockingCollection<string> csvLinesToWrite = 
                                         new BlockingCollection<string>(5000);
        private readonly Thread writeCsvLinesTask;
        private Task addAsyncTask;
#endif

        public CsvFile(CsvDestination csvDestination)
            : this(csvDestination, null)
        {
        }

        public CsvFile()
        {
        }

        public CsvFile(CsvDestination csvDestination, CsvDefinition csvDefinition)
        {
            if (csvDefinition == null)
                csvDefinition = DefaultCsvDefinition;
            this.columns = (csvDefinition.Columns ?? InferColumns(typeof(T))).ToArray();
            this.fieldSeparator = csvDefinition.FieldSeparator;
            this.fieldSeparatorAsString = this.fieldSeparator.ToString(CultureInfo.InvariantCulture);
            this.textQualifier = csvDefinition.TextQualifier;
            this.streamWriter = csvDestination.StreamWriter;

            this.invalidCharsInFields = new[] { '\r', '\n', this.textQualifier, this.fieldSeparator };
            this.isInvalidCharInFields = new bool[256];

            foreach (var c in this.invalidCharsInFields)
            {
                this.isInvalidCharInFields[c] = true;
            }
            this.WriteHeader();

            this.CreateGetters();
#if !NET_3_5
            if (CsvFile.UseTasks)
            {
                writeCsvLinesTask = new Thread((o) => this.WriteCsvLines());
                writeCsvLinesTask.Start();
            }
            this.addAsyncTask = Task.Factory.StartNew(() => { });
#endif
        }

        protected override void Dispose(bool disposing)
        {
            if (disposing)
            {
                // free managed resources
#if !NET_3_5
                addAsyncTask.Wait();
                if (csvLinesToWrite != null)
                {
                    csvLinesToWrite.CompleteAdding();
                }
                if (writeCsvLinesTask != null)
                    writeCsvLinesTask.Join();                
#endif
                this.streamWriter.Close();
            }
        }

        protected static IEnumerable<string> InferColumns(Type recordType)
        {
            var columns = recordType
                .GetProperties(BindingFlags.Public | BindingFlags.Instance)
                .Where(pi => pi.GetIndexParameters().Length == 0
                    && pi.GetSetMethod() != null
                    && !Attribute.IsDefined(pi, typeof(CsvIgnorePropertyAttribute)))
                .Select(pi => pi.Name)
                .Concat(recordType
                    .GetFields(BindingFlags.Public | BindingFlags.Instance)
                    .Where(fi => !Attribute.IsDefined(fi, typeof(CsvIgnorePropertyAttribute)))
                    .Select(fi => fi.Name))
                .ToList();
            return columns;
        }

#if !NET_3_5
        private void WriteCsvLines()
        {
            int written = 0;
            foreach (var csvLine in csvLinesToWrite.GetConsumingEnumerable())
            {
                this.streamWriter.WriteLine(csvLine);
                written++;
            }
            Interlocked.Add(ref this.linesToWrite, -written);
        }
#endif

        public void Append(T record)
        {

            if (CsvFile.UseTasks)
            {
#if !NET_3_5
                var linesWaiting = Interlocked.Increment(ref this.linesToWrite);
                Action<Task> addRecord = (t) =>
                    {
                        var csvLine = this.ToCsv(record);
                        this.csvLinesToWrite.Add(csvLine);
                    };

                if (linesWaiting < 10000)
                    this.addAsyncTask = this.addAsyncTask.ContinueWith(addRecord);
                else
                    addRecord(null);
#else
                throw new NotImplementedException("Tasks");
#endif
            }
            else
            {
                var csvLine = this.ToCsv(record);
                this.streamWriter.WriteLine(csvLine);
            }
        }

        private static Func<T, object> FindGetter(string c, bool staticMember)
        {
            var flags = BindingFlags.Public | BindingFlags.NonPublic | 
            BindingFlags.IgnoreCase | (staticMember ? BindingFlags.Static : BindingFlags.Instance);
            Func<T, object> func = null;
            PropertyInfo pi = typeof(T).GetProperty(c, flags);
            FieldInfo fi = typeof(T).GetField(c, flags);

            if (CsvFile.UseLambdas)
            {
                Expression expr = null;
                ParameterExpression parameter = Expression.Parameter(typeof(T), "r");
                Type type = null;

                if (pi != null)
                {
                    type = pi.PropertyType;
                    expr = Expression.Property(parameter, pi.Name);
                }
                else if (fi != null)
                {
                    type = fi.FieldType;
                    expr = Expression.Field(parameter, fi.Name);
                }
                if (expr != null)
                {
                    Expression<Func<T, object>> lambda;
                    if (type.IsValueType)
                    {
                        lambda = Expression.Lambda<Func<T, object>>
                                 (Expression.TypeAs(expr, typeof(object)), parameter);
                    }
                    else
                    {
                        lambda = Expression.Lambda<Func<T, object>>(expr, parameter);
                    }
                    func = lambda.Compile();
                }
            }
            else
            {
                if (pi != null)
                    func = o => pi.GetValue(o, null);
                else if (fi != null)
                    func = o => fi.GetValue(o);
            }
            return func;
        }

        private void CreateGetters()
        {
            var list = new List<Func<T, object>>();

            foreach (var columnName in columns)
            {
                Func<T, Object> func = null;
                var propertyName = (columnName.IndexOf(' ') < 0 ? 
                                    columnName : columnName.Replace(" ", ""));
                func = FindGetter(columnName, false) ?? FindGetter(columnName, true);

                list.Add(func);
            }
            this.getters = list.ToArray();
        }

        private string ToCsv(T record)
        {
            if (record == null)
                throw new ArgumentException("Cannot be null", "record");

            string[] csvStrings = new string[getters.Length];

            for (int i = 0 ; i < getters.Length ; i++)
            {
                var getter = getters[i];
                object fieldValue = getter == null ? null : getter(record);
                csvStrings[i] = this.ToCsvString(fieldValue);
            }
            return string.Join(this.fieldSeparatorAsString, csvStrings);

        }

        private string ToCsvString(object o)
        {
            if (o != null)
            {
                string valueString = o as string ?? Convert.ToString(o, CultureInfo.CurrentUICulture);
                if (RequiresQuotes(valueString))
                {
                    var csvLine = new StringBuilder();
                    csvLine.Append(this.textQualifier);
                    foreach (char c in valueString)
                    {
                        if (c == this.textQualifier)
                            csvLine.Append(c); // double the double quotes
                        csvLine.Append(c);
                    }
                    csvLine.Append(this.textQualifier);
                    return csvLine.ToString();
                }
                else
                    return valueString;
            }
            return string.Empty;
        }

        private bool RequiresQuotes(string valueString)
        {
            if (CsvFile.FastIndexOfAny)
            {
                var len = valueString.Length;
                for (int i = 0 ; i < len ; i++)
                {
                    char c = valueString[i];
                    if (c <= 255 && this.isInvalidCharInFields[c])
                        return true;
                }
                return false;
            }
            else
            {
                return valueString.IndexOfAny(this.invalidCharsInFields) >= 0;
            }
        }

        private void WriteHeader()
        {
            var csvLine = new StringBuilder();
            for (int i = 0 ; i < this.columns.Length ; i++)
            {
                if (i > 0)
                    csvLine.Append(this.fieldSeparator);
                csvLine.Append(this.ToCsvString(this.columns[i]));
            }
            this.streamWriter.WriteLine(csvLine.ToString());
        }
    }

    internal class CsvFileReader<T> : CsvFile, IEnumerable<T>, IEnumerator<T>
     where T : new()
    {
        private readonly Dictionary<Type, List<Action<T, String>>> allSetters = 
                                     new Dictionary<Type, List<Action<T, String>>>();
        private string[] columns;
        private char curChar;
        private int len;
        private string line;
        private int pos;
        private T record;
        private readonly char fieldSeparator;
        private readonly TextReader textReader;
        private readonly char textQualifier;
        private readonly StringBuilder parseFieldResult = new StringBuilder();

        public CsvFileReader(CsvSource csvSource)
            : this(csvSource, null)
        {
        }

        public CsvFileReader(CsvSource csvSource, CsvDefinition csvDefinition)
        {
            var streamReader = csvSource.TextReader as StreamReader;
            if (streamReader != null)
                this.BaseStream = streamReader.BaseStream;
            if (csvDefinition == null)
                csvDefinition = DefaultCsvDefinition;
            this.fieldSeparator = csvDefinition.FieldSeparator;
            this.textQualifier = csvDefinition.TextQualifier;

            this.textReader = 
               csvSource.TextReader;// new FileStream(csvSource.TextReader, FileMode.Open);

            this.ReadHeader(csvDefinition.Header);
        }

        public T Current
        {
            get { return this.record; }
        }

        public bool Eof
        {
            get { return this.line == null; }
        }

        object IEnumerator.Current
        {
            get { return this.Current; }
        }
        
        protected override void Dispose(bool disposing)
        {
            if (disposing)
            {
                // free managed resources
                this.textReader.Dispose();
            }
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return this.GetEnumerator();
        }

        public IEnumerator<T> GetEnumerator()
        {
            return this;
        }

        public bool MoveNext()
        {
            this.ReadNextLine();
            if (this.line == null && (this.line = this.textReader.ReadLine()) == null)
            {
                this.record = default(T);
            }
            else
            {
                this.record = new T();
                Type recordType = typeof(T);
                List<Action<T, String>> setters;
                if (!this.allSetters.TryGetValue(recordType, out setters))
                {
                    setters = this.CreateSetters();
                    this.allSetters[recordType] = setters;
                }

                var fieldValues = new string[setters.Count];
                for (int i = 0 ; i < setters.Count ; i++)
                {
                    fieldValues[i] = this.ParseField();
                    if (this.curChar == this.fieldSeparator)
                        this.NextChar();
                    else
                        break;
                }
                for (int i = 0 ; i < setters.Count ; i++)
                {
                    var setter = setters[i];
                    if (setter != null)
                    {
                        setter(this.record, fieldValues[i]);
                    }
                }
            }
            return (this.record != null);
        }

        public void Reset()
        {
            throw new NotImplementedException("Cannot reset CsvFileReader enumeration.");
        }

        private static Action<T, string> EmitSetValueAction(MemberInfo mi, Func<string, object> func)
        {
            ParameterExpression paramExpObj = Expression.Parameter(typeof(object), "obj");
            ParameterExpression paramExpT = Expression.Parameter(typeof(T), "instance");

            {
                var pi = mi as PropertyInfo;
                if (pi != null)
                {
#if !NET_3_5
            if (CsvFile.UseLambdas)
            {
                var callExpr = Expression.Call(
                    paramExpT, 
                    pi.GetSetMethod(),
                    Expression.ConvertChecked(paramExpObj, pi.PropertyType));
                var setter = Expression.Lambda<Action<T, object>>(
                    callExpr,
                    paramExpT,
                    paramExpObj).Compile();
                return (o, s) => setter(o, func(s));
            }
#endif
                    return (o, v) => pi.SetValue(o, (object)func(v), null);

                }
            }
            {
                var fi = mi as FieldInfo;
                if (fi != null)
                {
#if !NET_3_5
            if (CsvFile.UseLambdas)
            {
                //ParameterExpression valueExp = Expression.Parameter(typeof(string), "value");
                var valueExp = Expression.ConvertChecked(paramExpObj, fi.FieldType);
 
                // Expression.Property can be used here as well
                MemberExpression fieldExp = Expression.Field(paramExpT, fi);
                BinaryExpression assignExp = Expression.Assign(fieldExp, valueExp);
 
                var setter = Expression.Lambda<Action<T, object>>
                    (assignExp, paramExpT, paramExpObj).Compile();
 
                return (o, s) => setter(o, func(s));
            }
#endif
                    return ((o, v) => fi.SetValue(o, func(v)));
                }
            }
            throw new NotImplementedException();
        }

        private static Action<T, string> FindSetter(string c, bool staticMember)
        {
            var flags = BindingFlags.Public | BindingFlags.NonPublic | 
            BindingFlags.IgnoreCase | (staticMember ? BindingFlags.Static : BindingFlags.Instance);
            Action<T, string> action = null;
            PropertyInfo pi = typeof(T).GetProperty(c, flags);
            if (pi != null)
            {
                var pFunc = StringToObject(pi.PropertyType);
                action = EmitSetValueAction(pi, pFunc);
            }
            FieldInfo fi = typeof(T).GetField(c, flags);
            if (fi != null)
            {
                var fFunc = StringToObject(fi.FieldType);
                action = EmitSetValueAction(fi, fFunc);
            }
            return action;
        }

        private static Func<string, object> StringToObject(Type propertyType)
        {
            if (propertyType == typeof(string))
                return (s) => s ?? String.Empty;
            else if (propertyType == typeof(Int32))
                return (s) => String.IsNullOrEmpty(s) ? 0 : Int32.Parse(s);
            if (propertyType == typeof(DateTime))
                return (s) => String.IsNullOrEmpty(s) ? DateTimeZero : DateTime.Parse(s);
            else
                throw new NotImplementedException();
        }

        private List<Action<T, string>> CreateSetters()
        {
            var list = new List<Action<T, string>>();
            for (int i = 0 ; i < this.columns.Length ; i++)
            {
                string columnName = this.columns[i];
                Action<T, string> action = null;
                if (columnName.IndexOf(' ') >= 0)
                    columnName = columnName.Replace(" ", "");
                action = FindSetter(columnName, false) ?? FindSetter(columnName, true);

                list.Add(action);
            }
            return list;
        }

        private void NextChar()
        {
            if (this.pos < this.len)
            {
                this.pos++;
                this.curChar = this.pos < this.len ? this.line[this.pos] : '\0';
            }
        }

        private void ParseEndOfLine()
        {
            throw new NotImplementedException();
        }


        private string ParseField()
        {
            parseFieldResult.Length = 0;
            if (this.line == null || this.pos >= this.len)
                return null;
            while (this.curChar == ' ' || this.curChar == '\t')
            {
                this.NextChar();
            }
            if (this.curChar == this.textQualifier)
            {
                this.NextChar();
                while (this.curChar != 0)
                {
                    if (this.curChar == this.textQualifier)
                    {
                        this.NextChar();
                        if (this.curChar == this.textQualifier)
                        {
                            this.NextChar();
                            parseFieldResult.Append(this.textQualifier);
                        }
                        else
                            return parseFieldResult.ToString();
                    }
                    else if (this.curChar == '\0')
                    {
                        if (this.line == null)
                            return parseFieldResult.ToString();
                        this.ReadNextLine();
                    }
                    else
                    {
                        parseFieldResult.Append(this.curChar);
                        this.NextChar();
                    }
                }
            }
            else
            {
                while (this.curChar != 0 && this.curChar != this.fieldSeparator && 
                                            this.curChar != '\r' && this.curChar != '\n')
                {
                    parseFieldResult.Append(this.curChar);
                    this.NextChar();
                }
            }
            return parseFieldResult.ToString();
        }

        private void ReadHeader(string header)
        {
            if (header == null)
            {
                this.ReadNextLine();
            }
            else
            {
                // we read the first line from the given header
                this.line = header;
                this.pos = -1;
                this.len = this.line.Length;
                this.NextChar();
            }

            var readColumns = new List<string>();
            string columnName;
            while ((columnName = this.ParseField()) != null)
            {
                readColumns.Add(columnName);
                if (this.curChar == this.fieldSeparator)
                    this.NextChar();
                else
                    break;
            }
            this.columns = readColumns.ToArray();
        }

        private void ReadNextLine()
        {
            this.line = this.textReader.ReadLine();
            this.pos = -1;
            if (this.line == null)
            {
                this.len = 0;
                this.curChar = '\0';
            }
            else
            {
                this.len = this.line.Length;
                this.NextChar();
            }
        }
    }

    public class CsvDefinition
    {
        public string Header { get; set; }
        public char FieldSeparator { get; set; }
        public char TextQualifier { get; set; }
        public IEnumerable<String> Columns { get; set; }
        public string EndOfLine { get; set; }

        public CsvDefinition()
        {
            if (CsvFile.DefaultCsvDefinition != null)
            {
                FieldSeparator = CsvFile.DefaultCsvDefinition.FieldSeparator;
                TextQualifier = CsvFile.DefaultCsvDefinition.TextQualifier;
                EndOfLine = CsvFile.DefaultCsvDefinition.EndOfLine;
            }
        }
    }

    public class CsvSource
    {
        public readonly TextReader TextReader;

        public static implicit operator CsvSource(CsvFile csvFile)
        {
            return new CsvSource(csvFile);
        }

        public static implicit operator CsvSource(string path)
        {
            return new CsvSource(path);
        }

        public static implicit operator CsvSource(TextReader textReader)
        {
            return new CsvSource(textReader);
        }

        public CsvSource(TextReader textReader)
        {
            this.TextReader = textReader;
        }

        public CsvSource(Stream stream)
        {
            this.TextReader = new StreamReader(stream);
        }

        public CsvSource(string path)
        {
            this.TextReader = new StreamReader(path);
        }

        public CsvSource(CsvFile csvFile)
        {
            this.TextReader = new StreamReader(csvFile.BaseStream);
        }
    }

    public class CsvDestination
    {
        public StreamWriter StreamWriter;

        public static implicit operator CsvDestination(string path)
        {
            return new CsvDestination(path);
        }
        private CsvDestination(StreamWriter streamWriter)
        {
            this.StreamWriter = streamWriter;
        }

        private CsvDestination(Stream stream)
        {
            this.StreamWriter = new StreamWriter(stream);
        }

        public CsvDestination(string fullName)
        {
            FixCsvFileName(ref fullName);
            this.StreamWriter = new StreamWriter(fullName);
        }

        private static void FixCsvFileName(ref string fullName)
        {
            fullName = Path.GetFullPath(fullName);
            var path = Path.GetDirectoryName(fullName);
            if (path != null && !Directory.Exists(path))
                Directory.CreateDirectory(path);
            if (!String.Equals(Path.GetExtension(fullName), ".csv"))
                fullName += ".csv";
        }
    }

    public static class CsvFileLinqExtensions
    {
        public static void ToCsv<T>(this IEnumerable<T> source, CsvDestination csvDestination)
        {
            source.ToCsv(csvDestination, null);
        }

        public static void ToCsv<T>(this IEnumerable<T> source, 
                 CsvDestination csvDestination, CsvDefinition csvDefinition)
        {
            using (var csvFile = new CsvFile<T>(csvDestination, csvDefinition))
            {
                foreach (var record in source)
                {
                    csvFile.Append(record);
                }
            }
        }
    }

    public class CsvIgnorePropertyAttribute : Attribute
    {
        public override string ToString()
        {
            return "Ignore Property";
        }
    }
    // 2013-11-29 Version 1
    // 2014-01-06 Version 2: add CoryLuLu suggestions
} 

Points of Interest

When available, I use the System.Task library and one of the fancy collections that comes with it. On a multi-core machine, the gain in speed is spectacular.

History

  • November 20, 2013
    • First version
  • March 21, 2014
    • Allows files with no headers
    • Added support for any textReader
    • Added a sample with an in memory file and custom separator

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)