Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#

C#: Query Excel and .CSV Files Using LinqToExcel

4.98/5 (31 votes)
14 Nov 2013CPOL7 min read 209.2K  
C#: Query Excel and .CSV files using LinqToExcel.

the-grid-house-500

In my quest to get a rather hack-ey work project DONE and off my machine, I discovered an exceptionally useful OSS tool for working with Microsoft Excel data, and at the same time, I also found a handy library, DocX, for working with Microsoft Word files which not only spares one the issues of Interop/COM, but in fact does not require Interop assemblies at all. We'll take a look at DocX in my next post.

Mostly, I need to get data out of Excel, which admittedly, can be done using ADO.NET. However, LINQ to Excel makes this pretty disgustingly easy.

Image by Elif Ayiter - Some Rights Reserved

Use LinqToExcel to Query Spreadsheet Data Using (wait for it) . . . LINQ

LinqToExcel by Paul Yoder is a .NET library which facilitates querying Excel spreadsheets and .CSV files using the familiar LINQ syntax. The source is available at Yoder's Github Repository, and there is also a Nuget package available. The source is available with a broadly permissive license which essentially allows you to do whatever you want with the code, so long as a copy of the license is included.

It is important to note that you need to use the 64 bit version of LinqToExcel if you are targeting the x64 processor architecture with your application. Also in this case, you will need the 64 bit version of the Access Database Engine installed. This means you are taking a dependency which may not be present or installable on a remote server.  

Update 1018/2013 – As it turns out, as cool as LinqToExcel is for most cases, for MY specific project there was an issue. That dependency on the ACE Database engine. My project was to be deployed to Windows Azure Websites, which does not have support for the ACE Engine. Back to the drawing board. I found another library, ExcelDataReader, which is not as slick as LinqToExcel for simply reading data, but which carries NO DEPENDENCIES on proprietary MS technology which would create issues deploying to a remote server. 

Get the LinqToExcel Package Using Nuget

To get started, either go to the Solution Explorer, you can simply enter the following in the Nuget Package Manager Console:

NOTE: When adding Nuget packages to your project, consider keeping source control bloat down by using Nuget Package Restore so that packages are downloaded automatically on build rather than cluttering up your repo.  

Get LinqToExcel Using the Nuget Package Manager Console
C#
PM> Install-Package LinqToExcel 

Or, you can right-click on "Manage Nuget Packages for Solution" and search for "LinqToExcel" in the online packages in the Package Manager window:

Get LinqToExcel Using Solutions Explorer GUI Menu Item

install-linq-to-excel-nuget-pkg

The Simplest Way to Query Data from Excel

Wanna get some data from an Excel spreadsheet? It's this easy:

Consider the following spreadsheet:

Example Spreadsheet Data

ArtistAlbumsSpreadsheet-Excel

Here is the very simplest way we can query this data and (for the sake of this example) write it out to the console:

Straight and Simple Query Using Linq Syntax
C#
void PrintArtistAlbums()
{
    string pathToExcelFile = ""
        + @"D:\Code\Blog Projects\BlogSandbox\ArtistAlbums.xlsx";
 
    string sheetName = "Sheet1";

    var excelFile = new ExcelQueryFactory(pathToExcelFile);
    var artistAlbums = from a in excelFile.Worksheet(sheetName) select a;
 
    foreach (var a in artistAlbums)
    {
        string artistInfo = "Artist Name: {0}; Album: {1}";
        Console.WriteLine(string.Format(artistInfo, a["Name"], a["Title"]));
    }
}

That's it. Initialize an ExcelQueryFactory object, and use Linq the way you are accustomed to.

In the previous example, we are accepting what LinqToExcel sends us by default, which is an ExcelQueryable<LinqToExcel.Row> object (which implements IQueryable and other useful interfaces). Each Row object is composed of (take a guess . . .) LinqToExcel.Cell objects.

Materialize Data into an Object Using LinqToExcel

But it gets better. LinqToExcel will also materialize our data to our objects, if the column names match:

LinqToExcel Query Materializing a Class
C#
public class ArtistAlbum
{
    public string Name { get; set; }
    public string Title { get; set; }
}
 
void PrintArtistAlbums()
{
    string pathToExcelFile = ""
    + @"D:\Code\Blog Projects\BlogSandbox\ArtistAlbums.xlsx";
 
    string sheetName = "Sheet1";
 
    var excelFile = new ExcelQueryFactory(pathToExcelFile);
    var artistAlbums = from a in excelFile.Worksheet<ArtistAlbum>(sheetName) select a; 
 
    foreach (var a in artistAlbums)
    {
        string artistInfo = "Artist Name: {0}; Album: {1}";
        Console.WriteLine(artistInfo, a.Name, a.Title);
    }
}

Note the items called out in red above. In this case, we have told LinqToExcel that the objects it expects to get back from the query should map to the ArtistAlbum class. If your column names match, you're all set – pull your data in and populate the class like you would with any other Linq query.

If your columns don't match, you still have some options. Let's say our boss gives us a spreadsheet each week to do something important within our application. He is also fond of crazy column names. We can take advantage of the AddMapping method provided by the ExcelQueryFactory object to map spreadsheet columns to the right property on our class.

Let's say that our ArtistAlbum class is defined as before, but the boss insists on naming the "Name" column on HIS spreadsheet as "Artist Name," and the "Title" column as "Album Title". In this case, we can do this:

Use the AddMapping Method to Map Column Names to Property Names
C#
void PrintArtistAlbums()
{
    string pathToExcelFile = ""
    + @"D:\Code\Blog Projects\BlogSandbox\ArtistAlbums.xlsx";
 
    string sheetName = "Sheet1";
 
    var excelFile = new ExcelQueryFactory(pathToExcelFile);
 
    // ADD COLUMN MAPPINGS:
    excelFile.AddMapping("Name", "Artist Name");
    excelFile.AddMapping("Title", "Album Title");

    var artistAlbums = from a in excelFile.Worksheet<ArtistAlbum>(sheetName) select a;

    foreach (var a in artistAlbums)
    {
        string artistInfo = "Artist Name: {0}; Album: {1}";
        Console.WriteLine(artistInfo, a.Name, a.Title);
    }
}

Use LinqToExcel to Query Workbook MetaData

We can use the GetWorksheetNames method to query Excel and return an IEnumerable<string> containing the names of all the Worksheets within a Workbook:

Retrieve the Worksheet Names Contained within an Excel Workbook
C#
public IEnumerable<string> getWorkSheets()
{
string pathToExcelFile = ""
    + @"D:\Code\Blog Projects\BlogSandbox\ArtistAlbums.xlsx";
    
    // This is an expensive operation - Do this only once if possible:
    var excelFile = new ExcelQueryFactory(pathToExcelFile);
    var worksheetsList = excelFile.GetWorksheetNames();

    return excelFile.GetWorksheetNames();
}

Query Column Names on an Excel Worksheet

Often, it's handy to be able to grab the column names available within a particular worksheet. That's also easy:

C#
var columnNames = _excel.GetColumnNames(selectedSheetName);

This returns a simple IEnumerable<string> containing the column names in the Worksheet.

Using The LinqToExcel GetColumnNames Method with Reflection – Example

This might be useful as a way to check and see if the properties exposed by one of our classes are all represented by columns in a spreadsheet. We can use reflection to walk the public properties exposed by a class instance, then compare the property names with our column names (case-insensitively, if we choose). This is a really basic example, but you get the idea.

Using the GetColumns Method to Validate Worksheet Columns Against Object Properties
C#
public IEnumerable<string> ValidateSheetColumns(string selectedSheetName)
{
    string pathToExcelFile = ""
    + @"D:\Code\Blog Projects\BlogSandbox\ArtistAlbums.xlsx";

    // This is an expensive operation - Do this only once if possible:
    var excelFile = new ExcelQueryFactory(pathToExcelFile);

    // Grab the column names from the Worksheet:
    var SheetColumnNames = excelFile.GetColumnNames(selectedSheetName);

    // Grab a list of the properties exposed by our class instance:
    var prototype = new ArtistAlbum();
    var ClassProperties = prototype.GetType().GetProperties();

    // Create an enumerable containing the property names:
    var requiredColumns = (from p in ClassProperties select p.Name);

    // Copy the names into our missing columns list:
    var missingColumns = requiredColumns.ToList();

    foreach (string column in requiredColumns)
    {
        foreach (var candidate in SheetColumnNames)
        {
            if (column.Equals(candidate, StringComparison.InvariantCultureIgnoreCase))
            {
                // Any items left represent properties which were
                // not found to match in the column names (case-insensitive):
                missingColumns.Remove(column);
            }
        }
    }
    return missingColumns;
}

Of course, the above only works well with simple POCO classes. If you have read-only properties on your class, you might run into problems. But for the sake of example, you get the point.

Note about Performance

Queries against Excel using LinqToExcel are nice and fast. However, it is important to remember that the initialization of an instance of ExcelQueryFactory is relatively expensive (you are essentially initiating a database connection from a cold start, then pulling a large amount of XML data into memory). This should be considered in designing your application's use of this library.

Where possible, I recommend doing your best to initialize the ExcelQueryFactory as few times as possible within the context of a given operation. My examples above include initialization in each example so that they can stand on their own. Most likely, though, in a real-world context, I would try to make the ExcelQueryFactory instance a class member variable where it made sense.

Update 10-4-2013 - Also, pay close attention to dates when consuming Excel data, and make sure the Excel 1900/1904 date funkiness is dealt with properly with whatever framework you are using! I haven't checked this with LinqToExcel yet, but this is always an important bugaboo to keep in mind when working with Excel data.

Wrapping Up

LinqToExcel is a cool enough tool that I decided to write this quick article. While I personally avoid working with Excel data wherever possible (Excel is for analysis of properly persisted data, dammit!), there is often no getting around it. While there are certainly other ways to work with Excel data in your application, LinqToExcel makes this about as simple as it can get.

Update: 10/15/2013 - As indicated above, use of this library requires that the ACE Database Engine be installed on the host machine, which makes LinqToExcel less than ideal for deployment to a server which does not have this .exe installed. While the ACE engine is re-distributable, is requires installation as opposed to inclusion as a dll. This requirement effectively tanked LinqToExcel for my current project, which is to be deployed as a Windows Azure Website. 

There is an alternative solution, ExcelDataReader, which takes no dependency on proprietary MS technology, and which can be used to read data from Excel files. It is not as smooth to use as LinqToExcel, nor as fully featured, but it gets the job done. I will take a look at that in an upcoming article.   

Additional Resources and Items of Interest

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)