Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#

Importing Excel XLS Files to Database Tables using Linq to SQL

4.75/5 (4 votes)
9 Dec 2013CPOL3 min read 25.2K  
This tip shows how to import XLS data to a SQL database using Link to SQL.

Introduction

Recently, I was given the task of designing a new database application. It would be an update to a previously existing application, only it would be done with newer technologies and having a newer look and feel. The previous database was never given to me; instead I received meta data from which I created a blank database. After numerous requests for the actual database, I was instead given a series of XLS files dumped from that original database, and all named appropriately. For example, "Invoice.xls" stored records for the "Invoice" table, etc. The column headers in the spreadsheets all matched the column names in my SQL database as well.  

I decided to write a generic method that could import all of these files for me. Other options would be to do something such as creating insert statements from the spreadsheets, however that proved very difficult. Some cases of a blank cell should be interpreted as null, while others should be an empty string. In the end, this turned out to be the easiest way to go.

Using the Code

Consider the sample class given below:

C#
using System;
using System.ComponentModel.DataAnnotations;
using System.Data.Linq.Mapping;
 
namespace MyClasses {
 
   [Table(Name = "Invoice")]
   public class InvoiceClass {
 
      [Column(CanBeNull = false, DbType = "int", 
      Name = "InvoiceId", IsPrimaryKey = true)]
      [Display(Name = "Invoice ID")]
      public string Id { get; set; }
   }
} 

Notice the attributes added. The Table attribute handles linking a specific class to a specific table. The Column attribute handles all column properties for a class property. Because I was using Linq to SQL in my application, this is how my classes were setup.
Please note: This is not a Linq to SQL article as those articles already exist. This is just an example of how to import spreadsheet data.

My application was simply a console app with all the code in the main method. I wasn't concerned with anything fancy or a GUI. In theory, this would run only one time.

The chain of operations is to:

  • Loop all defined class types (e.g. InvoiceClass)
  • Find the correct table name for the SQL database (e.g. "Invoice")
  • Check if records exist and clear if needed
  • Find the associated XLS filel (e.g. "Invoice.xls")
  • Load rows for the XLS file and loop them
  • Create a new instance of whatever class I'm working with
  • Loop all of the XLS row's columns and set the appropriate property (e.g. invoice.Id = xlsColumn["InvoiceID"])
  • Adjust all properties as needed for any possible null values, defaults, string truncation, etc.
  • Submit each resulting object instance to the Linq to SQL engine

The code is below. I have it commented fairly well, and rather than switch back and forth between code and article comments, I'm just letting the comments speak for themselves!

C#
static void Main(string[] args) {

   // The path to the XLS spreadsheet files to import.
   string xlsPath = @"C:\ThePathTo\TheXLSFiles";

   // The SQL database connection string.
   string sqlDatabase = "Data Source=server;Initial Catalog=db;
   User ID=me;Password=myPassword;Encrypt=True;TrustServerCertificate=True;";

   // The XLS database connection string
   string xlsFile = "Provider=Microsoft.ACE.OLEDB.12.0;
   Data Source={0};Extended Properties=\"Excel 8.0;HDR=YES\";";

   // The namespace that holds the classes we'll be working with.
   string classNamespace = "MyClasses";

   // All classes we're working with should be a part of the same namespace.
   var classList = Assembly.GetExecutingAssembly().GetTypes().Where(t =>
      String.Equals(t.Namespace, classNamespace, StringComparison.Ordinal)).ToList();

   // Grab all possible files we have.
   var dir = new System.IO.DirectoryInfo(xlsPath);

   // There's a separate Linq query to handle filtering.
   // The file query is currently for everything.
   var fileList = dir.GetFiles("*.*", System.IO.SearchOption.AllDirectories);

   // Processing only XLS files. To add XLSX, there should be an additional provider defined.
   var fileQuery = from file in fileList
                   where file.Extension == ".xls"
                   orderby file.Name
                   select file.FullName;
   var filesToProcess = fileQuery.ToList();

   // Start a connection to the database.
   using (var dbContext = new DataContext(sqlDatabase)) {

      // Skip interfaces. There is likely a better way,
      // but I don't have any actual classes starting with "I".
      var classQuery = (from tmpClass in classList
      where !tmpClass.Name.StartsWith("I") select tmpClass);

      // Loop each class and process/load data.
      foreach (var tmpClass in classQuery) {

         // Assuming Linq to SQL is setup,
         // this will be the correct SQL table to query for this class.
         var sqlTable = dbContext.GetTable(tmpClass);

         // How many records exist so far?
         var countQuery = (from object o in sqlTable select o);

         // Delete existing records?
         //if (countQuery.Any()) {
         //sqlTable.DeleteAllOnSubmit(countQuery);
         //dbContext.SubmitChanges();
         //}

         // Only process the table when no records existed yet?
         if (!countQuery.Any()) {

            // Need the actual string name of the table
            var attributes = tmpClass.GetCustomAttributes(typeof(TableAttribute), true);

            if (attributes.Any()) {

               var tableName = ((TableAttribute)attributes[0]).Name;

               // Find the specific XLS file for this table.
               var file = filesToProcess.FirstOrDefault(o =>
               Path.GetFileNameWithoutExtension(o).Equals(tableName,
               StringComparison.CurrentCultureIgnoreCase));

               // In case the file doesn't exist, don't error out.
               if (file == null) {
                  continue;
               }

               // This assumes all sheets are default named to
               // "Sheet 1". Additional code is needed
               // if these sheets are all named differently.
               using (var dataAdapter = new OleDbDataAdapter
               ("SELECT * FROM [Sheet 1$]", string.Format(xlsFile, file))) {

                  // Create the dataset and fill it
                  using (var myDataSet = new DataSet()) {

                     dataAdapter.Fill(myDataSet, tableName);

                     // The data table will have the same name
                     using (var dataTable = myDataSet.Tables[tableName]) {

                        // We need to create a new object of type tmpClass for each row and populate it.
                        foreach (DataRow row in dataTable.Rows) {

                           // Using Reflection to create this object and fill in properties.
                           var instance = Activator.CreateInstance(tmpClass);
                           var properties = tmpClass.GetProperties();

                           // Don't select properties we cannot set. These are computed fields typically.
                           var propertyQuery = (from property in properties
                              where property.CanWrite select property);

                           // Loop and set each property.
                           foreach (PropertyInfo property in propertyQuery) {

                              // Grab the Linq to Sql data attributes.
                              var dbProperty = property.GetCustomAttribute
                                  (typeof(ColumnAttribute), false) as ColumnAttribute;

                              // Make sure that this column exists in the data we received from the XLS spreadsheet
                              if (dataTable.Columns.Contains(dbProperty.Name)) {

                                 // Grab the value.  We need to account for DBNull first.
                                 var val = row[dbProperty.Name];
                                 if (val == DBNull.Value) {
                                    val = null;
                                 }

                                 // We need a bunch of special processing for null.  Empty cells are returned
                                 // instead of empty strings for example.
                                 if (val == null) {

                                    // DateTime should get processed specially.
                                    if ((property.PropertyType == typeof(DateTime)) ||
                                        (property.PropertyType == typeof(DateTime?))) {
                                       DateTime? nullableDate = null;
                                       property.SetValue(instance, nullableDate);
                                    }
                                    else if (!dbProperty.CanBeNull) {

                                       // If the value should not be null we need to create the default instance
                                       // of that class. (e.g. int = 0, etc.)  Strings do not have a constructor
                                       // that's usable this way so strings are a special check.
                                       if (property.PropertyType == typeof(string)) {
                                          property.SetValue(instance, string.Empty);
                                       }
                                       else {
                                          var tmpVal = Activator.CreateInstance(property.PropertyType).GetType();
                                          property.SetValue(instance, Activator.CreateInstance(tmpVal));
                                       }
                                    }
                                    else {
                                       // To here, we have a valid null value and it's not a DateTime.
                                       property.SetValue(instance, null);
                                    }
                                 }
                                 else if ((dbProperty.DbType.StartsWith("nvarchar") &&
                                          (!string.IsNullOrEmpty(val.ToString())))) {

                                    // This block of code assumes that the DbType is specified.  If it is,
                                    // we can account for string truncation here.
                                    var sLength = dbProperty.DbType.Substring(("nvarchar(").Length);
                                    sLength = sLength.Substring(0, sLength.Length - 1);
                                    var iLength = Int32.Parse(sLength);
                                    var newVal = val.ToString();
                                    newVal = newVal.Substring(0, Math.Min(iLength, newVal.Length));

                                    // We've truncated to here. If we are handling char type, a string
                                    // cannot be converted to char.  We need to handle this now. Only
                                    // handle for 1 length, otherwise we'll let the app throw an error.
                                    if ((property.PropertyType == typeof(char)) &&
                                        (newVal.Length == 1)) {
                                       property.SetValue(instance, newVal[0]);
                                    }
                                    else {
                                       // Set the truncated string
                                       property.SetValue(instance, newVal);
                                    }
                                 }
                                 else if (val.GetType() != property.PropertyType) {

                                    // To here, the resulting types are different somehow. We need to
                                    // do some conversions on the data.  Checking for DateTime.
                                    if ((val.GetType() == typeof(DateTime)) ||
                                        (val.GetType() == typeof(DateTime?))) {

                                       // nullable fields don't convert otherwise.
                                       DateTime? nullableDate = (DateTime)val;
                                       property.SetValue(instance, nullableDate);
                                    }
                                    else if ((property.PropertyType == typeof(DateTime)) ||
                                             (property.PropertyType == typeof(DateTime?))) {

                                       // A number of times the record comes back as a string instead.
                                       var newVal = val.ToString();
                                       var nullableDate = (string.IsNullOrWhiteSpace
                                          (newVal) ? (DateTime?)null : DateTime.Parse(newVal));
                                       property.SetValue(instance, nullableDate);
                                    }
                                    else {
                                       // To here we have a different type and need to convert. It's not
                                       // a DateTime, and it's not a null value which was handled already.
                                       var pType = property.PropertyType;

                                       // We can't take "Int? 3" and
                                       // put it into "Int" field. Must convert.
                                       if ((property.PropertyType.IsGenericType) &&
                                           (property.PropertyType.GetGenericTypeDefinition().
                                              Equals(typeof(Nullable<>)))) {
                                          pType = Nullable.GetUnderlyingType(property.PropertyType);
                                       }

                                       // Finally change the type and set the value.
                                       var newProp = Convert.ChangeType(val, pType);
                                       property.SetValue(instance, newProp);
                                    }
                                 }
                                 else {
                                    // To here the types match and the value isn't null
                                    property.SetValue(instance, val);
                                 }

                              } // dbColumn exists

                           } // property loop

                           // This instance can be inserted if needed.
                           sqlTable.InsertOnSubmit(instance);

                        } // DataRow loop

                        // Submit changes.
                        dbContext.SubmitChanges();

                     } // using DataTable

                  } // using DataSet

               } // Using DataAdapter

            } // Attributes exist

         } // No records were preexisting in the database table.

      } // class loop

   } // using DataContext

} // main method

The only thing that needs to be done with the above code is to set the initial values of the strings at the top. A few assumptions are made such as XLS sheets being named "Sheet 1" etc.

Points of Interest

Junk I learned: Reflection is an amazingly powerful piece of C#. Strings have no default constructor because they are immutable. Linq-to-SQL is pretty awesome. I hope at least one person finds this helpful as I was unable to locate something like this when I began this task.

History

  • Initial version: 12/8/2013

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)