
Entity Framework 5.0 - Performance Tuning Using SqlBulkCopy

22 Jul 2016
Performance tuning using SqlBulkCopy

Introduction

There are many ways to insert a huge amount of data (a million INSERTs) into a database using Entity Framework. You can also buy one of the Entity Framework extensions available on the market. However, the fastest way that I have used in one of my recent projects is SqlBulkCopy. It dramatically improves Entity Framework performance. All you need is the List<Entities> that you want to insert.

Background

Entity Framework uses the SaveChanges() method to write data back to the backend. There are several factors you should consider before calling this method, such as whether you want automatic change detection on or off.

Calling SaveChanges() for each record slows bulk inserts down dramatically. You can try a few simple variations, each of which is likely to improve performance:

  • Call SaveChanges() once after ALL records.
  • Call SaveChanges() after, for example, every 100 records.
  • Call SaveChanges() after, for example, every 100 records, then dispose the context and create a new one.
  • Disable change detection.

For performance, it is important to call SaveChanges() after "many" records ("many" being around 100 or 1,000). It also improves performance to dispose the context after SaveChanges() and create a new one. Disposing clears the context of all entities; SaveChanges() does not do that, and the entities remain attached to the context in the Unchanged state. It is the growing number of attached entities in the context that slows down insertion step by step, so it helps to clear the context periodically.
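The batching pattern described above can be sketched as follows. This is a minimal sketch, assuming an EF 5 DbContext-based model; MyDbContext, MyEntity, and BatchedInsert are placeholder names for illustration, not part of this article's code:

```csharp
using System;
using System.Collections.Generic;
using System.Data.Entity; // EF 5 DbContext API (EntityFramework.dll)

// Placeholder model types for illustration only.
public class MyEntity { public int Id { get; set; } public string Name { get; set; } }
public class MyDbContext : DbContext { public DbSet<MyEntity> Items { get; set; } }

public static class BatchedInsertDemo
{
    public static void BatchedInsert(IEnumerable<MyEntity> entities, int batchSize = 100)
    {
        MyDbContext context = null;
        try
        {
            context = new MyDbContext();
            context.Configuration.AutoDetectChangesEnabled = false; // disable change detection

            int count = 0;
            foreach (var entity in entities)
            {
                context.Items.Add(entity);
                if (++count % batchSize == 0)
                {
                    context.SaveChanges();
                    // Dispose and recreate the context so the growing set of
                    // attached (Unchanged) entities does not slow later batches.
                    context.Dispose();
                    context = new MyDbContext();
                    context.Configuration.AutoDetectChangesEnabled = false;
                }
            }
            context.SaveChanges(); // flush the final partial batch
        }
        finally
        {
            if (context != null) context.Dispose();
        }
    }
}
```

This combines the last three bullets: batched SaveChanges(), context recreation, and disabled change detection. Running it requires the EntityFramework package and a reachable SQL Server database.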

To avoid batched SaveChanges() calls altogether, you can use SqlBulkCopy from your ObjectContext. In my example, I am using this method on a List<Entities> generated from the EDMX. I have used the Entity Framework 5.0 database-first approach. For Entity Framework 6.0 or later, you may have to tweak this code a bit around the mapping lookup.

Using the Code

This section provides all the functions needed to bulk insert any entities (model objects). The root function, the only one called from outside, is BulkInsertAll, which receives a list of your entity class. Here is the source code.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Data;
using System.Data.EntityClient;
using System.Data.Metadata.Edm;
using System.Data.Objects;
using System.Data.SqlClient;
using System.Linq;
using System.Reflection;

namespace Abhay
{
    partial class MyEntityContext
    {
        public void BulkInsertAll<T>(IEnumerable<T> entities)
        {
            try
            {
                entities = entities.ToArray();

                // The ObjectContext's Connection is an EntityConnection; SqlBulkCopy
                // needs the underlying store (SQL Server) connection string.
                string cs = ((EntityConnection)Connection).StoreConnection.ConnectionString;
                var conn = new SqlConnection(cs);
                conn.Open();

                Type t = typeof(T);

                var workspace = this.MetadataWorkspace;
                
                Dictionary<string, string> mappings = null;

                try
                {
                    mappings = GetMappings(workspace, this.DefaultContainerName, typeof(T).Name);
                }
                catch (Exception)
                {
                    // Mapping retrieval can fail from case to case (I initially
                    // hit this with a corrupted EDMX). If it does, we fall back
                    // to assuming column names match property names (see below).
                }

                // var tableAttribute = (TableAttribute)t.GetCustomAttributes(
                // typeof(TableAttribute), false).Single();
                var bulkCopy = new SqlBulkCopy(conn)
                {
                    DestinationTableName = t.Name
                };

                var properties = t.GetProperties().Where(EventTypeFilter).ToArray();

                var table = new DataTable();

                foreach (var property in properties)
                {
                    try
                    {
                        Type propertyType = property.PropertyType;
                        if (propertyType.IsGenericType &&
                            propertyType.GetGenericTypeDefinition() == typeof(Nullable<>))
                        {
                            propertyType = Nullable.GetUnderlyingType(propertyType);
                        }

                        table.Columns.Add(new DataColumn(property.Name, propertyType));

                        var clrPropertyName = property.Name;

                        // If the mappings could not be computed, assume the database
                        // column name is the same as the generated entity property name
                        var tableColumnName = mappings == null ? property.Name : mappings[property.Name];

                        bulkCopy.ColumnMappings.Add
                            (new SqlBulkCopyColumnMapping(clrPropertyName, tableColumnName));
                    }
                    catch (Exception)
                    {
                        // Skip any property that cannot be turned into a
                        // DataTable column or column mapping
                    }
                }

                foreach (var entity in entities)
                {
                    table.Rows.Add(properties.Select(
                      property => GetPropertyValue(
                      property.GetValue(entity, null))).ToArray());
                }

                bulkCopy.WriteToServer(table);
                conn.Close();
            }
            catch (Exception)
            {
                // Re-throw unchanged; this block exists only as a convenient
                // place to set a breakpoint while debugging
                throw;
            }
        }

        private bool EventTypeFilter(System.Reflection.PropertyInfo p)
        {
            //var attribute = Attribute.GetCustomAttribute(p,
            //    typeof(AssociationAttribute)) as AssociationAttribute;

            //if (attribute == null) return true;
            //if (attribute.IsForeignKey == false) return true;

            // Here I want to skip reference-type properties (navigation properties
            // to other entity classes) other than System.String.
            // You can comment out the check below and use the code above instead.

            if (p.PropertyType.IsClass && !p.PropertyType.Namespace.Equals("System"))
                return false;
            return true;
        }

        private object GetPropertyValue(object o)
        {
            if (o == null)
                return DBNull.Value;
            return o;
        }

        private Dictionary<string, string> GetMappings
             (MetadataWorkspace workspace, string containerName, string entityName)
        {
            var mappings = new Dictionary<string, string>();
            var storageMapping = workspace.GetItem<GlobalItem>(containerName, DataSpace.CSSpace);
            dynamic entitySetMaps = storageMapping.GetType().InvokeMember(
                "EntitySetMaps",
                BindingFlags.GetProperty | BindingFlags.NonPublic | BindingFlags.Instance,
                null, storageMapping, null);

            foreach (var entitySetMap in entitySetMaps)
            {
                var typeMappings = GetArrayList("TypeMappings", entitySetMap);
                dynamic typeMapping = typeMappings[0];
                dynamic types = GetArrayList("Types", typeMapping);

                if (types[0].Name == entityName)
                {
                    var fragments = GetArrayList("MappingFragments", typeMapping);
                    var fragment = fragments[0];
                    var properties = GetArrayList("AllProperties", fragment);
                    foreach (var property in properties)
                    {
                        var edmProperty = GetProperty("EdmProperty", property);
                        var columnProperty = GetProperty("ColumnProperty", property);
                        mappings.Add(edmProperty.Name, columnProperty.Name);
                    }
                }
            }

            return mappings;
        }

        private dynamic GetProperty(string property, object instance)
        {
            var type = instance.GetType();
            return type.InvokeMember(property, BindingFlags.GetProperty | 
            BindingFlags.NonPublic | BindingFlags.Instance, null, instance, null);
        }

        private ArrayList GetArrayList(string property, object instance)
        {
            var type = instance.GetType();
            var objects = (IEnumerable)type.InvokeMember(property, 
            BindingFlags.GetProperty | BindingFlags.NonPublic | 
            BindingFlags.Instance, null, instance, null);
            var list = new ArrayList();
            foreach (var o in objects)
            {
                list.Add(o);
            }
            return list;
        }
    }
}

Replace MyEntityContext with the name of your own ObjectContext class, and place this partial class in the same root folder (and namespace) as your ObjectContext class.

Call BulkInsertAll from your data services or business services as below:

var context = new MyEntityContext();
context.BulkInsertAll(entities); // where entities is the List<Entity>

Points of Interest

Most of the performance issues that Entity Framework suffers from come from too frequent communication with the database server. Each record inserted requires two statements to be issued to the server, with two roundtrips to fully execute. All the measurements presented in this section are based on SQL Server installed on the local computer, with nearly zero roundtrip time. Now consider a database server on a LAN, or shared with many other application databases, which is the typical situation, with a roundtrip time of just one millisecond. Inserting one million objects in our experiment means three million records, and so six million roundtrips to the database. With one millisecond irrevocably lost on each roundtrip, that is 6,000 seconds, and we are already 100 minutes behind schedule. At the same time, the bulk implementation spends barely a dozen milliseconds on roundtrips for the same job, because it sends fewer than a dozen statements to the server. Nearly everything it transfers is the actual data that will be saved into database records, so the communication overhead approaches zero on such a large amount of useful data.

History

  • 23rd July, 2016: Initial version

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
