
Handling BULK Data insert from CSV to SQL Server


Introduction

I am writing this article from my experience of inserting a huge amount of data (around 5 million rows, 400 MB) from a CSV file into a SQL Server database.

Background 

There are several ways to insert bulk data from a CSV file into a database; our goal was to achieve the fastest insertion and to execute it from a C# application. To reach that goal, we experimented with the following common, well-known techniques for handling bulk data insertion:

  1. SQL BULK INSERT query
  2. BCP or SqlBulkCopy library to insert bulk data using C# or VB
  3. SQL Server Integration Services (SSIS)
  4. The conventional SqlCommand class in C# or VB

From this article, developers will get some useful guidelines on bulk data insertion in SQL Server.

1. SQL BULK Insert

Using the BULK INSERT statement, we can insert bulk data into the database directly from a CSV file. The simplest version of the BULK INSERT query looks like this:

SQL
BULK INSERT dbo.TableForBulkData
FROM 'C:\BulkDataFile.csv'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
)

To handle transactions and rollback, a TRY...CATCH block can be used like this:

SQL
BEGIN TRANSACTION
BEGIN TRY
BULK INSERT dbo.BulkDataTable
FROM 'C:\TestFiles\Bulk3.csv'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    ROWS_PER_BATCH = 10000, 
    TABLOCK
)
COMMIT TRANSACTION
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION
END CATCH

The details of the BULK INSERT SQL are available here: http://msdn.microsoft.com/en-us/library/ms188365.aspx.

BULK INSERT is a rich T-SQL command that takes arguments to configure error handling, batch size, trigger firing, and so on.
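Because our goal was to run the insertion from a C# application, the same statement can simply be executed through a SqlCommand. The following is a minimal sketch under assumed names: the connection string, table name, and file paths are placeholders, and note that the CSV and error-file paths are resolved on the SQL Server machine, not on the client.

C#
using System;
using System.Data.SqlClient;

class BulkInsertRunner
{
    static void Main()
    {
        string connectionString = "Server=.;Database=TestDb;Integrated Security=true"; // placeholder
        string sql = @"BULK INSERT dbo.TableForBulkData
                       FROM 'C:\BulkDataFile.csv'
                       WITH
                       (
                           FIELDTERMINATOR = ',',
                           ROWTERMINATOR = '\n',
                           BATCHSIZE = 10000,               -- commit every 10,000 rows
                           MAXERRORS = 10,                  -- tolerate up to 10 bad rows
                           ERRORFILE = 'C:\BulkErrors.log', -- rejected rows are written here
                           FIRE_TRIGGERS                    -- run insert triggers (off by default)
                       )";

        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.CommandTimeout = 0;   // no timeout; bulk loads can run long
            cmd.ExecuteNonQuery();
            Console.WriteLine("Bulk insert finished.");
        }
    }
}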

2. BCP or SqlBulkCopy Library

The .NET Framework contains a SqlBulkCopy class in the System.Data.SqlClient namespace to copy large amounts of data from .NET applications to a SQL Server database easily and efficiently.

The details of the SqlBulkCopy operation can be found here: http://msdn.microsoft.com/en-us/library/tchktcdk%28v=vs.80%29.aspx.

The SqlBulkCopy class copies large amounts of data from an in-memory source to a database table. SqlBulkCopy accepts the following parameter types as data sources: System.Data.DataRow[], System.Data.DataTable, and System.Data.IDataReader.

The challenge, then, is to expose the large CSV file as one of these types: DataRow[], DataTable, or IDataReader. Several open source libraries are available for this conversion. For our experiments, we used CsvReader (http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader), which exposes CSV data through the System.Data.IDataReader interface.

The following code performs a bulk copy from a CSV file to a database table:

C#
// conn is an open SqlConnection; tablename holds the destination table name
using (StreamReader file = new StreamReader(bulk_data_filename))
{
    CsvReader csv = new CsvReader(file, true, ',');   // true: the file has a header row
    SqlBulkCopy copy = new SqlBulkCopy(conn);
    copy.DestinationTableName = tablename;
    copy.WriteToServer(csv);
}

By default, a bulk copy operation runs as its own transaction. To commit or roll back explicitly, a transaction must be supplied. Following is the code for SqlBulkCopy with a transaction:

C#
SqlConnection conn = new SqlConnection(connectionString);
conn.Open();
SqlTransaction transaction = conn.BeginTransaction();
try
{
    using (StreamReader file = new StreamReader(filename))
    {
        CsvReader csv = new CsvReader(file, true, '|');   // this file is pipe-delimited
        SqlBulkCopy copy = new SqlBulkCopy(conn, SqlBulkCopyOptions.KeepIdentity, transaction);
        copy.DestinationTableName = tablename;
        copy.WriteToServer(csv);
        transaction.Commit();   // all rows written: make them permanent
    }
}
catch (Exception)
{
    transaction.Rollback();     // any failure discards the whole load
    throw;                      // surface the error instead of swallowing it
}
finally
{
    conn.Close();
}

For very large CSV files, it is better to increase BulkCopyTimeout (measured in seconds, 30 by default): copy.BulkCopyTimeout = XXX;.
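As a minimal sketch of these tuning knobs (reusing conn and the CsvReader csv from the examples above; the table and column names are placeholders), BatchSize and explicit column mappings can be set alongside the timeout:

C#
using (SqlBulkCopy copy = new SqlBulkCopy(conn))
{
    copy.DestinationTableName = "dbo.BulkDataTable";   // placeholder table name
    copy.BulkCopyTimeout = 600;    // seconds; 0 means no limit (default is 30)
    copy.BatchSize = 10000;        // rows sent to the server per batch
    // Map CSV columns to table columns when the order or names differ
    copy.ColumnMappings.Add("csv_id", "Id");
    copy.ColumnMappings.Add("csv_name", "Name");
    copy.WriteToServer(csv);
}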

3. SQL Server Integration Service

SQL Server includes a powerful data integration and transformation platform called SQL Server Integration Services (SSIS). One of its main functions is to move data from almost any external data source into SQL Server. Using Business Intelligence Development Studio (BIDS), we can easily import data from a CSV file into a database, and it is simple to schedule a package to run as a recurring job.

Here are the basic steps to create an SSIS package that imports data from a CSV file into SQL Server:

  1. Open SQL Server Business Intelligence Development Studio.
  2. Create a new “Integration Services Project”.
  3. In the “Control Flow” tab, drag a “Data Flow Task” from the toolbox.
  4. Go to the “Data Flow” tab.
  5. On the “Data Flow” page, drag a “Flat File Source” and an “ADO.NET Destination” from the toolbox and set them up.
  6. Connect the Flat File Source output path (green arrow) to the ADO.NET Destination.

Running the created SSIS package will copy the data from the CSV file into the SQL Server database.
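The package can also be launched from C# through the SSIS runtime. The following is a minimal sketch, assuming a reference to the Microsoft.SqlServer.ManagedDTS assembly and a placeholder path to the saved .dtsx file:

C#
using Microsoft.SqlServer.Dts.Runtime;

class RunSsisPackage
{
    static void Main()
    {
        Application app = new Application();
        // Load the package file created in BIDS (path is a placeholder)
        Package package = app.LoadPackage(@"C:\Packages\CsvImport.dtsx", null);
        DTSExecResult result = package.Execute();
        System.Console.WriteLine("Package result: " + result);
    }
}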

4. Insert data using the conventional SqlCommand class

Data can be inserted into the database from a CSV file using the conventional SqlCommand class, but this is a very slow process: at least ten times slower than the three approaches discussed above. Looping through the CSV file row by row and executing a SqlCommand for every row is strongly discouraged for bulk loads; the sketch below shows the pattern to avoid.
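For contrast only, here is the discouraged row-by-row pattern, assuming the usual System.Data, System.Data.SqlClient, and System.IO namespaces, an open SqlConnection conn, a placeholder table with two varchar columns, and a naive comma split that ignores quoted fields:

C#
using (StreamReader file = new StreamReader(filename))
using (SqlCommand cmd = new SqlCommand(
    "INSERT INTO dbo.BulkDataTable (Col1, Col2) VALUES (@c1, @c2)", conn))
{
    cmd.Parameters.Add("@c1", SqlDbType.VarChar, 100);
    cmd.Parameters.Add("@c2", SqlDbType.VarChar, 100);
    string line;
    while ((line = file.ReadLine()) != null)
    {
        string[] fields = line.Split(',');   // naive split; no quoted-field handling
        cmd.Parameters["@c1"].Value = fields[0];
        cmd.Parameters["@c2"].Value = fields[1];
        cmd.ExecuteNonQuery();               // one round trip per row: very slow
    }
}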

Comparative analysis

In our study, we found that BULK INSERT and SqlBulkCopy performed best: for a roughly 400 MB CSV file, each took an average of three minutes to insert the data. SQL Server Integration Services (SSIS) took about twice as long as BULK INSERT and SqlBulkCopy, and row-by-row insertion took far longer.

Based on our results, BULK INSERT or the SqlBulkCopy class performs best for huge data insertions. SSIS is also a workable option for bulk loads, but using a normal SqlCommand to insert bulk data is not a reasonable solution.

Caution: results can vary from system to system.

Expert Opinions (References for Further Improvement)

According to Adrian Hills: Bulk loading through ADO.NET can be extremely efficient if you use the SqlBulkCopy class. If your source data is in XML, you could try loading that into a DataSet, either by iterating through the XML document manually or via the DataSet.ReadXml method. You can then highly optimise the bulk loading when using the SqlBulkCopy class by:

  • Loading into a heap table (no indexes; create the indexes after the data has been loaded)
  • Specifying the TableLock option for the bulk load, which takes a bulk update lock on the table (see the sketch below)
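A minimal sketch combining these two suggestions, with placeholder table and index names: load into an index-free (heap) staging table under a table lock, then build the indexes once the load completes.

C#
// Bulk load into a heap table under a table-level bulk update lock
using (SqlBulkCopy copy = new SqlBulkCopy(conn, SqlBulkCopyOptions.TableLock, null))
{
    copy.DestinationTableName = "dbo.StagingHeap";   // created without indexes
    copy.WriteToServer(csv);
}
// Create indexes only after the data has been loaded
using (SqlCommand cmd = new SqlCommand(
    "CREATE CLUSTERED INDEX IX_StagingHeap_Id ON dbo.StagingHeap (Id)", conn))
{
    cmd.ExecuteNonQuery();
}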

Forum thread: http://social.msdn.microsoft.com/Forums/en/transactsql/thread/812b9a6a-541d-4d3a-b7a0-005b93012264.

Blog about it: http://www.adathedev.co.uk/2010/02/sqlbulkcopy-bulk-load-to-sql-server.html

Greg Robidoux has written tips for minimally logging bulk load inserts into SQL Server at http://www.mssqltips.com/sqlservertip/1185/minimally-logging-bulk-load-inserts-into-sql-server/.

Optimizing Bulk Import Performance

The following suggestions can help optimize bulk import performance (a short sketch combining two of them follows the list):

  • Use minimal logging
  • Import data in parallel from multiple clients to a single table
  • Disable triggers 
  • Disable constraints
  • Order the data in a data file
  • Control the locking behavior
  • Import data in native format

Details of these optimization suggestions can be found here: http://msdn.microsoft.com/en-us/library/ms190421%28v=sql.105%29.aspx.
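As a sketch of the trigger and constraint suggestions (placeholder table name, reusing an open SqlConnection conn), triggers and check constraints can be disabled for the duration of the load and re-enabled afterwards; WITH CHECK revalidates the constraints against the newly loaded rows:

C#
using (SqlCommand cmd = conn.CreateCommand())
{
    // Disable check constraints and triggers before the load
    cmd.CommandText = "ALTER TABLE dbo.BulkDataTable NOCHECK CONSTRAINT ALL; " +
                      "DISABLE TRIGGER ALL ON dbo.BulkDataTable;";
    cmd.ExecuteNonQuery();

    // ... perform the bulk load here (BULK INSERT or SqlBulkCopy) ...

    // Re-enable and revalidate afterwards
    cmd.CommandText = "ALTER TABLE dbo.BulkDataTable WITH CHECK CHECK CONSTRAINT ALL; " +
                      "ENABLE TRIGGER ALL ON dbo.BulkDataTable;";
    cmd.ExecuteNonQuery();
}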

History

V01 - August 13, 2012 - First version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)