Introduction
This article is based on my experience of inserting a large amount of data (around 5 million rows, 400 MB) from a CSV file into a SQL Server database.
Background
There are several ways to insert bulk data from a CSV file into a database; our goal was to perform the insertion as fast as possible and to execute it from a C# application.
To reach that goal, we experimented with some of the common, well-known techniques for handling bulk data insertion. These are the techniques we experimented with:
- The SQL BULK INSERT statement
- BCP or the SqlBulkCopy library, to insert bulk data using C# or VB
- SQL Server Integration Services (SSIS)
- The normal SqlCommand class in C# or VB
From this article, developers will get some useful guidelines on bulk data insertion in SQL Server.
1. SQL BULK INSERT
Using the BULK INSERT statement, we can insert bulk data into the database directly from a CSV file. The simplest version of the BULK INSERT query looks like this:
BULK INSERT dbo.TableForBulkData
FROM 'C:\BulkDataFile.csv'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
)
To handle transactions and rollback, a TRY...CATCH block can be used like this:
BEGIN TRANSACTION
BEGIN TRY
    BULK INSERT dbo.BulkDataTable
    FROM 'C:\TestFiles\Bulk3.csv'
    WITH
    (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\n',
        ROWS_PER_BATCH = 10000,
        TABLOCK
    )
    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION
END CATCH
The details of the BULK INSERT statement are available here: http://msdn.microsoft.com/en-us/library/ms188365.aspx. BULK INSERT is actually a rich T-SQL command that takes arguments to configure error handling, batch size, trigger behavior, and so on.
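Since the goal here was to drive the insertion from a C# application, the same BULK INSERT statement can also be issued through a plain SqlCommand. The following is a minimal sketch, assuming a connectionString variable and reusing the table and file names from the example above:
// Requires: using System.Data.SqlClient;
using (SqlConnection conn = new SqlConnection(connectionString))
{
    conn.Open();
    string sql = @"BULK INSERT dbo.BulkDataTable
                   FROM 'C:\TestFiles\Bulk3.csv'
                   WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n',
                         ROWS_PER_BATCH = 10000, TABLOCK)";
    using (SqlCommand cmd = new SqlCommand(sql, conn))
    {
        // A multi-minute load can exceed the default 30 second timeout.
        cmd.CommandTimeout = 0;
        cmd.ExecuteNonQuery();
    }
}
Note that the file path in BULK INSERT is resolved on the SQL Server machine, not on the client, so the CSV file must be readable by the server.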
2. BCP or SqlBulkCopy Library
The .NET Framework contains a SqlBulkCopy class in the System.Data.SqlClient namespace to copy large amounts of data from .NET applications to a SQL Server database easily and efficiently.
The details of the SqlBulkCopy operation can be found here: http://msdn.microsoft.com/en-us/library/tchktcdk%28v=vs.80%29.aspx.
The SqlBulkCopy class copies a bulk of data from a data source to a database table. SqlBulkCopy accepts the following types of parameters to copy data into the database: System.Data.DataRow[], System.Data.DataTable, and System.Data.IDataReader.
Now the challenge is to convert the large CSV file to one of these data types: DataRow[], DataTable, or IDataReader. Some open source libraries are available to perform such a conversion. For our experiments, we used CSVReader (http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader), which exposes the data from a CSV file through the System.Data.IDataReader interface.
The following code performs a SqlBulkCopy from a CSV file to a database table:
// CsvReader (from the article linked above) reads the CSV file and
// exposes it as an IDataReader, which SqlBulkCopy consumes directly.
StreamReader file = new StreamReader(bulk_data_filename);
CsvReader csv = new CsvReader(file, true, ',');
SqlBulkCopy copy = new SqlBulkCopy(conn);
copy.DestinationTableName = tablename;
copy.WriteToServer(csv);
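If the column order in the CSV file does not match the destination table, the ColumnMappings collection of SqlBulkCopy can map source columns to destination columns by name; in this small sketch, the column names "Id" and "Name" are hypothetical:
// Map CSV header names to table column names ("Id" and "Name" are
// placeholder names for illustration only).
copy.ColumnMappings.Add("Id", "Id");
copy.ColumnMappings.Add("Name", "Name");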
By default, a bulk copy operation runs in its own transaction. To commit or roll back the copy together with other work, an external transaction needs to be supplied. The following is the code for SqlBulkCopy with a transaction:
SqlConnection conn = new SqlConnection(connectionString);
conn.Open();
SqlTransaction transaction = conn.BeginTransaction();
try
{
    using (StreamReader file = new StreamReader(filename))
    {
        CsvReader csv = new CsvReader(file, true, '|');
        // Pass the external transaction so the whole copy can be
        // committed or rolled back as a single unit.
        SqlBulkCopy copy = new SqlBulkCopy(conn, SqlBulkCopyOptions.KeepIdentity, transaction);
        copy.DestinationTableName = tablename;
        copy.WriteToServer(csv);
        transaction.Commit();
    }
}
catch (Exception ex)
{
    // Any failure during the copy undoes the whole operation.
    transaction.Rollback();
}
finally
{
    conn.Close();
}
It is better to set BulkCopyTimeout for very large CSV files: copy.BulkCopyTimeout = XXX; (the value is in seconds, and a value of 0 means no timeout limit).
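For long-running copies, the timeout can be combined with two other SqlBulkCopy tuning properties, BatchSize and NotifyAfter; here is a sketch in which the concrete values are purely illustrative:
SqlBulkCopy copy = new SqlBulkCopy(conn);
copy.DestinationTableName = tablename;
copy.BulkCopyTimeout = 600;    // seconds; 0 would mean no timeout limit
copy.BatchSize = 10000;        // rows sent to the server per batch
copy.NotifyAfter = 100000;     // raise SqlRowsCopied every 100,000 rows
copy.SqlRowsCopied += (sender, e) =>
    Console.WriteLine("{0} rows copied so far...", e.RowsCopied);
copy.WriteToServer(csv);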
3. SQL Server Integration Services
SQL Server includes a powerful data integration and transformation application called SQL Server Integration Services (SSIS). One of the main functions of SSIS is to move data from almost any formatted external data source into SQL Server. Using Business Intelligence Development Studio (BIDS), we can easily import data from a CSV file to a database. It is also very simple to schedule a package file as an automatic recurring job.
Here are the basic steps to create an SSIS package that imports data from a CSV file to SQL Server:
- Open SQL Server Business Intelligence Development Studio.
- Create a new “Integration Services Project”.
- In the “Control Flow” tab, drag a “Data Flow Task” from the toolbox.
- Go to the “Data Flow” tab.
- In the “Data Flow” page, drag “Flat File Source” and “ADO.NET Destination” from the toolbox and set them up.
- Connect the Flat File Source output path (green arrow) to the ADO.NET Destination.
Running the created SSIS package will copy the data from the CSV file to the SQL Server database.
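A package created this way does not have to be run from BIDS: it can, for instance, be executed with the dtexec command line utility (for example, dtexec /F "C:\Packages\ImportCsv.dtsx", where the package path is hypothetical) or scheduled as a recurring SQL Server Agent job.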
4. Insert data using the conventional SqlCommand class
Data can be inserted into the database from a CSV file using the conventional SqlCommand class, but this is a very slow process. Compared to the other three ways discussed above, this process is at least 10 times slower. It is strongly recommended not to loop through the CSV file row by row, executing a SqlCommand for every row, when inserting a bulk amount of data from a CSV file into a SQL Server database.
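For illustration only, here is a sketch of the discouraged row-by-row approach; the two-column table layout and parameter names are hypothetical:
// The slow anti-pattern: one INSERT statement executed per CSV line,
// costing a server round trip for every row.
using (StreamReader file = new StreamReader(filename))
{
    string line;
    while ((line = file.ReadLine()) != null)
    {
        string[] fields = line.Split(',');
        using (SqlCommand cmd = new SqlCommand(
            "INSERT INTO dbo.BulkDataTable VALUES (@c0, @c1)", conn))
        {
            // Column count and parameter names are hypothetical.
            cmd.Parameters.AddWithValue("@c0", fields[0]);
            cmd.Parameters.AddWithValue("@c1", fields[1]);
            cmd.ExecuteNonQuery();
        }
    }
}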
Comparative Analysis
In our study, we found that BULK INSERT SQL and SqlBulkCopy performed best. For a roughly 400 MB CSV file, it took an average of three minutes to insert the data.
SQL Server Integration Services (SSIS) took around double the time of BULK INSERT and SqlBulkCopy. Normal data insertion took a very long time.
Based on our results, we can say that the BULK INSERT statement or the SqlBulkCopy class performs best for huge data insertions. We can also use SSIS for bulk data insertions, but using a normal SqlCommand to insert bulk data is not a reasonable solution.
Caution: The results can vary from system to system.
Expert Opinions (References for Further Improvement)
According to Adrian Hills: Bulk loading through ADO.NET can be extremely efficient if you use the SqlBulkCopy class. If your source data is in XML, you could try loading that into a DataSet, either by iterating through the XML document manually or via the DataSet.ReadXml method.
You can then highly optimise the bulk loading when using the SqlBulkCopy class by:
- Loading into a heap table (no indexes; create the indexes after the data has been loaded in)
- Specifying the TableLock option for the bulk load, which will take a bulk update lock on the table
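Combining those two suggestions, a minimal sketch (reusing the table name from the earlier examples; the index creation afterwards would be done in T-SQL):
// Sketch: bulk load into an index-free (heap) table while holding a
// bulk update lock, then create the indexes once the data is in.
using (SqlBulkCopy copy = new SqlBulkCopy(conn, SqlBulkCopyOptions.TableLock, null))
{
    copy.DestinationTableName = "dbo.BulkDataTable";
    copy.WriteToServer(csv);
}
// Afterwards, run CREATE INDEX statements against the loaded table.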
Forum suggestion: http://social.msdn.microsoft.com/Forums/en/transactsql/thread/812b9a6a-541d-4d3a-b7a0-005b93012264.
Blog post about it: http://www.adathedev.co.uk/2010/02/sqlbulkcopy-bulk-load-to-sql-server.html
Greg Robidoux has written tips for Minimally Logging Bulk Load Inserts into SQL Server at http://www.mssqltips.com/sqlservertip/1185/minimally-logging-bulk-load-inserts-into-sql-server/.
Optimizing Bulk Import Performance
The following suggestions can help optimize bulk import performance:
- Use minimal logging
- Import data in parallel from multiple clients to a single table
- Disable triggers
- Disable constraints
- Order the data in a data file
- Control the locking behavior
- Import data in native format
Details of these optimization suggestions can be found here: http://msdn.microsoft.com/en-us/library/ms190421%28v=sql.105%29.aspx.
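Several of these suggestions map directly onto SqlBulkCopy settings. As a rough sketch: note that, by default, SqlBulkCopy neither checks constraints nor fires triggers, so those two tips are already the default behavior unless the CheckConstraints or FireTriggers options are specified.
// TableLock controls the locking behavior; constraint checking and
// trigger firing are off by default for SqlBulkCopy unless
// SqlBulkCopyOptions.CheckConstraints / FireTriggers are set.
using (SqlBulkCopy copy = new SqlBulkCopy(conn, SqlBulkCopyOptions.TableLock, null))
{
    copy.DestinationTableName = tablename;
    copy.WriteToServer(csv);
}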
History
V01 - August 13, 2012 - First version.