Introduction
The project I'm working on now requires a way to drop a significant amount of data into 20+ tables. I went looking for bulk insert solutions, and what I found was this article on MSDN.
The article explains how to prepare a table as XML using its DataSet's schema, then send it to a stored procedure that executes an Update and an Insert. The problem I saw for my needs was that the MSDN solution requires a function and a stored procedure for each table.
So I spent a day and rolled out the code here. It takes a DataSet, an open SQL connection, and a table name, and writes the command text to execute an OPENXML bulk insert.
The Code
There is one calling function and two support functions. The first function takes three parameters:
- a DataSet,
- an open SQL connection,
- and a table name.

It begins by processing the table's ColumnMapping, just like MSDN's tutorial. It also streams the DataSet to a StringBuilder as XML. Then, instead of sending the XML to a stored procedure, it sends it to buildBulkUpdateSql, which creates the remaining T-SQL script.
public static void BulkTableInsert(DataSet objDS,
    SqlConnection objCon, string tablename)
{
    System.Text.StringBuilder sb = new System.Text.StringBuilder(1000);
    System.IO.StringWriter sw = new System.IO.StringWriter(sb);

    // Write each column as an XML attribute so OPENXML can read the rows
    // with flags = 1 (attribute-centric mapping).
    foreach (DataColumn col in objDS.Tables[tablename].Columns)
    {
        col.ColumnMapping = System.Data.MappingType.Attribute;
    }

    // Stream the whole DataSet, schema included, into the StringBuilder as XML.
    objDS.WriteXml(sw, System.Data.XmlWriteMode.WriteSchema);

    string sqlText = buildBulkUpdateSql(sb.ToString(), objDS.Tables[tablename]);
    execSql(objCon, sqlText);
}
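To picture what the function hands off, suppose the DataSet holds a hypothetical Orders table with columns OrderID and Amount (both names invented for illustration). With attribute mapping, WriteXml produces something along these lines, with the inline XSD schema element trimmed for brevity:

```xml
<NewDataSet>
  <!-- inline xs:schema element omitted for brevity -->
  <Orders OrderID="1" Amount="9.99" />
  <Orders OrderID="2" Amount="12.50" />
</NewDataSet>
```

Each row becomes one element named after the table, with one attribute per column, which is exactly the shape the OPENXML call below expects.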
This is where the generic T-SQL text is created. The only magic here is getting the C# escape characters out of the string before sending it to the SqlCommand. Another thing to note is how I'm using the database's table as the schema in OPENXML's WITH clause, so that I don't have to name each column and DataType.
static string buildBulkUpdateSql(string dataXml, DataTable table)
{
    System.Text.StringBuilder sb = new System.Text.StringBuilder();

    // Strip newlines, and swap each double quote for two single quotes so the
    // XML survives being embedded in a T-SQL string literal. (This will break
    // if the data itself contains quote characters.)
    dataXml = dataXml.Replace(Environment.NewLine, "");
    dataXml = dataXml.Replace("\"", "''");

    sb.Append(" SET NOCOUNT ON");
    sb.Append(" DECLARE @hDoc INT");
    // Parse the XML once; @hDoc is a handle to the in-memory document.
    sb.AppendFormat(" EXEC sp_xml_preparedocument @hDoc OUTPUT, '{0}'", dataXml);

    // Delete any existing rows that match the incoming data on the primary key...
    sb.AppendFormat(" DELETE {0} FROM {0} INNER JOIN ", table.TableName);
    sb.AppendFormat(" (SELECT * FROM OPENXML (@hDoc, '/NewDataSet/{0}', 1)",
        table.TableName);
    sb.AppendFormat(" WITH {0}) xmltable ON 1 = 1", table.TableName);
    foreach (DataColumn col in table.PrimaryKey)
    {
        sb.AppendFormat(" AND {0}.{1} = xmltable.{1}", table.TableName,
            col.ColumnName);
    }

    // ...then insert every incoming row.
    sb.AppendFormat(" INSERT INTO {0} SELECT *", table.TableName);
    sb.AppendFormat(" FROM OPENXML (@hDoc, '/NewDataSet/{0}', 1) WITH {0}",
        table.TableName);

    // Release the document handle.
    sb.Append(" EXEC sp_xml_removedocument @hDoc");
    return sb.ToString();
}
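To make the output concrete, here is roughly the script the function would emit for a hypothetical Orders table keyed on OrderID (names invented for illustration; the real output is a single line, with whitespace added here for readability):

```sql
SET NOCOUNT ON
DECLARE @hDoc INT
EXEC sp_xml_preparedocument @hDoc OUTPUT,
    '<NewDataSet><Orders OrderID=''1'' Amount=''9.99'' /></NewDataSet>'
-- Remove any rows that collide with the incoming data on the key...
DELETE Orders FROM Orders INNER JOIN
    (SELECT * FROM OPENXML (@hDoc, '/NewDataSet/Orders', 1)
     WITH Orders) xmltable ON 1 = 1
    AND Orders.OrderID = xmltable.OrderID
-- ...then insert everything from the XML.
INSERT INTO Orders SELECT *
    FROM OPENXML (@hDoc, '/NewDataSet/Orders', 1) WITH Orders
EXEC sp_xml_removedocument @hDoc
```

Note how the doubled single quotes inside the literal come out of the Replace() call above: they unescape to single-quoted XML attributes, which sp_xml_preparedocument accepts.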
There's no magic here. This is just a simple command executor. In my actual app I don't use this, and I don't expect you to use it either, but for completeness, here it is:
static void execSql(SqlConnection objCon, string sqlText)
{
SqlCommand objCom = new SqlCommand();
objCom.Connection = objCon;
objCom.CommandType = CommandType.Text;
objCom.CommandText = sqlText;
objCom.ExecuteNonQuery();
}
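Putting it all together, a call site might look like the following sketch. The connection string and the "Orders" table name are placeholders, and LoadOrBuildDataSet is a hypothetical helper standing in for however you fill your DataSet:

```csharp
using (SqlConnection con = new SqlConnection("your connection string here"))
{
    con.Open();
    // Hypothetical helper: returns a DataSet whose table matches the DB schema.
    DataSet ds = LoadOrBuildDataSet();
    BulkTableInsert(ds, con, "Orders");
}
```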
Drawbacks
As the documentation in the function header says, this procedure assumes a few things:
- The DataTable must have at least one primary key column, so the script knows which rows are update data and which are not.
- The existing data matched by the primary key will be deleted and then re-inserted. Whether this is a good idea can be debated; I believe it speeds up the transaction as a whole, but perhaps it is not the most elegant solution, and I'm interested in any DBA comments. Doing the update part is easy, but then you would have to do an outer join against the XML table for the Insert data, which could take a while depending on the table size.
- The DataTable column names must match the database table's perfectly. This is kind of a bummer, but if you're auto-magically creating DataSets in the IDE, it shouldn't matter to you.
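For anyone curious about the update-plus-outer-join alternative mentioned in the second drawback, the T-SQL would look roughly like this untested sketch, using a hypothetical Orders table keyed on OrderID with a single Amount column:

```sql
-- Update rows that already exist, matched on the primary key.
UPDATE Orders SET Amount = xmltable.Amount
FROM Orders INNER JOIN
    (SELECT * FROM OPENXML (@hDoc, '/NewDataSet/Orders', 1)
     WITH Orders) xmltable
    ON Orders.OrderID = xmltable.OrderID

-- Insert only the incoming rows that matched nothing.
INSERT INTO Orders
SELECT xmltable.* FROM
    (SELECT * FROM OPENXML (@hDoc, '/NewDataSet/Orders', 1)
     WITH Orders) xmltable
    LEFT OUTER JOIN Orders ON Orders.OrderID = xmltable.OrderID
WHERE Orders.OrderID IS NULL
```

The cost of that LEFT OUTER JOIN on a large table is exactly the concern raised above, which is why the delete-then-insert route was chosen.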
TODO
Rip out unnecessary XML before creating the Insert script. DataSets can contain multiple tables, all of which are written out by the DataSet.WriteXml() function. In large samples, this is too much data to be sent across and is completely useless.
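One way to tackle that TODO is to copy just the one table into a throwaway DataSet before serializing, so only its rows get written. A rough, untested sketch:

```csharp
// Serialize only the table we care about by copying it into a fresh DataSet.
// DataTable.Copy() brings both schema and rows; the original DataSet is untouched.
static string TableToXml(DataSet objDS, string tablename)
{
    DataSet single = new DataSet("NewDataSet"); // keep the default root element name
    single.Tables.Add(objDS.Tables[tablename].Copy());

    foreach (DataColumn col in single.Tables[tablename].Columns)
    {
        col.ColumnMapping = System.Data.MappingType.Attribute;
    }

    System.Text.StringBuilder sb = new System.Text.StringBuilder(1000);
    System.IO.StringWriter sw = new System.IO.StringWriter(sb);
    single.WriteXml(sw, System.Data.XmlWriteMode.WriteSchema);
    return sb.ToString();
}
```

The root element is named "NewDataSet" explicitly so the '/NewDataSet/{table}' OPENXML path in buildBulkUpdateSql still matches.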
Conclusion
I've tested the heck out of this using small and medium-sized tables and DataSets. There is tons of room for improvement and feature enhancement, and I know this won't work in many professional environments, but it's a good start for me that I wish I had two days ago. Like I mentioned, I'm looking forward to hearing from the community about speeding up this snippet a bit.
Links