Introduction
Last week at my work, I had to write a tool to process few hundreds of thousand email files and insert their details to an MS SQL Server 2005 database. Doing it the traditional way might require much connectivity to the database and will consume many resources. I was wondering what if I could insert all that data at once to the database, which will be much more efficient. I started Googling and I found OPENXML. This is not a whole new technology, but I didn’t know about this before (strange!!).
After a few more minutes of research, I found out how to perform bulk operations against a databases using OPENXML and Stored Procedures.
What is OPENXML
OPENXML is a function added to SQL Server 2000 that provides a row set view over an XML document. Since a row set is simply a set of rows that contain columns of data, OPENXML is the function that allows an XML document to be treated in a familiar relational database format. It allows for the passing of an XML document to a T-SQL Stored Procedure for updating the data.
OPENXML- Summary
- It extends the SQL Language
- It is used within T-SQL Stored Procedures
- XML Document passed as a parameter
- It uses row and column selectors utilizing XPath
It supports the following:
- Attribute and element-centric mappings.
- Edge table row set.
- XML annotation/overflow column.
- Hierarchy support.
OPENXML and ADO.NET
This code block explains the usage of OPENXML to insert bulk data to a database. It simulates the actual process I want (processing .eml files) by reading files in a folder and saving their names, created date, and size in bytes.
- First create the table below:
CREATE TABLE FileDetails(
FileName varchar(50) PRIMARY KEY,
CreatedDate varchar(50) ,
Size decimal(18, 0)
- Now create the Stored Procedure as below:
CREATE PROC sp_bulkinsert @xmldata varchar(max)
AS
DECLARE @hDoc int
exec sp_xml_preparedocument @hDoc OUTPUT,@xmldata
Insert Into FileDetails
SELECT FileName, CreatedDate, Size
FROM OPENXML (@hdoc, '/ds/fileDetails',2)
WITH (FileName varchar(50), CreatedDate Datetime, Size decimal) XMLFileDetails
Where XMLFileDetails.FileName Not IN (Select FileName from FileDetails)
EXEC sp_xml_removedocument @hDoc
GO
- The sample code snippet below shows how to call this Stored Procedure to insert bulk data to a database at once.
We can retrieve the file details using FileInfo()
objects and create a DataSet
of the file details, and then we can convert this DataSet
to XML using the WriteXml()
method of the DataSet
.
private static string preparexml(DataSet ds)
{
StringBuilder sb = new StringBuilder();
StringWriter sw = new StringWriter(sb);
ds.WriteXml(sw, XmlWriteMode.WriteSchema);
return sb.ToString();
}
This XML representation of the data can be passed as the parameter to the Stored Procedure, as given below:
private static void bulkInsert(string xml)
{
try{
SqlConnection con = creatCon();
con.Open();
SqlCommand cmd = new SqlCommand("sp_bulkinsert", con);
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add(new SqlParameter("@xmldata",SqlDbType.VarChar));
cmd.Parameters["@xmldata"].Value=xml;
cmd.ExecuteNonQuery();
Console.WriteLine("Completed inserting file informations to database");
con.Close();
}
catch(Exception ex)
{
Console.WriteLine(ex.Message);
}
}
That’s all simple isn’t it? You can see the performance of this method by writing another code to insert the same fields using the traditional one at a time way. Which surprisingly will show a huge difference in performance and time.
You can use the same way to update and delete records in bulk.
The latest complete code can be downloaded and tested here: complete source code.