Introduction
This article explores the power of the DataSet
in a way that you might not be familiar with. Two important (and very helpful) aspects of the DataSet
are uncovered here:
- A
DataSet
can be used to write to Excel and read from Excel using an OleDbConnection
, very easily. (Note: I use these functions in my data-driven unit tests.) - A
DataSet
(or XML document), in conjunction with an agnostic XML schema (not a DataSet
XML schema) can be used to shape/reshape XML data.
Background
I remember the DAO (Data Access Objects) days and the ADO Recordset
days. I also remember being skeptical when Microsoft introduced yet another library to manipulate data. That was when .NET initially came out. My skepticism waned away very quickly when I started to delve under the covers of the DataSet
. Microsoft introduced a very powerful tool with the DataSet
(even if it isn't sold the way I'm going to sell it to you).
Going back a few years ago, I had designed/developed an Enterprise Management Reporting system for a global company that required MSMQ driven server-side automation of Office. That was an unadvisable task, according to Microsoft - but it was achieved nonetheless (with macros and the like).
Then, I discovered the beauty of writing to and reading from Excel using an
OleDbConnection
. This meant I could limit the execution time to
just the time required to running macros for the manipulation of data (I didn't need Excel to write to Excel anymore). That greatly improved the efficiency, and reduced the risk of problems because the Excel instance was used for a much shorter time.
I also remember a time when I had to use an XmlWriter
or the FOR XML
clause in SQL Server to build XML data. Using the Relations
collection of ADO.NET to build formatted XML? - also a thing of the past. But, those were my younger days, I rejoiced when I found that I could read XML schemas into a DataSet
and out came the formatted XML!
And, if that wasn't enough to pique your interest, how about the ability (in the attached code) to take an XML document in one format and have it loaded into a DataSet
with XML schema in another format? Who would have thought? A DataSet
as a Transformation Engine - you're kidding right? (I'm not!)
Now, you might say that .NET already provides the ability to shape XML with Strongly-Typed DataSets. But, the problem is that you don't necessarily want .NET DataSet
XML schemas. I have been on a few projects where the tool used to develop XML schema was not VS2005. Admit it, most times, you need to use agnostic XML schemas. The development team isn't necessarily responsible for XML schemas. They don't own the XML schemas. After all, best practice requires us to separate the presentation of the data from the retrieval of data. So, how can we implement a best practice here?
By relaxing the constraints, we can use the XML schema to build the data, then use the XML schema to validate the data. Kill two birds with one stone - so to speak. Enough of the sales pitch - let's take a look at what I am talking about.
Using the code
There are two main modules in this project.
Name | Description |
DataSetMethods | This class has three methods:
gatherMetaData - used to gather table/column information and relations within the DataSet .setupDataSet - takes an agnostic XML schema and makes it DataSet -ready.LoadXmlData - used to load XML into a DataSet with a target XML schema already applied to it.
|
ExcelFunctions | This class contains two overloads to ReadFromExcel to a DataSet and two overloads to WriteToExcel from a DataSet . |
DataSetMethods
code:
public static DataSet gatherMetaData(DataSet ds)
{
XmlDocument xd = new XmlDocument();
xd.Load(@"Metadata Xsd\MetaData.xsd");
DataSet metaData = setupDataSet(xd, new DataSet(),false);
for (int i = 0; i < ds.Tables.Count; i++)
{
for (int j = 0; j < ds.Tables[i].Columns.Count; j++)
{
DataRow dr = metaData.Tables["Table"].NewRow();
dr["Name"] = ds.Tables[i].TableName;
dr["Column"] = ds.Tables[i].Columns[j].ColumnName;
metaData.Tables["Table"].Rows.Add(dr);
}
}
for (int i = 0; i < ds.Relations.Count; i++)
{
for (int j = 0; j < ds.Relations[i].ParentColumns.Length; j++)
{
for (int k = 0; k < ds.Relations[i].ChildColumns.Length; k++)
{
DataRow dr = metaData.Tables["Relations"].NewRow();
dr["ParentTable"] = ds.Relations[i].ParentTable;
dr["ParentColumn"] = ds.Relations[i].ParentColumns[j].ColumnName;
dr["ChildTable"] = ds.Relations[i].ChildTable;
dr["ChildColumn"] = ds.Relations[i].ChildColumns[j].ColumnName;
metaData.Tables["Relations"].Rows.Add(dr);
}
}
}
return metaData;
}
public static DataSet setupDataSet(XmlDocument xmlSchema, DataSet ds, bool isDataSet)
{
XmlAttribute msdataAttribute = null;
XmlNode schemaNode = null;
StringReader sr = null;
XmlReader xr = null;
try
{
if (isDataSet)
{
msdataAttribute = xmlSchema.CreateAttribute("xmlns:msdata",
"http://www.w3.org/2000/xmlns/");
msdataAttribute.Value = "urn:schemas-microsoft-com:xml-msdata";
schemaNode = xmlSchema.SelectSingleNode(
"descendant::*[local-name() = 'schema']");
schemaNode.Attributes.Append(msdataAttribute);
msdataAttribute = xmlSchema.CreateAttribute("msdata:IsDataSet",
"urn:schemas-microsoft-com:xml-msdata");
msdataAttribute.Value = "true";
schemaNode.SelectSingleNode("*[local-name() = 'element']"
).Attributes.Append(msdataAttribute);
}
sr = new StringReader(xmlSchema.OuterXml);
xr = XmlReader.Create(sr);
ds.ReadXmlSchema(xr);
ds.EnforceConstraints = false;
}
finally
{
msdataAttribute = null;
schemaNode = null;
sr = null;
xr = null;
}
return ds;
}
public static DataSet LoadXmlData(XmlDocument xmlData, DataSet ds)
{
DataSet tempds = null;
StringReader sr = null;
XmlReader xmlDataReader = null;
try
{
tempds = new DataSet();
sr = new StringReader(xmlData.OuterXml);
xmlDataReader = XmlReader.Create(sr);
tempds.ReadXml(xmlDataReader);
ds.Merge(tempds, false, MissingSchemaAction.Ignore);
}
finally
{
tempds = null;
sr = null;
xmlDataReader = null;
}
return ds;
}
ExcelFunctions
code:
#region Constants
private const string _excelConnectionString =
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"{0}\";User ID=" +
"Admin;Password=;Extended Properties=\"Excel 8.0;HDR=YES\";";
private const string _excelSelect = "select * from [{0}]";
private const string _tableName = "TABLE_NAME";
private const string _tableCreate = "CREATE TABLE [{0}] (";
private const string _tableColumn = "[{0}] {1}{2}";
#endregion
#region Private Methods
private static string getColumnType(DataColumn dc)
{
string columnType = "TEXT";
switch (dc.DataType.ToString())
{
case "System.Int64" :
case "System.Double":
case "System.Int32" :
columnType = "NUMERIC";
break;
default:
columnType = "TEXT";
break;
}
return columnType;
}
#endregion
#region Public Methods
#region WriteToExcel(DataSet ds)
public static void WriteToExcel(DataSet ds)
{
WriteToExcel(ds,ds.DataSetName + ".xls",false);
}
#endregion
#region WriteToExcel(DataSet ds, String fileName, bool append)
public static void WriteToExcel(DataSet ds, String fileName, bool append)
{
string excelConnectionString = string.Format(_excelConnectionString, fileName);
OleDbConnection excelFile = null;
OleDbCommand excelCmd = null;
OleDbDataAdapter excelDataAdapter = null;
OleDbCommandBuilder excelCommandBuilder = null;
StringBuilder sb = null;
try
{
GC.Collect();
if (File.Exists(fileName) && !append) File.Delete(fileName);
excelFile = new OleDbConnection(excelConnectionString);
excelFile.Open();
foreach (DataTable dt in ds.Tables)
{
if (!File.Exists(fileName) || !append)
{
sb = new StringBuilder();
sb.AppendFormat(_tableCreate, dt.TableName);
foreach (DataColumn dc in ds.Tables[dt.TableName].Columns)
{
sb.AppendFormat(_tableColumn, dc.ColumnName,
getColumnType(dc)
, (dc.Ordinal == dt.Columns.Count - 1 ?
")" : ","));
}
excelCmd = new OleDbCommand(sb.ToString(), excelFile);
excelCmd.ExecuteNonQuery();
}
excelDataAdapter =
new OleDbDataAdapter(string.Format(_excelSelect,dt.TableName), excelFile);
excelCommandBuilder = new OleDbCommandBuilder(excelDataAdapter);
excelCommandBuilder.QuotePrefix = "[";
excelCommandBuilder.QuoteSuffix = "]";
try
{
excelDataAdapter.InsertCommand =
excelCommandBuilder.GetInsertCommand();
excelDataAdapter.Update(ds, dt.TableName);
}catch {}
}
}
finally
{
sb = null;
if(excelDataAdapter != null) excelDataAdapter.Dispose();
excelDataAdapter = null;
excelCommandBuilder = null;
if(excelCmd != null) excelCmd.Dispose();
excelCmd = null;
if (excelFile != null)
{
excelFile.Close();
excelFile.Dispose();
}
excelFile = null;
}
}
#endregion
#region ReadFromExcel(string fileName)
public static DataSet ReadFromExcel(string fileName)
{
return ReadFromExcel(fileName, new DataSet());
}
#endregion
#region ReadFromExcel(string fileName, DataSet ds)
public static DataSet ReadFromExcel(string fileName, DataSet ds)
{
string excelConnectionString = string.Format(_excelConnectionString, fileName);
OleDbConnection excelFile = null;
DataTable schemaTable;
OleDbDataAdapter excelDataAdapter = null;
try
{
excelFile = new OleDbConnection(excelConnectionString);
excelFile.Open();
schemaTable = excelFile.GetOleDbSchemaTable(OleDbSchemaGuid.Tables,
new object[] { null, null, null, "TABLE" });
foreach (DataRow dr in schemaTable.Rows)
{
excelDataAdapter = new OleDbDataAdapter(dr[_tableName].ToString(), excelFile);
excelDataAdapter.SelectCommand.CommandType = CommandType.TableDirect;
excelDataAdapter.AcceptChangesDuringFill = false;
string table = dr[_tableName].ToString().Replace("$",
string.Empty).Replace("'", string.Empty);
if (dr[_tableName].ToString().Contains("$"))
excelDataAdapter.Fill(ds, table);
}
excelFile.Close();
}
finally
{
if(excelDataAdapter != null) excelDataAdapter.Dispose();
excelDataAdapter = null;
schemaTable = null;
if (excelFile != null)
{
excelFile.Close();
excelFile.Dispose();
}
excelFile = null;
}
return ds;
}
#endregion
#endregion
Points of interest
This project uses the XmlBrowser
control that I developed. It can be found in another CodeProject article, here. I also originally wrote the Excel functions on my blog a couple of years ago. I fixed the code a bit to make it better. It still has the occasional problem when overwriting the same Excel file multiple times (about 10 or more times). You can find the original version of my code here.
Using the application
When you select an XML schema from the dropdown (and click on Render and then the Save buttons), it will create a file called MetaData.xls. This contains all the information that you would need to determine what columns are required by the DataSet
for shaping to work. Clicking on the Save Data button will save the output DataSet
to a file called OutputData.xls. You have to click on the Render button to see the transformation result.
History
- 19-Mar-2008
Initial release.
- 20-Mar-2008
Separated out the writing of metadata to the DataGrid
/Excel file. This will be done when the user clicks on the Get button. Added a dropdown for XML files. If the user selects an XML file, this is used as the data source instead of reading from Excel.
- 26-Mar-2008
Added a tab control with three tabs: Input XML, Target Schema, and Output XML, so that the user can see what the transformation is doing. Added a Render button so that the user can click this after selecting Input XML and XML Schema from the dropdowns. Added the isDataSet
parameter to the setupDataSet
method. Added the Statistics table to capture DataTable
names and DataTable
row count.
- 27-Mar-2008
Added Save Data button to save the resulting DataSet
to OutputData.xls. Created a project folder called Excel Files so the user can add more than one Excel file for rendering. Made minor fixes to Excel read/write functions to handle special characters.
- 04-Apr-2008
Modified the XmlBrowser
control, and included in a new version (2.0) of this library which requires SAXON. This is an alternate parser with really cool functionality.
- 07-Apr-2008
Added the Vanilla XML option which will render large XML documents really fast using the Microsoft parser. Doesn't render namespace or CDATA
nodes.
- 14-Apr-2008
Modified XmlRender
to render namespace and CDATA
nodes really fast using the Microsoft parser. Allows to save an empty data.xls when only selecting XML schema (this allows the user to build an Excel data source for use with an XML schema to test transformations).
References