Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / productivity / Office

DataSets, XML Schemas, and Excel - Oh My!

4.66/5 (17 votes)
24 Apr 2008CPOL5 min read 1   3.5K  
This article demonstrates the amazing transformation power of the DataSet with agnostic XML schemas, XML, and Excel Read/Write (all with colourful XML rendering using XmlBrowser).

Introduction

This article explores the power of the DataSet in a way that you might not be familiar with. Two important (and very helpful) aspects of the DataSet are uncovered here:

  • A DataSet can be used to write to Excel and read from Excel using an OleDbConnection, very easily. (Note: I use these functions in my data-driven unit tests.)
  • A DataSet (or XML document), in conjunction with an agnostic XML schema (not a DataSet XML schema) can be used to shape/reshape XML data.

DataSetMagic_30.JPG

Background

I remember the DAO (Data Access Objects) days and the ADO Recordset days. I also remember being skeptical when Microsoft introduced yet another library to manipulate data. That was when .NET initially came out. My skepticism waned away very quickly when I started to delve under the covers of the DataSet. Microsoft introduced a very powerful tool with the DataSet (even if it isn't sold the way I'm going to sell it to you).

Going back a few years ago, I had designed/developed an Enterprise Management Reporting system for a global company that required MSMQ driven server-side automation of Office. That was an unadvisable task, according to Microsoft - but it was achieved nonetheless (with macros and the like).

Then, I discovered the beauty of writing to and reading from Excel using an OleDbConnection. This meant I could limit the execution time to just the time required to running macros for the manipulation of data (I didn't need Excel to write to Excel anymore). That greatly improved the efficiency, and reduced the risk of problems because the Excel instance was used for a much shorter time.

I also remember a time when I had to use an XmlWriter or the FOR XML clause in SQL Server to build XML data. Using the Relations collection of ADO.NET to build formatted XML? - also a thing of the past. But, those were my younger days, I rejoiced when I found that I could read XML schemas into a DataSet and out came the formatted XML!

And, if that wasn't enough to pique your interest, how about the ability (in the attached code) to take an XML document in one format and have it loaded into a DataSet with XML schema in another format? Who would have thought? A DataSet as a Transformation Engine - you're kidding right? (I'm not!)

Now, you might say that .NET already provides the ability to shape XML with Strongly-Typed DataSets. But, the problem is that you don't necessarily want .NET DataSet XML schemas. I have been on a few projects where the tool used to develop XML schema was not VS2005. Admit it, most times, you need to use agnostic XML schemas. The development team isn't necessarily responsible for XML schemas. They don't own the XML schemas. After all, best practice requires us to separate the presentation of the data from the retrieval of data. So, how can we implement a best practice here?

By relaxing the constraints, we can use the XML schema to build the data, then use the XML schema to validate the data. Kill two birds with one stone - so to speak. Enough of the sales pitch - let's take a look at what I am talking about.

Using the code

There are two main modules in this project.

NameDescription
DataSetMethodsThis class has three methods:
  • gatherMetaData - used to gather table/column information and relations within the DataSet.
  • setupDataSet - takes an agnostic XML schema and makes it DataSet-ready.
  • LoadXmlData - used to load XML into a DataSet with a target XML schema already applied to it.
ExcelFunctionsThis class contains two overloads to ReadFromExcel to a DataSet and two overloads to WriteToExcel from a DataSet.

DataSetMethods code:

C#
/// <summary />
/// This method gathers Table\Column information from a DataSet.
/// It also gathers Relation information from a DataSet.
/// </summary />

public static DataSet gatherMetaData(DataSet ds)
{
    XmlDocument xd = new XmlDocument();
    xd.Load(@"Metadata Xsd\MetaData.xsd");
    DataSet metaData = setupDataSet(xd, new DataSet(),false);

    for (int i = 0; i < ds.Tables.Count; i++)
    {
        for (int j = 0; j < ds.Tables[i].Columns.Count; j++)
        {
            DataRow dr = metaData.Tables["Table"].NewRow();
            dr["Name"] = ds.Tables[i].TableName;
            dr["Column"] = ds.Tables[i].Columns[j].ColumnName;
            metaData.Tables["Table"].Rows.Add(dr);
        }
    }

    for (int i = 0; i < ds.Relations.Count; i++)
    {
        for (int j = 0; j < ds.Relations[i].ParentColumns.Length; j++)
        {
            for (int k = 0; k < ds.Relations[i].ChildColumns.Length; k++)
            {
                DataRow dr = metaData.Tables["Relations"].NewRow();
                dr["ParentTable"] = ds.Relations[i].ParentTable;
                dr["ParentColumn"] = ds.Relations[i].ParentColumns[j].ColumnName;
                dr["ChildTable"] = ds.Relations[i].ChildTable;
                dr["ChildColumn"] = ds.Relations[i].ChildColumns[j].ColumnName;
                metaData.Tables["Relations"].Rows.Add(dr);
            }
        }
    }
    return metaData;
}

/// <summary />
/// Set up a dataset with an Xml Schema that is DataSet friendly
/// </summary />
public static DataSet setupDataSet(XmlDocument xmlSchema, DataSet ds, bool isDataSet)
{
    XmlAttribute msdataAttribute = null;
    XmlNode schemaNode = null;
    StringReader sr = null;
    XmlReader xr = null;
    try
    {
        if (isDataSet)
        {
            msdataAttribute = xmlSchema.CreateAttribute("xmlns:msdata",
                "http://www.w3.org/2000/xmlns/");

            msdataAttribute.Value = "urn:schemas-microsoft-com:xml-msdata";
            schemaNode = xmlSchema.SelectSingleNode(
                "descendant::*[local-name() = 'schema']");

            schemaNode.Attributes.Append(msdataAttribute);
            
            msdataAttribute = xmlSchema.CreateAttribute("msdata:IsDataSet",
                "urn:schemas-microsoft-com:xml-msdata");
            msdataAttribute.Value = "true";

            schemaNode.SelectSingleNode("*[local-name() = 'element']"
                ).Attributes.Append(msdataAttribute);
        }
        sr = new StringReader(xmlSchema.OuterXml);
        xr = XmlReader.Create(sr);
        ds.ReadXmlSchema(xr);
        ds.EnforceConstraints = false;
    }
    finally
    {
        msdataAttribute = null;
        schemaNode = null;
        sr = null;
        xr = null;
    }
    return ds;
}

/// <summary />
/// Load a Dataset (with XmlSchema applied) with data from an XmlDocument
/// </summary />
public static DataSet LoadXmlData(XmlDocument xmlData, DataSet ds)
{
    DataSet tempds = null;
    StringReader sr = null;
    XmlReader xmlDataReader = null;
    try
    {
        tempds = new DataSet();
        sr = new StringReader(xmlData.OuterXml);
        xmlDataReader = XmlReader.Create(sr);
        tempds.ReadXml(xmlDataReader);
        ds.Merge(tempds, false, MissingSchemaAction.Ignore);
    }
    finally
    {
        tempds = null;
        sr = null;
        xmlDataReader = null;
    }
    return ds;
}

ExcelFunctions code:

C#
#region Constants

/// <summary />
/// string to use for setting up connection string to Excel
/// </summary />
private const string _excelConnectionString = 
    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"{0}\";User ID=" + 
    "Admin;Password=;Extended Properties=\"Excel 8.0;HDR=YES\";";

/// <summary />
/// select statement to read from Excel
/// </summary />
private const string _excelSelect = "select * from [{0}]";

/// <summary />
/// tablename column for DataRow
/// </summary />
private const string _tableName = "TABLE_NAME";

/// <summary />
/// CREATE TABLE Template
/// </summary />
private const string _tableCreate = "CREATE TABLE [{0}] (";

/// <summary />
/// COLUMN Template for CREATE TABLE
/// </summary />
private const string _tableColumn = "[{0}] {1}{2}";
#endregion

#region Private Methods
/// <summary />
/// Very simple function to specify Excel DataType mapping.
/// </summary />
private static string getColumnType(DataColumn dc)
{
    string columnType = "TEXT";
    switch (dc.DataType.ToString())
    {
        case "System.Int64" :
        case "System.Double":
        case "System.Int32" :
            columnType = "NUMERIC";
            break;
        default:
            columnType = "TEXT";
            break;
    }
    return columnType;
}

#endregion

#region Public Methods

#region WriteToExcel(DataSet ds)

/// <summary />
/// Write data from a dataset to a new filename.
/// </summary />
public static void WriteToExcel(DataSet ds)
{
    WriteToExcel(ds,ds.DataSetName + ".xls",false);
}
#endregion

#region WriteToExcel(DataSet ds, String fileName, bool append)
/// <summary />
/// Write data from a dataset to a filename.
/// This method can either create a new file or append to
/// an existing Excel file. If append is specified and file does
/// not exist, the file will be created.
/// </summary />

public static void WriteToExcel(DataSet ds, String fileName, bool append)
{
    string excelConnectionString = string.Format(_excelConnectionString, fileName);
    OleDbConnection excelFile = null;
    OleDbCommand excelCmd = null;
    OleDbDataAdapter excelDataAdapter = null;
    OleDbCommandBuilder excelCommandBuilder = null;
    StringBuilder sb = null;
    try
    {
        GC.Collect();
        if (File.Exists(fileName) && !append) File.Delete(fileName);
        excelFile = new OleDbConnection(excelConnectionString);
        excelFile.Open();
        // write each DataTable to Excel Spreadsheet
        foreach (DataTable dt in ds.Tables)
        {
            // file does not exist or we don't want to append
            if (!File.Exists(fileName) || !append)
            {
                // build the CREATE TABLE statement
                sb = new StringBuilder();
                sb.AppendFormat(_tableCreate, dt.TableName);
                foreach (DataColumn dc in ds.Tables[dt.TableName].Columns)
                {
                    sb.AppendFormat(_tableColumn, dc.ColumnName, 
                        getColumnType(dc)
                        , (dc.Ordinal == dt.Columns.Count - 1 ?
                        ")" : ","));
                }
                excelCmd = new OleDbCommand(sb.ToString(), excelFile);
                excelCmd.ExecuteNonQuery();
            }
            // use the command builder to generate insert
            // command for DataSet Update to work
            excelDataAdapter = 
              new OleDbDataAdapter(string.Format(_excelSelect,dt.TableName), excelFile);
            excelCommandBuilder = new OleDbCommandBuilder(excelDataAdapter);
            excelCommandBuilder.QuotePrefix = "[";
            excelCommandBuilder.QuoteSuffix = "]";
            try
            {
                excelDataAdapter.InsertCommand = 
                       excelCommandBuilder.GetInsertCommand();
                excelDataAdapter.Update(ds, dt.TableName);
            }catch {}
        }
    }
    finally
    {
        sb = null;
        if(excelDataAdapter != null) excelDataAdapter.Dispose();
        excelDataAdapter = null;
        excelCommandBuilder = null;
        if(excelCmd != null) excelCmd.Dispose();
        excelCmd = null;
        if (excelFile != null)
        {
            excelFile.Close();
            excelFile.Dispose();
        }
        excelFile = null;
    }
}

#endregion

#region ReadFromExcel(string fileName)

/// <summary />
/// Read from an Excel file into a new DataSet
/// </summary />
public static DataSet ReadFromExcel(string fileName)
{
    return ReadFromExcel(fileName, new DataSet());
}
#endregion

#region ReadFromExcel(string fileName, DataSet ds)
/// <summary />
/// Read from an Excel file into an existing DataSet
/// </summary />
public static DataSet ReadFromExcel(string fileName, DataSet ds)
{
    string excelConnectionString = string.Format(_excelConnectionString, fileName);
    OleDbConnection excelFile = null;
    DataTable schemaTable;
    OleDbDataAdapter excelDataAdapter = null;
    try
    {
        excelFile = new OleDbConnection(excelConnectionString);
        excelFile.Open();
        schemaTable = excelFile.GetOleDbSchemaTable(OleDbSchemaGuid.Tables,
        new object[] { null, null, null, "TABLE" });
        // Read each DataTable (i.e. Excel Spreadsheet) into the DataSet
        foreach (DataRow dr in schemaTable.Rows)
        {
            excelDataAdapter = new OleDbDataAdapter(dr[_tableName].ToString(), excelFile);
            excelDataAdapter.SelectCommand.CommandType = CommandType.TableDirect;
            excelDataAdapter.AcceptChangesDuringFill = false;
            string table = dr[_tableName].ToString().Replace("$", 
                           string.Empty).Replace("'", string.Empty);
            if (dr[_tableName].ToString().Contains("$"))
                excelDataAdapter.Fill(ds, table);
        }
        excelFile.Close();
    }
    finally
    {
        if(excelDataAdapter != null) excelDataAdapter.Dispose();
        excelDataAdapter = null;
        schemaTable = null;
        if (excelFile != null)
        {
            excelFile.Close();
            excelFile.Dispose();
        }
        excelFile = null;
    }
    return ds;
}
#endregion

#endregion

Points of interest

This project uses the XmlBrowser control that I developed. It can be found in another CodeProject article, here. I also originally wrote the Excel functions on my blog a couple of years ago. I fixed the code a bit to make it better. It still has the occasional problem when overwriting the same Excel file multiple times (about 10 or more times). You can find the original version of my code here.

Using the application

When you select an XML schema from the dropdown (and click on Render and then the Save buttons), it will create a file called MetaData.xls. This contains all the information that you would need to determine what columns are required by the DataSet for shaping to work. Clicking on the Save Data button will save the output DataSet to a file called OutputData.xls. You have to click on the Render button to see the transformation result.

History

  • 19-Mar-2008
  • Initial release.

  • 20-Mar-2008
  • Separated out the writing of metadata to the DataGrid/Excel file. This will be done when the user clicks on the Get button. Added a dropdown for XML files. If the user selects an XML file, this is used as the data source instead of reading from Excel.

  • 26-Mar-2008
  • Added a tab control with three tabs: Input XML, Target Schema, and Output XML, so that the user can see what the transformation is doing. Added a Render button so that the user can click this after selecting Input XML and XML Schema from the dropdowns. Added the isDataSet parameter to the setupDataSet method. Added the Statistics table to capture DataTable names and DataTable row count.

  • 27-Mar-2008
  • Added Save Data button to save the resulting DataSet to OutputData.xls. Created a project folder called Excel Files so the user can add more than one Excel file for rendering. Made minor fixes to Excel read/write functions to handle special characters.

  • 04-Apr-2008
  • Modified the XmlBrowser control, and included in a new version (2.0) of this library which requires SAXON. This is an alternate parser with really cool functionality.

  • 07-Apr-2008
  • Added the Vanilla XML option which will render large XML documents really fast using the Microsoft parser. Doesn't render namespace or CDATA nodes.

  • 14-Apr-2008
  • Modified XmlRender to render namespace and CDATA nodes really fast using the Microsoft parser. Allows to save an empty data.xls when only selecting XML schema (this allows the user to build an Excel data source for use with an XML schema to test transformations).

References

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)