Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server

XML data type tips in SQL Server 2005

4.75/5 (29 votes)
9 Nov 2006CPOL10 min read 1   3K  
Techniques for using the XML data type for storing data.

Introduction

This article outlines some techniques for using the new XML data type provided in SQL 2005, when creating VS2005 applications. The examples here are written in VB.NET but there is a C# version of the source.

Background

If you develop using object-oriented (OO) languages such as VB.NET or C# and store the data on a relational database system such as SQL Server, one issue that often crops up is how to deal with the differences between the flexible nature of OO and the more rigid hierarchical structure of databases.

For example, let's say your application is an online computer hardware shop and you need to store the details for a set of Products. The common fields, such as ProductID, StockCode, ProductName etc., are relatively easy to map onto a database structure.

What happens however, when the field requirements vary a lot depending on the product nature? The fields for Digital Cameras (number of pixels, maximum zoom, type of storage) would be very different from say, Hard Disks (capacity, interface type, speed, form-factor). You can't really add these as columns to the table, as soon the table would be very large with lots of redundant fields.

One traditional approach is to create a child table, such as ProductAttributes and store the attributes in this as distinct rows. Then the problem becomes one of storing different data types e.g., strings, dates, monetary values, integers etc. in a common way. It also requires a SQL JOIN every time we need to examine a single Product, and a lot more code process, sort and display this.

Using XML to store such data in a more 'flexible' way is one way to keep the values closer to the object without having to resort to sub-tables. Instead of the child table, we could just add an XML column ProductAttributes and store some XML with the attributes.

Origins

I developed this technique when I had to create a Job processing system for a database application. The job processing 'engine' would load pending jobs from the database table, and execute them. A 'job' was essentially a base class in VS2005, with common properties such as JobID, JobName, RunAfter, etc. The problem was that actual instance classes of jobs such as ExportAccountsDataJob or SendCustomerEmailAlertsJob all needed to store and retrieve very different types of data.

One of the new features in SQL 2005 was support for XML as a data type. I realised that if I stored the data for each instance in a class that could be serialized to and deserialized from XML, I could use an XML field to store this data within the Job table. This kept the Job table structure very simple and the complexities of each implementation would deal with what data went into the XML field.

If you don't have SQL 2005 you can use a standard database Text field, although you won't have the SQL 2005 capability to query the XML data. This article isn't intended to show all the power and flexibility of this data type (XQuery alone needs its own book), but to show one way in which it can be used.

How It Works

  • First, we create an XML field in our table to store our data
  • Next, we define a class in .NET that can be serialized to/from XML so we can convert the XML into an object instance and back again
  • Then, to make this more accessible we create a property in our database object this will be explained in more detail below
  • You can then add whatever else you need like re-usable user controls for that data type

Address Example

The code included with this article demonstrates using XML for storing the address of a customer in a SQL 2005 database field. Traditionally you might have Line1, Line2, Town etc. as database columns in the Customers table. This works for one instance of an address, but isn't very re-usable.

For example, imagine that you might also need to store a DeliveryAddress when placing an order, or have to add an optional InvoiceAddress for a customer. This leads to columns called InvoiceAddressLine1, InvoiceAddressLine2 or DeliveryAddressLine 1 etc. - you know you've been there. Then along come addresses for suppliers, business partners, shareholders, delivery companies etc.

Each time you'd have to define all the same fields in each new database table (with different names) and then build and bind different controls to display and validate them. Not much re-usability there!

Database Table Design

So, instead of defining our customer table with individual address fields (Line1, Line2 etc.) we simply create a single field using the XML data type. I'll call it AddressXML. Here is the SQL code to create our Customers table:

SQL
/* SQL to create simple customers table (with XML datatype field) */
CREATE TABLE [dbo].[Customers](
    [CustomerID] [int] IDENTITY(1,1) NOT NULL,
    [CustomerName] [varchar](64) COLLATE Latin1_General_CI_AS NOT NULL,
    [AddressXML] [xml] NULL,
 CONSTRAINT [PK_Customers] PRIMARY KEY CLUSTERED 
(
    [CustomerID] ASC
)WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

In the example source, you'll find CreateDatabase.SQL in the App_LocalResources directory which sets up the example database for you and creates the tables and logins.

Note that the SQL 2005 XML data type can support the linking of XML Schemas (XSDs) to a specific XML field to provide data-integrity checking and indexing support, but to keep it simple for this example we're just using an untyped XML column.

Code

First, we create an Address class in code which contains the public properties that we wish to store: Line1, Line2, Line3 etc.. Note that as we're using XML serialization we can store any type of property including other classes, provided they support serialization.

The App_Code directory contains a Customer class which maps the main table columns to properties in this manner, and I have written a few simple ObjectDataSource CRUD functions for database access.

Note: the ObjectDataSource functions don't use the SqlClient.SqlDataTypes.XML type when creating parameters, they use Varchar or Text. This is because SQL 2005 XML data types store in UTF16 encoding, and the strings that our classes use are UTF8. If you try to use an XML data type you'll get an error when SQL tries to convert it. If you pass varchar/text with UTF8, SQL 2005 automatically converts for you.

To make the address values more usable from the customer object, we have a CustomerAddress property which will load the address by deserializing the contents of the AddressXML field.

VB
Public ReadOnly Property CustomerAddress() As Address
    Get
        GetAddress()
        Return _address
    End Get
End Property
Private _address As Address = Nothing

'ensure _address has a valid value
Private Sub GetAddress()
    If _address Is Nothing Then
        'check for blank
        If Me.AddressXML Is Nothing Then
            'create new instance
            _address = New Address()
        Else
            'deserialize existing XML field
            _address = Address.Deserialize(Me.AddressXML)
        End If
    End If
End Sub

Public Sub SaveAddress()
    If _address IsNot Nothing Then
        'write to field
        Me.AddressXML = Address.Serialize(_address)
    End If
End Sub

This means instead of writing

VB
myCustomer.Line1 = "123 High Street"

we have the more elegant and logical approach of

VB
myCustomer.CustomerAddress.Line1 = "123 High Street"

I did it this way so that repeated references to CustomerAddress will deseralize the AddressXML only once (on demand) and then retain this value. The disadvantage to this technique is that you have to explicitly tell the Customer class to save the data back to the AddressXML if you update the address in this way (see the SaveAddress method).

The Address class also has an example property AddressAsHTML which isn't serialized because it's read-only. This is used in the FindCustomers page to display the address as HTML in a label.

As I use the serialization technique with several different classes I created a generic XMLserializer(Of T) class (XMLSerializer<T> in C#) which can be re-used for serializing/deserializing any object type. This means that the code for serializing and deserializing the Address class is now fairly simple:

VB
Private Shared _serializer As New XMLserializer(Of Address)

Public Shared Function Serialize(ByVal myobject As Address) As String
    Return _serializer.Serialize(myobject)
End Function

Public Shared Function Deserialize(ByVal xml As String) As Address
    Return _serializer.Deserialize(xml)
End Function

Creating an XML Serializer in .NET requires a .DLL to be created and compiled at runtime, so we really want to minimise the number of times this is done. Using a Shared/static variable is one way to do this, as it will only be created once for each application instance.

Example Web Pages

To demonstrate this the application has two pages, AllCustomers.aspx and FindCustomers.aspx. The first shows all customers in a GridView, and allows you to select, insert, update and delete them.

Image 1

The DetailsView control that displays the customer is quite simple, as we use the AddressControl.ascx for displaying and editing the AddressXML value.

AddressControl.ascx

AddressControl.ascx does not know anything about the database, or the field names, it just interacts with the Address class. It has a single bindable property AddressXML that we bind to the database field. In my example it binds to Customer.AddressXML but can easily bind to say Order.DeliveryAddressXML in the same way.

When the control is displayed, the XML is passed in (AddressXML is set). This deserializes the XML and copies the values to the controls. When the AddressXML value is requested (e.g. for an insert or update) the control values are read, the Address object created, serialized and returned as a string.

SQL queries on XML data

One argument against using XML in databases prior to SQL 2005 was that XML data in a text field was not easily queried. The SQL query

SQL
SELECT * FROM Customers WHERE Town = 'London'
is very simple and efficient in the traditional approach. With XML data in a text field, finding customers whose Town was 'London' would require reading all customers, deserializing the address and checking the value - this would obviously be much less efficient.

SQL 2005 supports the querying of XML data (either typed or untyped), so this would translate to

SQL
SELECT * FROM Customers 
    WHERE AddressXML.value('(/Address/Town)[1]', 'nvarchar(1000)') = 'London'

The FindCustomers page shows an example of searching in this way. You can enter a Town and search for customers who have that value. The data source Customers.vb code uses an XQuery search on the XML field.

Image 2

Note that I asked the Address class to provide the actual XPath Address/Town in case we later change "Town" to something else. This avoids coding address logic into the Customer class.

The Benefits of Using XML

Re-Use

Having most of the address structure and functionality in the Address class, we can now make re-use pay. Need an invoice address in your customer details? Simply add a column InvoiceAddressXML to the database table, and the supporting code to the Customer class to access it as Customer.InvoiceAddress.

Control binding etc.

A cool trick is that we can treat an AddressXML as a single property and pass it to the AddressControl web user control. If we had gone down the separate-columns approach we would need to create several properties in our address control and pass the values of each one when using a control in a databound form.

This is a major benefit as I've previously had to create some very complicated user controls in this way, and it is really time-consuming binding Line1, Line2, etc..

Another point is that web controls always pass values as strings. Incorrect formatting of strings (especially dates!) can be a real issue when regional settings can cause problems. XML allows the string data to pass through and be re-assembled into safe .NET data types.

Flexibility

In this example we have only used AddressXML to store a single type of data. However, we can in theory store almost anything in an XML field. To go back to our Products example earlier, we needed to store different attributes for different types of products. Let's say our Products table might have ProductID, ProductName, ProductCategory and ProductAttributesXML.

We would then use ProductCategory to decide what type of class we store in ProductAttributesXML. So for ProductCategory='Digital Cameras" we could define a class DigitalCameraAttributes with fields such as Pixels, MaximumZoom, StorageType, and a special user control to show these details. Likewise, a class called HardDiskAttributes for the Hard Disks category.

Conclusion

I hope you find this approach a useful one. It provides a new way to handle the complexity and flexibility of object-oriented code when used with a relational database such as SQL 2005.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)