Introduction
This article outlines some techniques for using the new XML data type provided in SQL 2005, when creating VS2005 applications. The examples here are written in VB.NET but there is a C# version of the source.
Background
If you develop using object-oriented (OO) languages such as VB.NET or C# and store the data on a relational database system such as SQL Server, one issue that often crops up is how to deal with the differences between the flexible nature of OO and the more rigid hierarchical structure of databases.
For example, let's say your application is an online computer hardware shop and you need to store the details for a set of Products. The common fields, such as ProductID, StockCode, ProductName etc., are relatively easy to map onto a database structure.
What happens however, when the field requirements vary a lot depending on the product nature? The fields for Digital Cameras (number of pixels, maximum zoom, type of storage) would be very different from say, Hard Disks (capacity, interface type, speed, form-factor). You can't really add these as columns to the table, as soon the table would be very large with lots of redundant fields.
One traditional approach is to create a child table, such as ProductAttributes and store the attributes in this as distinct rows. Then the problem becomes one of storing different data types e.g., strings, dates, monetary values, integers etc. in a common way. It also requires a SQL JOIN every time we need to examine a single Product, and a lot more code process, sort and display this.
Using XML to store such data in a more 'flexible' way is one way to keep the values closer to the object without having to resort to sub-tables. Instead of the child table, we could just add an XML column ProductAttributes and store some XML with the attributes.
Origins
I developed this technique when I had to create a Job processing system for a database application. The job processing 'engine' would load pending jobs from the database table, and execute them. A 'job
' was essentially a base class in VS2005, with common properties such as JobID
, JobName
, RunAfter
, etc. The problem was that actual instance classes of jobs such as ExportAccountsDataJob
or SendCustomerEmailAlertsJob
all needed to store and retrieve very different types of data.
One of the new features in SQL 2005 was support for XML as a data type. I realised that if I stored the data for each instance in a class that could be serialized to and deserialized from XML, I could use an XML field to store this data within the Job table. This kept the Job table structure very simple and the complexities of each implementation would deal with what data went into the XML field.
If you don't have SQL 2005 you can use a standard database Text field, although you won't have the SQL 2005 capability to query the XML data. This article isn't intended to show all the power and flexibility of this data type (XQuery alone needs its own book), but to show one way in which it can be used.
How It Works
- First, we create an XML field in our table to store our data
- Next, we define a class in .NET that can be serialized to/from XML so we can convert the XML into an object instance and back again
- Then, to make this more accessible we create a property in our database object this will be explained in more detail below
- You can then add whatever else you need like re-usable user controls for that data type
Address Example
The code included with this article demonstrates using XML for storing the address of a customer in a SQL 2005 database field. Traditionally you might have Line1, Line2, Town etc. as database columns in the Customers table. This works for one instance of an address, but isn't very re-usable.
For example, imagine that you might also need to store a DeliveryAddress when placing an order, or have to add an optional InvoiceAddress for a customer. This leads to columns called InvoiceAddressLine1, InvoiceAddressLine2 or DeliveryAddressLine 1 etc. - you know you've been there. Then along come addresses for suppliers, business partners, shareholders, delivery companies etc.
Each time you'd have to define all the same fields in each new database table (with different names) and then build and bind different controls to display and validate them. Not much re-usability there!
Database Table Design
So, instead of defining our customer table with individual address fields (Line1, Line2 etc.) we simply create a single field using the XML data type. I'll call it AddressXML
. Here is the SQL code to create our Customers table:
CREATE TABLE [dbo].[Customers](
[CustomerID] [int] IDENTITY(1,1) NOT NULL,
[CustomerName] [varchar](64) COLLATE Latin1_General_CI_AS NOT NULL,
[AddressXML] [xml] NULL,
CONSTRAINT [PK_Customers] PRIMARY KEY CLUSTERED
(
[CustomerID] ASC
)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
In the example source, you'll find CreateDatabase.SQL in the App_LocalResources directory which sets up the example database for you and creates the tables and logins.
Note that the SQL 2005 XML data type can support the linking of XML Schemas (XSDs) to a specific XML field to provide data-integrity checking and indexing support, but to keep it simple for this example we're just using an untyped XML column.
Code
First, we create an Address
class in code which contains the public properties that we wish to store: Line1, Line2, Line3 etc.. Note that as we're using XML serialization we can store any type of property including other classes, provided they support serialization.
The App_Code directory contains a Customer
class which maps the main table columns to properties in this manner, and I have written a few simple ObjectDataSource CRUD functions for database access.
Note: the ObjectDataSource functions don't use the SqlClient.SqlDataTypes.XML
type when creating parameters, they use Varchar
or Text
. This is because SQL 2005 XML data types store in UTF16 encoding, and the strings that our classes use are UTF8. If you try to use an XML data type you'll get an error when SQL tries to convert it. If you pass varchar
/text
with UTF8, SQL 2005 automatically converts for you.
To make the address values more usable from the customer object, we have a CustomerAddress
property which will load the address by deserializing the contents of the AddressXML
field.
Public ReadOnly Property CustomerAddress() As Address
Get
GetAddress()
Return _address
End Get
End Property
Private _address As Address = Nothing
Private Sub GetAddress()
If _address Is Nothing Then
If Me.AddressXML Is Nothing Then
_address = New Address()
Else
_address = Address.Deserialize(Me.AddressXML)
End If
End If
End Sub
Public Sub SaveAddress()
If _address IsNot Nothing Then
Me.AddressXML = Address.Serialize(_address)
End If
End Sub
This means instead of writing
myCustomer.Line1 = "123 High Street"
we have the more elegant and logical approach of
myCustomer.CustomerAddress.Line1 = "123 High Street"
I did it this way so that repeated references to CustomerAddress
will deseralize the AddressXML
only once (on demand) and then retain this value. The disadvantage to this technique is that you have to explicitly tell the Customer class to save the data back to the AddressXML
if you update the address in this way (see the SaveAddress
method).
The Address class also has an example property AddressAsHTML
which isn't serialized because it's read-only. This is used in the FindCustomers page to display the address as HTML in a label.
As I use the serialization technique with several different classes I created a generic XMLserializer(Of T)
class (XMLSerializer<T>
in C#) which can be re-used for serializing/deserializing any object type. This means that the code for serializing and deserializing the Address class is now fairly simple:
Private Shared _serializer As New XMLserializer(Of Address)
Public Shared Function Serialize(ByVal myobject As Address) As String
Return _serializer.Serialize(myobject)
End Function
Public Shared Function Deserialize(ByVal xml As String) As Address
Return _serializer.Deserialize(xml)
End Function
Creating an XML Serializer in .NET requires a .DLL to be created and compiled at runtime, so we really want to minimise the number of times this is done. Using a Shared/static variable is one way to do this, as it will only be created once for each application instance.
Example Web Pages
To demonstrate this the application has two pages, AllCustomers.aspx and FindCustomers.aspx. The first shows all customers in a GridView
, and allows you to select, insert, update and delete them.
The DetailsView
control that displays the customer is quite simple, as we use the AddressControl.ascx for displaying and editing the AddressXML
value.
AddressControl.ascx
AddressControl.ascx does not know anything about the database, or the field names, it just interacts with the Address class. It has a single bindable property AddressXML
that we bind to the database field. In my example it binds to Customer.AddressXML
but can easily bind to say Order.DeliveryAddressXML
in the same way.
When the control is displayed, the XML is passed in (AddressXML
is set). This deserializes the XML and copies the values to the controls. When the AddressXML
value is requested (e.g. for an insert or update) the control values are read, the Address object created, serialized and returned as a string.
SQL queries on XML data
One argument against using XML in databases prior to SQL 2005 was that XML data in a text field was not easily queried. The SQL query
SELECT * FROM Customers WHERE Town = 'London'
is very simple and efficient in the traditional approach. With XML data in a text field, finding customers whose Town was 'London' would require reading all customers, deserializing the address and checking the value - this would obviously be much less efficient.
SQL 2005 supports the querying of XML data (either typed or untyped), so this would translate to
SELECT * FROM Customers
WHERE AddressXML.value('(/Address/Town)[1]', 'nvarchar(1000)') = 'London'
The FindCustomers page shows an example of searching in this way. You can enter a Town and search for customers who have that value. The data source Customers.vb code uses an XQuery search on the XML field.
Note that I asked the Address class to provide the actual XPath Address/Town in case we later change "Town" to something else. This avoids coding address logic into the Customer
class.
The Benefits of Using XML
Re-Use
Having most of the address structure and functionality in the Address class, we can now make re-use pay. Need an invoice address in your customer details? Simply add a column InvoiceAddressXML
to the database table, and the supporting code to the Customer class to access it as Customer.InvoiceAddress
.
Control binding etc.
A cool trick is that we can treat an AddressXML
as a single property and pass it to the AddressControl
web user control. If we had gone down the separate-columns approach we would need to create several properties in our address control and pass the values of each one when using a control in a databound form.
This is a major benefit as I've previously had to create some very complicated user controls in this way, and it is really time-consuming binding Line1, Line2, etc..
Another point is that web controls always pass values as strings. Incorrect formatting of strings (especially dates!) can be a real issue when regional settings can cause problems. XML allows the string data to pass through and be re-assembled into safe .NET data types.
Flexibility
In this example we have only used AddressXML
to store a single type of data. However, we can in theory store almost anything in an XML field. To go back to our Products example earlier, we needed to store different attributes for different types of products. Let's say our Products table might have ProductID, ProductName, ProductCategory and ProductAttributesXML.
We would then use ProductCategory to decide what type of class we store in ProductAttributesXML. So for ProductCategory='Digital Cameras" we could define a class DigitalCameraAttributes
with fields such as Pixels
, MaximumZoom
, StorageType
, and a special user control to show these details. Likewise, a class called HardDiskAttributes
for the Hard Disks category.
Conclusion
I hope you find this approach a useful one. It provides a new way to handle the complexity and flexibility of object-oriented code when used with a relational database such as SQL 2005.