Title | Learn XML In A Weekend
Author | Erik Westermann
Publisher | Premier Press Books
Published | October 2002
ISBN | 159200010X
Price | USD 29.99
Pages | 400
Introduction
This article is an excerpt from Learn XML In a Weekend: specifically, Sunday Evening's lesson (chapter seven), called Programming with XML. I selected this chapter as the book's sample chapter because I feel that it represents the flow of the book's overall discussion and demonstrates the focus of each lesson.
This book contains seven lessons and other resources and information that are focused on only one thing: getting you up to speed with XML, its related technologies, and latest developments. The lessons are organized into a timeframe that spans a weekend beginning on Friday evening and ending on Sunday evening - and, yes, you can learn XML in a weekend.
With all of the other XML books that line the shelves of bookstores and online stores, you may be asking yourself, "What's so special about this book?" This book is different because it not only explains what XML is and how to use it, but also presents relevant, practical, and real-world uses of XML.
This book focuses on relevant XML technologies like XPath, XSD, DTD, and CSS, and explains why other technologies, like XDR, may not be important in certain scenarios. This book also takes a practical approach to working with XML. After you learn the core syntax and other rules, I show you how to work with XML using two of the best XML editors on the market today: Excelon's Stylus Studio and Altova's XML Spy. There's not much point in writing XML documents, schemas, and transformations by hand when XML editors can not only assist you but also generate a lot of XML for you.
I also discuss how to use XML in Internet Explorer, Microsoft Active Server Pages, and how to use XML with Microsoft's latest offering: the .NET Framework and the Visual C# .NET programming language.
This book is for you because it succinctly describes XML and its related technologies, focusing only on what's relevant in today's rapidly changing market place. I help you make choices that can mean the difference between a successful solution and one that fails because it uses irrelevant, incompatible, or out-dated standards.
Here's an overview of each lesson:
- Friday Evening: A short lesson focusing on introducing XML in terms of what it is, why it is useful, and how others use XML.
- Saturday Morning: Focuses on using XML in Internet Explorer using HTML and XSL, and using XML with Microsoft's Active Server Pages. The intent of this lesson is to give you an overview of what you can do with XML - don't worry if you're not a programmer or don't understand the programming language that's used in the lesson. The idea is to expose you to these technologies so that you'll gain a better understanding of how others use XML.
- Saturday Afternoon: This lesson focuses on XML's core: how to write XML documents by following the rules that XML imposes. The lesson covers basic document structure, working with attributes, comments, and CDATA sections. The lesson also covers character encoding, which allows international users to read your XML documents, and namespaces - a feature that makes your XML documents more useful by allowing you to share them with others.
- Saturday Evening: This is one of the longest lessons in the book focusing on document modeling using DTD and XSD. I suggest that you start reading this chapter as soon as you can after you complete Saturday afternoon's lesson so that you can complete the lesson in one evening.
- Sunday Morning: The focus of this lesson is on using XML Spy and Stylus Studio to create and work with XML solutions. The lesson also covers XSL debugging using Stylus Studio - something that can save you hours of frustration when your XSL code does not work as you expect it should. This lesson also describes Microsoft's XML parser, called Microsoft XML Core Services, how to determine what version is installed on your system, and how to get the latest updates.
- Sunday Afternoon: This lesson focuses on presenting data on the Web using presentation technologies like CSS and XSL. This lesson examines how to repurpose an XML document using XSL that you create using Stylus Studio's graphical XSL editor.
- Sunday Evening: The focus of this lesson is to show you how to use XML with Internet Explorer's Data Source Object (DSO), the XML Document Object Model (XML DOM), and Microsoft's .NET Framework.
- Appendix A: Presents an HTML and XPath reference.
- Appendix B: Presents reference information on XML syntax and structure.
- Glossary: Defines terms used throughout the book.
Sunday Evening - Programming with XML
You've come a long way in a very short time. Congratulations!
With the essential aspects of XML behind you, you're now ready to
get into working with XML. This lesson is made up of three major
parts:
- XML data binding with Internet Explorer.
- Working with the XML DOM in Internet Explorer.
- Using the Microsoft .NET Framework to work with XML.
The first part requires next to no programming experience at
all, but the last two parts get into a fair amount of
programming. You don't necessarily have to be a professional
programmer to be able to understand the material in this lesson.
A lot of people know just enough programming to write
applications for themselves, while others are interested in
programming to a degree but haven't had the opportunity to
explore it on their own. If the latter describes your situation,
this section will be of interest to you. Even if you're not at
all experienced in programming, you should at least skim through
this section to see XML from another perspective.
The samples in the last three major sections use the following
programming languages: C# (pronounced "see-sharp"), JavaScript,
and VBScript. You don't necessarily need to know any of those
programming languages, but some programming experience will help
you understand the samples more easily. The discussion of the
.NET Framework explains what it is and how it uses XML.
XML Data Binding with Internet Explorer
Data binding is a general term for connecting a data source
with a control that's capable of displaying data. Data
source describes anything that acts as a source of data, such
as a database or an XML document. Control is another
general term for a visual element that's used to lay out
information on the screen, like a table.
Internet Explorer can bind an HTML table to an XML data
source, without programming, so you can easily display formatted
XML data within a table on an HTML page. This is made possible by
something called a Data Source Object (DSO). The DSO
doesn't require any programming and is very easy to work with. It
often satisfies most people's requirements. The only drawback to
using the DSO is that it's available only with Internet Explorer
version 4.0 and later.
You're limited to working with an HTML table to display data
that you acquire through the DSO, and the DSO works best with
relatively simple data structures. The first
example, shown in Figure 7.1, demonstrates how to use the DSO to
generate a table that contains a listing of books, authors, and
each book's Library of Congress (LOC) classification.
Figure 7.1 - Displaying XML data in an HTML table.
The Library of Congress (LOC) Classification System's class
and subclass information is shown in the last two columns of the
table. For more information on the Library of Congress
Classification System, visit their Web site at http://www.loc.gov/.
The listing is based on information in the books.xml document,
located in the \XMLInAWeekend\chapter07 folder. The page itself
is called dso1.htm. It's made up of HTML code, and also includes
tags that describe where the data in the table comes from and
which cells of the table contain the data. The following listing
presents most of the page, with the relevant sections shown in
bold:
<html>
<head>
<title>Data Source Object Demo #1</title>
</head>
<body>
<XML ID="xmldata" src="books.xml"></XML>
<table datasrc="#xmldata" id="ListOfBooks"
width="80%" align="center" cellpadding="0"
cellspacing="0" border="1">
<tr>
<thead>
<tr>
<th>Title</th>
<th>Author</th>
<th>Class</th>
<th>Subclass</th>
</thead>
<td valign="top"><div datafld="title"></div></td>
<td valign="top"><div datafld="author"></div></td>
<td valign="top"><div datafld="loc_class"></div></td>
<td valign="top"><div datafld="loc_subclass"></div></td>
</tr>
</table>
<br>
<hr>
</body>
</html>
Appendix A contains an HTML reference that describes all of
the elements that make up a table and demonstrates how to create a
table, so I won't repeat how to do that here.
The first bold part of the listing is the xml element. This xml
element is very different from the xml element you've seen in XML
document declarations. This one delimits what's referred to as an
XML data island, which allows you to have an XML document either
reside directly within an HTML document or refer to an XML
document. The ID attribute is required because it's used to
assign a name to the data island. Later you'll use the name of
the data island to bind an HTML table to it. Instead of having
the XML document appear inline with the rest of the HTML, this
example refers to the XML document that resides in a separate
file, as the src attribute indicates. Note that the xml element
must have a closing tag.
An HTML table appears immediately following the XML data island
declaration. Most of the table uses common attributes, except for
the datasrc and ID attributes. The datasrc attribute associates
the table with the XML data island by name. Note that the name is
prefixed with the pound (#) sign. The table's ID attribute
assigns a name to the table. This is optional, but it's a good
practice to provide a name because you may wish to take advantage
of the additional features that are demonstrated in the next
example.
The names of the XML elements that you want to have in certain
cells appear later in the table's declaration. Each field you
want to have in the table must reside in an HTML div element, as
shown in the preceding listing. The value of the datafld
attribute must exactly match the name of the XML element whose
value you want to have in the cell. You can also specify the
names of attributes in the datafld attribute. The DSO is capable
of matching XML attribute names and generally can handle rather
complex XML documents. However, the DSO doesn't seem to like
attributes in the root element's declaration.
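To make the matching rule concrete, here is a minimal sketch in plain JavaScript of what binding amounts to: for each record, look up each datafld name and fill one table cell with the result. It is a model of the idea only, not the DSO's actual implementation. The bindRows function and the sample record are hypothetical; only the four field names come from the dso1.htm listing.

```javascript
// Hypothetical model of data binding: one output row per record,
// one cell per bound field name (the datafld values from dso1.htm).
function bindRows(records, fieldNames) {
  return records.map(function (record) {
    return fieldNames.map(function (name) {
      // The lookup is exact and case-sensitive; a name that does not
      // match any field in the record yields an empty cell.
      return Object.prototype.hasOwnProperty.call(record, name)
        ? String(record[name])
        : "";
    });
  });
}

// Sample data invented for illustration; it is not taken from books.xml.
var sampleRecords = [
  { title: "Sample Title", author: "Sample Author", loc_class: "Q", loc_subclass: "QA" }
];
var boundFields = ["title", "author", "loc_class", "loc_subclass"];

console.log(bindRows(sampleRecords, boundFields));
```

Because the lookup is exact, asking this model for "Title" instead of "title" produces an empty cell, which is the kind of mismatch to check for first when a bound column stays blank.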
You can open the page directly in Internet Explorer to see the
final result. Depending on how fast your system is, you may be
able to see the table being formatted as Internet Explorer reads
through the XML document and populates the table with data. The
DSO reads XML data asynchronously, meaning that Internet Explorer
picks up new data from the DSO whenever it's ready, while the DSO
continues to read the data as quickly as it can. This feature
boosts overall performance and makes users feel that Internet
Explorer remains responsive, even when it loads larger data
sets.
While the results are interesting, they can be better. There's
a lot of information on the screen, and scrolling up and down can
become tedious. The DSO provides a means of paging through a
dataset using a few buttons and a tiny bit of coding.
Paging Through Long Data Sets
The DSO provides paging support for longer data sets. The
primary benefit for users is that they can easily navigate
through a data set on a page-by-page basis (see Figure 7.2).
Figure 7.2 - Navigational controls for paging through a
long data set.
The sample file is called dso2.htm and is also located in the
\XMLInAWeekend\chapter07 folder. Open it directly in Internet
Explorer and try the navigation buttons.
Another advantage a paged data set provides is responsiveness.
I mentioned earlier that the DSO and Internet Explorer work
asynchronously to acquire and present data. When you first load
dso2.htm, the table appears to be populated as soon as you open
the page in Internet Explorer. As you play with the navigational
controls or just look at the information on the screen, the DSO
and Internet Explorer are hard at work, continuing to populate
the table with information from the XML document. As a result,
Internet Explorer continues to feel responsive even though it's
working to keep the table completely populated.
The dso2.htm file is almost identical to dso1.htm, with a few
minor changes, so I'll just focus on the new parts of the
document. The following listing begins just after the opening
HTML body tag and ends just after the first HTML tr tag of the
table:
<body>
<XML ID="xmldata" src="books.xml"></XML>
<center>
<input type="button" value="FIRST Page"
onclick="ListOfBooks.firstPage();">
<input type="button" value="&lt;&lt; Previous"
onclick="ListOfBooks.previousPage();">
<input type="button" value="Next &gt;&gt;"
onclick="ListOfBooks.nextPage();">
<input type="button" value="LAST Page"
onclick="ListOfBooks.lastPage();">
</center>
<hr>
<br>
<table datasrc="#xmldata" datapagesize="5" id="ListOfBooks"
border="1" width="80%" align="center"
cellpadding="0" cellspacing="0" >
<tr>
The listing begins just after the HTML body element with the now
familiar XML element, which is identical to the one you saw
before. Immediately following that declaration is an HTML center
element, which centers everything that resides within the
beginning and ending tags.
There are four buttons on the page, as described by the HTML
input elements. All of the input elements' type attributes cause
them to be rendered as buttons, while the value attribute
declares the caption that appears in the button. The value
attribute uses predefined HTML entity references for the
less-than symbol (&lt; for <) and the greater-than symbol
(&gt; for >) to avoid confusion with an HTML tag, because there
are a lot of HTML tags close by.
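If you generate captions like these from script instead of typing them by hand, the same substitution can be automated. The following is a small sketch in plain JavaScript; the escapeHtml function is a hypothetical helper written for this illustration and is not part of the book's sample files.

```javascript
// Hypothetical helper: replace the characters that HTML treats as
// markup with the predefined entity references.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")   // must run first so the entities added below are not escaped again
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

console.log(escapeHtml("<< Previous")); // prints "&lt;&lt; Previous"
```

The ampersand is handled first on purpose: if it were replaced last, the ampersands introduced by the other replacements would themselves be escaped.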
The values of the onclick attributes are the key factors in
providing navigational support to the user. Each onclick
attribute's name comes from the event that the attribute
captures. Internet Explorer evaluates the value of the onclick
attribute when the user clicks one of the buttons. The value of
the attribute uses the name of the table that displays the data,
as shown in the table's ID attribute (ListOfBooks). This is
followed by a dot, followed by the name of the page you want to
navigate to. There are four named pages that you can navigate
to: firstPage(), lastPage(), nextPage(), and previousPage(). Each
of these is a method of the table element, so the spelling, case,
and parentheses that follow the name of each page are important
and must appear exactly as shown.
The table element has one new attribute, datapagesize, which
describes how many rows you want the table to display on a given
page. Specify the number of items per page using a numeric value,
optionally in quotes, as shown in the preceding listing.
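The interaction between the page size and the four navigation calls can be modeled in a few lines of plain JavaScript. This is a sketch of the general idea under stated assumptions, not a description of how Internet Explorer implements paging: the createPager function is hypothetical, and the choice to stop at the first and last pages instead of wrapping around is an assumption of the model.

```javascript
// Hypothetical pager that models the four paging calls over an array.
function createPager(items, pageSize) {
  var start = 0; // index of the first item on the current page

  // Index at which the final page begins.
  function lastStart() {
    if (items.length === 0) { return 0; }
    return Math.floor((items.length - 1) / pageSize) * pageSize;
  }

  return {
    firstPage: function () { start = 0; },
    lastPage: function () { start = lastStart(); },
    // Assumption: moving past either end leaves the current page unchanged.
    nextPage: function () { start = Math.min(start + pageSize, lastStart()); },
    previousPage: function () { start = Math.max(start - pageSize, 0); },
    currentItems: function () { return items.slice(start, start + pageSize); }
  };
}

var pager = createPager([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 5);
pager.nextPage();
console.log(pager.currentItems()); // items 6 through 10
```

With twelve items and a page size of five, the last page holds only two items, which is why a paged table can look shorter on its final page.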
While paging support is helpful, you can go a step further by
allowing users to define how many items they want to show per
page.
Dynamically Changing the Number of Items Per Page
Allowing your users to change how many items appear per page
increases the usability of the solution, because it helps to
accommodate users with smaller or lower-resolution screens that
can't fit as much information. This option also makes the
solution more interactive by allowing users to customize the
display to suit their preference.
This solution builds on the last one by adding two more HTML
input elements and some simple JavaScript code. The input
elements capture the user's requested number of items per page,
and the JavaScript code dynamically reconfigures the DSO and
resets the display after each change. The overall effect is shown
in Figure 7.3.
Figure 7.3 - Allowing users to change the number of
items per page.
To get a feel for how this feature works, use Internet
Explorer to open the dso3.htm file, located in the
\XMLInAWeekend\chapter07 folder. Navigate to another page in the
data set and then change the number of items per page. The number
of items changes and the page resets to the first page of the
data set.
Changing the number of items per page can make it difficult to
figure out where you are in the data set, but resetting the
display to the first page will minimize that effect.
The first change to the page occurs with the introduction of two
more HTML input elements, as shown by the bold lines in the
following listing:
<center>
<input type="button" value="FIRST Page"
onclick="ListOfBooks.firstPage();">
<input type="button" value="&lt;&lt; Previous"
onclick="ListOfBooks.previousPage();">
<input type="button" value="Next &gt;&gt;"
onclick="ListOfBooks.nextPage();">
<input type="button" value="LAST Page"
onclick="ListOfBooks.lastPage();">
<input type="text" maxlength="2" size="2" id="itemsPerPage">
<input type="button" value="change" onclick="changePageSize();">
</center>
The first new input element is a text input field that captures
the user's preferred number of items per page. The critical part
of the declaration is the ID attribute that assigns a name to the
field (itemsPerPage). The second input element is a button that
the user clicks to carry out the requested change. Like other
buttons on the page, this button has an onclick attribute. This
time, the attribute refers to the name of a JavaScript function
that applies the change. You can use any name you like when you
create your own pages, but be sure to use a descriptive name that
suggests what the JavaScript function does.
The page is not yet complete, because the JavaScript code that
handles the new button's click event is not part of the page.
JavaScript code typically resides within a script element, which
in turn resides between the HTML page's head starting and ending
tags (which appear just after the html element, but before the
body element), as shown in the following listing:
<html>
<head>
<title>Data Source Object Demo #3</title>
<script language="JavaScript">
function Initialize()
{
itemsPerPage.value = ListOfBooks.dataPageSize;
}
function changePageSize()
{
ListOfBooks.dataPageSize = itemsPerPage.value;
ListOfBooks.firstPage();
}
</script>
</head>
<body onload="Initialize();">
<XML ID="xmldata" src="loc_classes.xml"></XML>
There's a lot going on in this listing, so let's start in
somewhat familiar territory with the body tag, near the end of
the listing. The body tag has an attribute called onload, which
you saw in Saturday morning's lesson. Internet Explorer evaluates
the value of the onload attribute when it has finished loading
the document but hasn't displayed it yet. As a result, the onload
attribute usually refers to the name of a function that carries
out initialization tasks, which get the page ready for display.
JavaScript is a popular programming language that most Web
developers are at least familiar with. It's so popular because
it's easy to use and is supported by major browsers, including
Internet Explorer. JavaScript code on an HTML page must appear
within a script element whose declaration includes the name of
the programming language used by the code within it. Internet
Explorer is adept at figuring out which programming language
appears within a script element, making the value of the language
attribute optional. However, it's good practice to specify the
name of the programming language you're using. For Internet
Explorer, valid settings for the script element's language
attribute are JavaScript, JScript, and VBScript.
The body element's onload attribute refers to a function called
Initialize() that appears immediately after the script element's
starting tag. You can place this function wherever you want, as
long as it appears within the script element. Generally, it's
good practice to put a function like this immediately after the
starting script tag, or just before the ending script tag.
The Initialize() function sets the value that appears in the
itemsPerPage input field so it's equal to the number of items per
page that the table (ListOfBooks) displays, to let the user know
how many items per page the table starts with. The single line of
code in the function refers to the name of the input field,
itemsPerPage, followed by the field's property that holds the
value you see on the screen (value), followed by an equals sign.
So far, the statement says, "Assign a value to the itemsPerPage
field." The code that follows the equals sign describes what to
assign to the input field: the number of items per page that the
table displays.
The next function, changePageSize(), handles the onclick event
for the button that's next to the input field. The first line of
the function assigns the value in the itemsPerPage input field to
the ListOfBooks table's dataPageSize property. The line that
follows resets the table's page back to the first page of the
data set.
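The changePageSize() function passes along whatever the user typed, including an empty field or letters. One way to guard against that, sketched here in plain JavaScript, is to check the text before assigning it. The parsePageSize function is a hypothetical helper and is not part of dso3.htm; the two-digit limit mirrors the maxlength="2" setting on the input field, and falling back to the current size is an assumption about the behavior you would want.

```javascript
// Hypothetical helper: turn the text typed into the itemsPerPage field
// into a usable page size, or fall back to the current size.
function parsePageSize(text, currentSize) {
  var trimmed = String(text).trim();
  // Accept one or two digits only, matching the field's maxlength of 2.
  if (!/^[0-9]{1,2}$/.test(trimmed)) { return currentSize; }
  var size = parseInt(trimmed, 10);
  // A page size of zero would show nothing, so keep the current size instead.
  return size >= 1 ? size : currentSize;
}

console.log(parsePageSize("10", 5));  // 10
console.log(parsePageSize("abc", 5)); // 5
```

In a page like dso3.htm, the result of a call such as parsePageSize(itemsPerPage.value, ListOfBooks.dataPageSize) could be assigned to the table's dataPageSize property in place of the raw field value.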
Keep the following in mind as you design your own pages that
use the DSO:
- Root XML elements cannot have attributes.
- XML documents can be reasonably complex.
- The DSO is capable of working with attributes. Simply refer
to them by name.
- Assign names to all buttons, tables, and text input fields
using the ID attribute.
- When creating a paged data set, assign a value to the
dataPageSize attribute of the table that displays the data set
for best performance.
The next example uses VBScript and the XML DOM to explore the
structure of an XML document and write its contents to the
screen.
Exploring an XML Document Using the XML DOM
The XML DOM (Document Object Model) provides a means of
programmatically creating and manipulating XML documents. The DOM
exposes (presents) an XML document by reading it into the
computer's memory and then representing each aspect of it as a
distinct object. An object is a logical element (meaning
that it doesn't really exist) that represents something like a
part of an XML document, a string, or even a file. An object not
only refers to something, it also offers a means of manipulating
it by exposing functionality that's appropriate for the object.
For example, an object that represents a string of characters
(simply referred to as a string) exposes functionality that
allows programmers to compare, add, and manipulate strings. An
object that represents part of an XML document may expose
functionality that allows you to navigate to another part of the
document, transform it using XSL, or add new information to
it.
While an XML document essentially conveys information about
the structure and content of the document using elements, an XML
DOM represents an XML document as a collection of node
objects. XML documents have several types of elements, so the XML
DOM has several types of nodes, as shown in Table 7.1.
Table 7.1 Types of XML Nodes
Type | Description
document | Represents the document starting at the XML declaration. The document has only one child element, which represents the document's root element.
documentFragment | Represents a fragment of an XML document.
element | Represents an XML element.
text | Represents the (text) content of an element or an attribute.
attribute | Represents an element's attribute.
cdatasection | Represents an element's CDATA data.
processinginstruction | Represents a processing instruction.
comment | Represents a comment.
entity | Represents an entity, as described by the document's DTD.
entityreference | Represents an entity reference, as described by the document's DTD.
documenttype | Represents the document's DTD.
notation | Represents a notation declared in the document's DTD.
Each node in the XML DOM exposes a property called nodeType that
describes what type of node it is, enabling you to process nodes
based on their type. The example in this section processes
attributes and elements based on the value in each node's
nodeType property, as you'll see shortly.
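The nodeType property holds a number, not a name. The following plain JavaScript sketch pairs the numeric values defined by the W3C DOM with the type names from Table 7.1. The NODE_TYPE_NAMES table and the two functions are hypothetical helpers written for this illustration.

```javascript
// Numeric nodeType values defined by the W3C DOM, paired with the
// names used in the Type column of Table 7.1.
var NODE_TYPE_NAMES = {
  1: "element",
  2: "attribute",
  3: "text",
  4: "cdatasection",
  5: "entityreference",
  6: "entity",
  7: "processinginstruction",
  8: "comment",
  9: "document",
  10: "documenttype",
  11: "documentFragment",
  12: "notation"
};

// Hypothetical helper: look up the name for a numeric node type.
function describeNodeType(nodeType) {
  return NODE_TYPE_NAMES[nodeType] || "unknown";
}

// Hypothetical helper: true only for the two node types that the
// sample in this section writes out by name.
function isElementOrAttribute(nodeType) {
  return nodeType === 1 || nodeType === 2;
}

console.log(describeNodeType(1)); // element
console.log(describeNodeType(8)); // comment
```

Comparing against the numbers directly, as the sample does, works because elements and attributes are the only node types with values below 3.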
One of the interesting features of the XML DOM is that it
allows you to traverse an entire document without actually
referring to nodes by name, very much like you saw in the
previous lesson using XPath expressions. The example for this
lesson is an HTML page containing VBScript code that explores the
structure of an XML document using the XML DOM, and then writes
out the structure and data onto the screen. Figure 7.4 shows what
the output of the page looks like.
Figure 7.4 - Exploring an XML document using the XML
DOM.
The sample is in the XML-DOM.htm file located in the
\XMLInAWeekend\chapter07 folder. If you extracted the contents of
the book's accompanying ZIP file, as described in the Preface,
you're set to go. Otherwise, you'll have to edit the page to
change the location of the XML document the sample relies on. The
sample uses the carparts.xml document from the previous lesson,
because you're probably familiar with its structure and content
by now. Open the page in Internet Explorer to view its results.
There's a chance that Internet Explorer will ask if you want to
continue to let the page run. Answer Yes to allow the code in the
page to continue processing the complete document.
The page is made up of about 95% VBScript code, with the
remaining 5% representing supporting HTML code, so I won't
describe the HTML code here.
The code is made up of three functions, all written in VBScript.
The Initialize function acts as the starting point and gets
called when you first load the page. Its role is to load the XML
document into the DOM and start the processing by calling the
walk_elements function. The walk_elements function traverses
elements in the XML DOM by recursively calling itself to display
the contents of each element. When walk_elements encounters an
element with attributes, it calls the walk_attributes function,
which displays the contents of all attributes for an element. The
code isn't too complex, except for the part where walk_elements
calls itself. When a function calls itself, it's said to be
recursive. Recursive programming is an efficient means of
processing many types of logical structures. Rather than present
all 95 lines that make up the code on the page in one chunk, I'll
walk you through the code as it executes when it processes an XML
document.
The very first thing that happens when the page loads is that
the following line executes:
indent = 0
This line creates a numeric, global variable that all code on the
page can access, and sets its initial value to 0. This variable
controls a key factor in the page's layout: the indentation of
each line. Next, the now-familiar Initialize function is called
as a result of the HTML document's body element onload attribute
setting. The following listing presents most of the Initialize
function:
Function Initialize()
Dim root
Dim xmlDoc
Dim child
Dim indent
indent=0
Set xmlDoc = CreateObject("Msxml.DOMDocument")
xmlDoc.async = False
xmlDoc.validateOnParse=False
xmlDoc.load("\XMLInAWeekend\chapter06\carparts.xml")
If xmlDoc.parseError.errorcode = 0 Then
Document.Write("
")
walk_elements(xmlDoc)
Document.Write("
")
Else
' code omitted for brevity...
End If
End Function
After the initial Dim statements that declare some local
variables, the function begins by creating an instance of the XML
DOM object using the VBScript CreateObject function. The XML DOM
exposes some properties that control its behavior when working
with XML documents. The two lines that follow set the DOM
object's async and validateOnParse properties to False,
essentially disabling those two options (loading the document
asynchronously and validating the document as it loads). The If
statement determines if there were any errors loading the
document by evaluating the errorCode property of the DOM object's
parseError object. If there weren't any errors, the function
calls walk_elements to begin processing the document.
The walk_elements function takes a single parameter: a node. When
Initialize calls walk_elements the first time, it passes the
function the instance of the XML DOM, which is essentially a
special type of node that represents the entire XML document.
(See Table 7.1 for the types of nodes.) The function begins by
initializing a For Each loop that executes once for each child
node, as shown in the following listing:
function walk_elements(node)
dim nodeName
dim count
count = 1
indent=indent+2
For Each child In node.childNodes
For i = 1 To indent
Document.Write("&nbsp;")
Next
The next thing it does is indent the line it's about to write out
by writing out several non-breaking space characters using the
&nbsp; predefined HTML entity reference. The next block of code
writes out the node's type and its name, as shown in the
following listing:
Document.Write("+--")
Document.Write("<u>" & child.nodeTypeString & "</u>: ")
If child.nodeType < 3 Then
Document.Write("<b>&lt;" & child.nodeName _
& "&gt;</b> [#" & count & "]<br>")
count = count + 1
End If
The child.nodeTypeString property represents the node's type as a
string value, as shown in the Type column in Table 7.1. The
child.nodeType property represents the node type as a numeric
value. An element's nodeType value is 1, and an attribute's
nodeType value is 2. As a result, the If statement ensures that
only elements and attributes are shown on the screen.
The next block of code checks the current node to determine if
it's an element, and then checks it for any attributes:
If (child.nodeType = 1) Then
If (child.attributes.length > 0) Then
indent=indent+1
walk_attributes(child)
indent=indent-1
End If
End If
If the node has any attributes, as determined by the value of the
attributes.length property, the function calls the
walk_attributes function, passing it the current node to generate
a list of attributes. The walk_attributes function is discussed
shortly. The last actions that the walk_elements function takes
are to call itself to process any child nodes and manage the
value of the indent variable by decreasing its value:
If (child.hasChildNodes) Then
walk_elements(child)
Else
Document.Write child.text & "<br>"
End If
Next
indent=indent-2
End Function
The walk_attributes function is a lot simpler than the
walk_elements function because attributes use a simple structure
made up of a name-value pair, as described in Saturday
afternoon's lesson. The walk_attributes function is shown in the
following listing:
Function walk_attributes(node)
For Each attrib In node.attributes
For i=1 to indent
Document.Write("&nbsp;")
next
Document.Write("o--" & attrib.nodeTypeString & "")
Document.Write(": " & attrib.name & " -- {" _
& attrib.nodeValue & "}<br>")
Next
End Function
This function behaves in a similar way to the walk_elements
function. It indents each line by the number of spaces described
by the value of the indent variable, writes out the string
representation of the nodeType property, and writes out the
attribute's name and value. The function's code resides within a
For Each loop that executes once for each attribute.
You can try this sample with other, relatively short XML
documents by changing the value in the parameter to the
xmlDoc.load call in the Initialize function.
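The recursive pattern at the heart of the sample is easier to see without the browser and the XML parser around it. The following sketch, in plain JavaScript, walks a small tree of ordinary objects and builds the same kind of indented outline. Everything in it is hypothetical: the walk function, the shape of the node objects (name, attributes, children, text), and the sample data are invented for illustration and are not taken from XML-DOM.htm or carparts.xml.

```javascript
// Hypothetical stand-in for a DOM tree: each node is a plain object with
// a name, an attributes map, a children array, and optional text.
function walk(node, indent, lines) {
  var pad = " ".repeat(indent);
  lines.push(pad + "+--element: <" + node.name + ">");

  // List attributes one column deeper than the element that owns them.
  Object.keys(node.attributes || {}).forEach(function (key) {
    lines.push(pad + " o--attribute: " + key + " -- {" + node.attributes[key] + "}");
  });

  var children = node.children || [];
  if (children.length === 0 && node.text !== undefined) {
    // A node without children contributes its text instead.
    lines.push(pad + "  " + node.text);
  }

  // Recursive step: each child is handled by this same function,
  // two columns further in.
  children.forEach(function (child) {
    walk(child, indent + 2, lines);
  });
  return lines;
}

// Sample data invented for illustration.
var tree = {
  name: "parts",
  attributes: {},
  children: [
    { name: "part", attributes: { id: "p1" }, children: [], text: "Spark plug" }
  ]
};

console.log(walk(tree, 0, []).join("\n"));
```

This sketch passes the indent as a parameter where the VBScript sample keeps it in a global variable. Both approaches work; with a parameter there is no need to undo the change after the recursive call returns.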
The Microsoft .NET Framework is a set of technologies that
provide a unified programming model for traditional Windows-based
and Web-based applications, based on a comprehensive class
library that exposes system functionality. The .NET Framework
uses XML in many ways, including transferring information between
systems. The next section provides a brief introduction to the
.NET Framework and demonstrates an application that uses the C#
programming language.
XML and the .NET Framework
The .NET Framework is made up of three key elements:
- Common Language Runtime
- .NET Class Library
- Unifying Components
The Common Language Runtime is a logical layer that separates
an application from the platform it executes on, providing
execution services such as memory management, error handling, and
thread management. It abstracts the details of the operating
system, processor architecture, and interface between it and a
specific programming language. This makes it easier for
developers to create applications, and for applications to work
with one another. One of the key benefits of the Common Language
Runtime is that it supports a variety of programming languages,
including Visual C++ .NET, Visual Basic .NET, JScript .NET, and
Visual C# .NET.
The .NET Class Library provides a consistent programming model
because its functionality is accessible through all programming
languages supported by the .NET Framework. It allows developers
to easily access system functionality such as file and database
access, advanced drawing support, input and output operations,
and interoperability features, such as data interchange between
networked systems.
The .NET Framework's functionality is exposed through a set of
unifying components that include ASP.NET, Windows Forms, and
Visual Studio .NET. ASP.NET is the next generation of Microsoft's
Active Server Pages (ASP), with support for a programming model
that's familiar to developers who create traditional
Windows-based applications. ASP.NET also supports Web Services, a
new way of exposing services that are designed to be used by
people through applications and Web sites. Windows Forms is a
unified means of creating traditional Windows applications that
have a graphical user interface across all supported programming
languages. Visual Studio .NET is a development environment that's
tightly integrated with the .NET Framework, making it a great
tool for developing and deploying network-centric and
network-aware applications and services.
The .NET Framework uses XML throughout, for everything from
configuration files to native support for XML in ADO.NET (a data
access technology) to interchanging information with other
systems. The .NET Class Library provides many classes that work
with XML, making it easier to create applications that are
XML-aware.
The example for this discussion is a traditional Windows-based
application, written in Visual C# .NET, that reads an XML
document and displays it in a Windows Forms TreeView
control, as shown in Figure 7.5.
Figure 7.5 - A Windows Forms application showing the
contents of an XML document.
Your system must have the .NET Framework installed on it to
compile and work with this application. The .NET Framework is
available for free from Microsoft's MSDN Web site at
http://msdn.microsoft.com. The system requirements are posted
there as well. Please review them before you download the .NET
Framework, because it's a large download. If you already have the
.NET Framework installed but don't have Visual Studio .NET, I've
provided a compiled version of the sample code along with the
book's source code distribution. You should be able to just start
the application to try it out. The name of the application is
xmlTreeView, and it's located in the
\XMLInAWeekend\chapter07\dotNET folder. The name of the
application's executable is xmlTreeView.exe.
The application's controls are straightforward. Click on the
button to open a file selection dialog box and initiate reading
the XML document you want to view. Use the Expand Tree check box
before you select a file to have its contents expanded in the
tree view automatically when the file is loaded. Expanding all
elements automatically can take a long time if the document is
large or has a complex structure, so use this feature with
caution. One feature of the application that's not readily
apparent is that you can drag the left and bottom borders of the
window, which drags the left and bottom edges of the tree view
control along with it. This makes it possible to view longer or
wider XML documents in the tree view control without having to
scroll sideways or up and down.
Unlike the previous example, which uses the XML DOM to explore
the structure and content of an XML document, this example uses a
forward-only, read-only stream of data to quickly read through
the XML document. (If you're familiar with XML processor
programming models, this programming model is similar to the one
offered by SAX, the Simple API for XML. The difference is that
the parser doesn't raise events as it reads the XML document.
Instead, the application informs the parser when to read the next
block of information from the XML document.) The application
displays the document in a System.Windows.Forms.TreeView control,
which makes it easy for users to inspect the document using a
familiar interface while using relatively little space on the
screen.
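The pull model described above can be sketched in a few lines.
This is a minimal illustration rather than code from the sample
application: the class and method names (PullModelSketch,
ReadElementNames) are hypothetical, and the XML is read from a
string instead of a file to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of the book's xmlTreeView sample.
public static class PullModelSketch
{
    // Returns the names of all elements in document order.
    public static List<string> ReadElementNames(string xml)
    {
        List<string> names = new List<string>();
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            // The application asks for the next node by calling Read();
            // the parser never calls back into the application.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Add(reader.Name);
                }
            }
        }
        return names;
    }
}
```

A SAX-style parser would instead call a handler method for each
element it encounters. Here the loop decides when to advance,
which is what lets the application stop reading early or skip
nodes it doesn't care about.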
The code that manages the display is rather involved and is
beyond the scope of this book, so I won't cover it here. I'll
describe most of the code as it would execute when you run the
application. The code is a lot simpler than the previous example,
because its focus is on reading the XML document and adding
information to the tree view control. The .NET Class Library
handles the rest of the details.
The application begins by presenting what's referred to as a
file dialog box that allows the user to select which file to
view. The following listing shows how the application creates an
instance of the dialog and works with it:
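OpenFileDialog oFileDlg = new OpenFileDialog();
oFileDlg.Filter = "XML files (*.xml;*.xsl)|*.xml;*.xsl|All files (*.*)|*.*";
oFileDlg.FilterIndex = 1;          // preselect the "XML files" filter
oFileDlg.RestoreDirectory = true;  // restore the working directory on close
// Show the dialog and load the file only if the user confirms a selection
if (oFileDlg.ShowDialog() == DialogResult.OK)
{
    populateTreeView(oFileDlg.FileName);
}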
Visual C# .NET's syntax is similar to that of C++, which makes it
easy to pick up if you already know C++. The code in the
preceding listing simply captures the name of the file that the
user wants to view and passes it on to another function called
populateTreeView, which is where the core functionality of the
application resides.
The role of the populateTreeView function is to read XML data
from the XML document and transfer that information to the
TreeView control for display. The function uses a
System.Xml.XmlTextReader object that provides fast, forward-only
access to an XML document without validating it as it reads the
document. The first thing the function does is open the XML
document through a System.IO.FileStream object, as shown in the
following listing:
XmlTextReader xmlReader;
// Open the file read-only and hand the stream to the reader
FileStream fileStreamObject = new FileStream(strFile, FileMode.Open, FileAccess.Read);
xmlReader = new XmlTextReader(fileStreamObject);
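Because XmlTextReader does not validate, it is easy to forget
that it still insists on well-formed XML and throws an
XmlException when it meets, say, a mismatched end tag. The
following sketch shows one way to open a file through a
FileStream, release both objects when finished, and report
whether the document could be read to the end. The class and
method names (ReaderOpenSketch, IsWellFormed) are hypothetical
and do not appear in the sample application.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of the book's xmlTreeView sample.
public static class ReaderOpenSketch
{
    // Returns true if the file can be read to the end as well-formed XML.
    public static bool IsWellFormed(string path)
    {
        // The using statements close the reader and the stream,
        // even if reading fails part of the way through.
        using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (XmlTextReader reader = new XmlTextReader(stream))
        {
            try
            {
                while (reader.Read())
                {
                    // Nothing to do; only checking that every node parses.
                }
                return true;
            }
            catch (XmlException)
            {
                // Raised for documents that are not well formed.
                return false;
            }
        }
    }
}
```

Closing the stream promptly matters in an application like this
one, where the user may open several files in a row.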
The code reads through the XML document using a loop that
continues as long as XmlTextReader is able to successfully read
information from the XML document. The application uses a switch
statement to determine what type of node it's working with,
because some nodes, like comments, CDATA sections, and processing
instructions, are displayed a little differently than element
nodes. The following listing presents the part of the application
that handles element nodes:
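switch(xmlReader.NodeType)
{
    case XmlNodeType.Element:
        xmlNode = new TreeNode("<" + xmlReader.Name + ">");
        // Check for <tag/> now; IsEmptyElement is false once the
        // reader is positioned on an attribute
        emptyElement = xmlReader.IsEmptyElement;
        // Add each attribute as a child of the element's node
        while(xmlReader.MoveToNextAttribute())
        {
            TreeNode attNode = new TreeNode("Attribute");
            attNode.Nodes.Add(xmlReader.Name + "='" + xmlReader.Value + "'");
            xmlNode.Nodes.Add(attNode);
        }
        // Leave the switch so the node is added to the tree below
        break;
}
xmlTree.Nodes.Add(xmlNode);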
This code creates an instance of a TreeNode object that contains
the name of the element, along with its attributes. Attributes
are added as children of the element's node to take advantage of
the TreeView control's display capabilities. Just before moving
on to the next node, the code adds the TreeNode object it created
earlier to the TreeView control (the last line of the listing).
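To see how the read loop and the switch statement fit together
outside of a Windows Forms application, the following sketch
builds an indented text outline instead of populating a TreeView
control. It is an illustration under stated assumptions, not the
sample application's code: the names OutlineSketch and
BuildOutline are hypothetical, the indentation comes from the
reader's Depth property rather than from TreeNode nesting, and
the way comments and processing instructions are formatted is my
own choice.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of the book's xmlTreeView sample.
public static class OutlineSketch
{
    // Returns one outline line per element, attribute, text, comment,
    // CDATA section, and processing instruction in the document.
    public static List<string> BuildOutline(string xml)
    {
        List<string> lines = new List<string>();
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // Two spaces per nesting level, taken before any
                // attribute is visited so attributes sit under their element.
                string indent = new string(' ', reader.Depth * 2);
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        lines.Add(indent + "<" + reader.Name + ">");
                        while (reader.MoveToNextAttribute())
                        {
                            lines.Add(indent + "  @" + reader.Name
                                + "='" + reader.Value + "'");
                        }
                        break;
                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                        lines.Add(indent + reader.Value);
                        break;
                    case XmlNodeType.Comment:
                        lines.Add(indent + "<!--" + reader.Value + "-->");
                        break;
                    case XmlNodeType.ProcessingInstruction:
                        lines.Add(indent + "<?" + reader.Name + " "
                            + reader.Value + "?>");
                        break;
                    // End tags, whitespace, and the XML declaration
                    // are ignored in this sketch.
                }
            }
        }
        return lines;
    }
}
```

Calling OutlineSketch.BuildOutline with a short document and
writing each returned line to the console gives a quick,
text-only view of the same information the tree view displays.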
This concludes the brief tour of the .NET Framework and
Windows Forms applications. If you're interested in learning more
about the .NET Framework or the various programming languages
used in this lesson, refer to the Resources section of the book
for some Web site addresses of interest, or visit my Web site
(http://www.designs2solutions.com). I have the same resources,
but I keep the links up to date in case they change.
Summary
You've come a long way this weekend, and now you're ready to
use your newly acquired understanding of XML and its related
technologies. XML technologies are rapidly changing to meet the
changing needs of industry. You should try to keep up with the
latest developments by regularly checking the World Wide Web
Consortium's Web site (http://www.w3c.org) and perhaps subscribing to some
of the excellent newsletters that are available from the major
XML portal sites. XML is a great technology that's working its
way into all facets of the computing industry, and it reflects
the industry's drive to address some of its long-standing
problems using an open, publicly available standard.