
Native XML Databases: Why Should You Care?

Prateek Kathpal

30 May 2008

In this paper, learn how you can store XML documents in an integrated, scalable, high-performance, object-oriented native XML database and take advantage of fast access to every element of an XML document regardless of the number of concurrent users, the number of documents, or the database size.

This article is in the Product Showcase section for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.

Introduction

XML is becoming an increasingly common format for organizations to express the structure of their complex or diverse content. It enables ease of authoring, making it ideal for companies seeking to develop flexible applications. Storing, manipulating, and handling XML has always been a challenge, however, and relational databases cannot scale enough to solve these problems.

XML databases (also called native XML databases) have functionality that significantly improves the management and manipulation of XML, a data format initially designed to facilitate the sharing of structured data across different information systems, particularly over the internet.

XML databases benefit from standards developed by the World Wide Web Consortium (W3C). The standards create a powerful architecture for connecting XML data management services to commonly available application frameworks. The XML databases that use and support these standards provide capabilities not found in other database technology (such as relational databases), including efficient access, query, storage, and processing. XML standards and the databases that support them enable a powerful platform for managing even the most complex and demanding content applications.

EMC® Documentum® XML Store OEM Edition: a Powerful Native XML Database

EMC® Documentum® XML Store OEM Edition is designed for software developers who require advanced XML data processing and storage functionality within their applications. XML Store enables high-speed storage and manipulation of very large numbers of XML documents. Using XML Store, programmers can build custom XML content management solutions and store XML documents in an integrated, highly scalable, high-performance, object-oriented database. The comprehensive XML Store Java API contains methods for storing, querying, retrieving, transforming, and publishing XML data. XML Store can run on any Java platform (JDK 1.5 or higher).

XML Store provides numerous features to help you process and handle XML documents, including:

An XQuery engine for retrieving specific parts of a document
A versioning mechanism for tracking differences within your XML data
Various indexing methods to optimize access to frequently used XML data, and to enable full text search
A transformer and formatter for publishing XML data in XHTML or PDF
An improved cache page replacement algorithm for enhanced performance

You can get more information on XML Store at this location.

Powered by Open Standards

XML Store uses and supports all XML standards including XML 1.0, XQuery, XML Schema, XPath, XSL, XPointer, XLink, XUpdate and DOM. These W3C standards can be used to access, search, process, and store XML data.

XML Store's support for XML standards delivers powerful benefits to content creation and delivery. For example, data can be captured, edited, and rendered throughout a heterogeneous environment. Additionally, documents can be created and managed so that they can be reused in various user manuals and repurposed for different output formats in environments such as manufacturing. In these environments, with heavy document component sharing and reuse, the standards enable strong version control. The best news is that your current knowledge of XML standards makes XML Store easy to use, so you'll be up and running quickly.

Simplicity & Ease of Use

To get started, you can download and install a trial version of the XML Store native XML database from this location.

Creating a Database

The first step after installing XML Store is creating a database. The easiest way to do this is through the AdminClient, as follows:

Start the AdminClient by running XHAdmin, which is located in the bin subdirectory of the XML Store target directory, and in the Start menu group of XML Store on Windows.
Create a database by selecting the menu option Databases->Create database.
Create a database named united_nations with the super user password as entered during installation of XML Store and administrator password northsea. This database will be used by default in all samples described in this manual.

After creating the database, you can close the administrator client. Note that you can also use the administrator client to view the data after you have run the samples, and to perform other actions. Alternatively, the command line tool XHCreateDB can be used to create databases.

Running a Sample

XML Store samples are run using the Ant build system. A command line tool called xhive-ant is provided, which sets the proper CLASSPATH and other parameters. The samples are run as follows:

Open a command prompt and go (cd) to the XhiveDir\bin directory.
To run a sample which inserts two documents into the database, enter the command: xhive-ant run-sample -Dname=manual.StoreDocuments
On successful completion of the sample, a message appears stating the number of documents stored in the database.

The sources of all the samples can be found in XhiveDir\src\samples\manual. You should check the values of the properties in SampleProperties.java, so that they match your settings, before running the samples.

Parse XML Documents

To import an XML document from an external source, the XML document needs to be parsed. You can parse documents using the parseURI method of the LSParser interface, which is part of DOM Load and Save.

The XhiveLibraryIf interface extends DOMImplementationLS, which can be used to create LSParser and LSSerializer objects. You must create LSParsers on the library where you want to store the document.

When parsing succeeds, a DOM Document is returned.

LSParser builder = rootLibrary.createLSParser();
// File.toURL() is deprecated; toURI() also escapes characters such as spaces.
Document firstDocument = builder.parseURI(new File(fileName).toURI().toString());

To store a parsed document in the database, you also need to perform an explicit appendChild. Otherwise, the document is only parsed and not stored.

The parse documents sample uses the default LSParser configuration settings.

XML Store supports the DOM Load and Save specification, which provides standard ways for parsing and serializing DOMs.
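For readers who want to try DOM Load and Save without an XML Store installation, here is a minimal sketch that uses only the implementation bundled with the JDK. It is not the XML Store API: the class name LsParseSketch and the parse() helper are made up for illustration, and the parser is obtained from the JDK's DOMImplementationRegistry instead of from a library. The parsed document lives only in memory.

```java
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

// Hypothetical helper class; uses the JDK's DOM Load and Save support only.
final class LsParseSketch {

    /** Parses an XML string with an LSParser and returns the DOM Document. */
    static Document parse(String xml) {
        try {
            // Ask the JDK registry for an implementation that offers the "LS" feature.
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS ls =
                    (DOMImplementationLS) registry.getDOMImplementation("LS");

            // Synchronous parser, no schema type, default configuration settings.
            LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

            LSInput input = ls.createLSInput();
            input.setStringData(xml);
            return parser.parse(input);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM LS implementation available", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<events><event occurrence=\"year\"/></events>");
        System.out.println(doc.getDocumentElement().getTagName()); // prints "events"
    }
}
```

The same LSParser interface is what the XML Store snippet above works with; only the way the parser is obtained differs.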

Store XML Documents

To store XML documents in an XML Store database, use the appendChild() method. Before you can use the appendChild() method, you need to get a handle to the library where the document should be stored. Every database has a root library by default. To store a document in the root library of the sample database, you could use the following code:

XhiveLibraryIf rootLibrary = united_nations_db.getRoot();
rootLibrary.appendChild(firstDocument);

Alternatively, you can store a document using the insertBefore() method, which is also a standard DOM method. The second parameter to specify with insertBefore() is the document in front of which you want to insert the new document:

rootLibrary.insertBefore(secondDocument, firstDocument);
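Because appendChild() and insertBefore() are standard DOM methods, their ordering behavior can be checked with plain JDK classes. The sketch below is an illustration under that assumption: an ordinary element named library stands in for an XML Store library, and the class name InsertOrderSketch is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demo class; "library" is a plain element, not an XhiveLibraryIf.
final class InsertOrderSketch {

    /** Returns the child names of the parent after appendChild and insertBefore. */
    static String childOrder() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element library = doc.createElement("library");
        doc.appendChild(library);

        Element first = doc.createElement("firstDocument");
        Element second = doc.createElement("secondDocument");

        library.appendChild(first);          // becomes the last child
        library.insertBefore(second, first); // placed in front of "first"

        StringBuilder names = new StringBuilder();
        for (Node n = library.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (names.length() > 0) {
                names.append(',');
            }
            names.append(n.getNodeName());
        }
        return names.toString();
    }

    public static void main(String[] args) {
        System.out.println(childOrder()); // prints "secondDocument,firstDocument"
    }
}
```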

Import Non-XML Data

XML Store can import data from non-XML files, provided you supply information on how the data fields are separated and arranged in the source file. The com.xhive.util.interfaces.XhiveSqlLoaderIf interface contains the methods used for importing non-XML data. For a detailed specification of these methods, refer to the XhiveSqlLoaderIf Javadoc.

In this example, data is imported in CSV format into XML Store, and stored as an XML document. The data to import looks like this:

"Member", "Date of Admission", "Additional Notes"
"Iceland", 19 Nov. 1946, ""
"India", 30 Oct. 1945, ""
"Indonesia", 28 Sep. 1950, "By letter of 20 January ..."
"Iran (Islamic Republic of)", 24 Oct. 1945, ""

You can import the data with an XhiveSqlLoaderIf object, using the loadSqlData() method:

Document un_members_doc = loader.loadSqlData(Impl,
    FileName,
    ',',
    '\\',
    '"',
    true,
    "UN_members",
    XhiveSqlLoaderIf.IGNORE_HEADER,
    "member",
    new String[] {"name","admission_date","additional_note"},
    new Boolean[] {false, false, false});
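To make the idea of this kind of import concrete, the following sketch converts delimited lines into a DOM document using only JDK classes. It is not how XhiveSqlLoaderIf is implemented. It assumes that the three character arguments in the call above are the field separator, the escape character, and the quote character, which the call suggests but this article does not spell out. The class CsvToDomSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical illustration of turning delimited text into XML elements.
final class CsvToDomSketch {

    /** Splits one line into trimmed fields, honouring quote and escape characters. */
    static List<String> splitLine(String line, char separator, char escape, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == escape && i + 1 < line.length()) {
                field.append(line.charAt(++i)); // take the next character literally
            } else if (c == quote) {
                inQuotes = !inQuotes;           // quotes delimit a field, they are not data
            } else if (c == separator && !inQuotes) {
                fields.add(field.toString().trim());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString().trim());
        return fields;
    }

    /** Builds one row element per line, with one child element per column. */
    static Document toDocument(List<String> lines, boolean skipHeader,
                               String rootName, String rowName, String[] columns) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement(rootName);
        doc.appendChild(root);
        for (int i = skipHeader ? 1 : 0; i < lines.size(); i++) {
            List<String> values = splitLine(lines.get(i), ',', '\\', '"');
            Element row = doc.createElement(rowName);
            root.appendChild(row);
            for (int c = 0; c < columns.length && c < values.size(); c++) {
                Element cell = doc.createElement(columns[c]);
                cell.setTextContent(values.get(c));
                row.appendChild(cell);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "\"Member\", \"Date of Admission\", \"Additional Notes\"",
                "\"Iceland\", 19 Nov. 1946, \"\"",
                "\"India\", 30 Oct. 1945, \"\"");
        Document doc = toDocument(lines, true, "UN_members", "member",
                new String[] {"name", "admission_date", "additional_note"});
        // Two data rows follow the skipped header line.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength()); // prints 2
    }
}
```

Trimming every field is a simplification: it also removes leading and trailing spaces inside quotes, which a production loader would probably preserve.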

Create Documents

XML data is stored in XML Store databases as documents. A document is represented in the XML Store API by the org.w3c.dom.Document interface. This interface contains a number of methods for creating a new XML document, updating (parts of) XML documents, and accessing parts (elements, comments, attributes, and so on) of the document.

To create a new document:

Obtain a handle to a DOM implementation (through rootLibrary, because XhiveLibraryIf extends DOMImplementation):
```
DOMImplementation impl = rootLibrary;
```
Create a DocumentType and a Document using the createDocumentType() and createDocument() methods in org.w3c.dom.DOMImplementation:
```
DocumentType docType =
    impl.createDocumentType("typeName", "publicId", "systemId");

Document eventsDocument = impl.createDocument(null, "events", docType);
```
Because no namespaceURI is used, the first parameter can be null. The second parameter of createDocument(), events, is the (tag) name of the root element. The third parameter sets the docType of the new document.
Obtain a handle to the root element of the newly created document:

Element rootElement = eventsDocument.getDocumentElement();

You can now add document parts using standard DOM methods. These methods are located in the org.w3c.dom.Document interface. The most commonly used methods are:
- createAttribute()
- createComment()
- createElement()
- createTextNode()

The following code adds a comment, an element named event with attribute occurrence, and a (text) value "UNICEF, Executive Board, annual session" to the new document:

// add a comment to the document before the root element
Comment comment = eventsDocument.createComment("this document contains UN events");
eventsDocument.insertBefore(comment, rootElement);

// add a new element to root element
Element eventElement = eventsDocument.createElement("event");
rootElement.appendChild( eventElement );

// add text value to the element
Text eventText = 
    eventsDocument.createTextNode("UNICEF, Executive Board, annual session");
eventElement.appendChild(eventText);

// add an attribute to the element
eventElement.setAttribute("occurrence", "year");

To add an element date with value "4-8 June, 2001" to element event, use the following code:

// add a new element to event
Element dateElement = eventsDocument.createElement("date");
eventElement.appendChild( dateElement );

// add text value to the date element
Text dateText = eventsDocument.createTextNode("4-8 June, 2001");
dateElement.appendChild(dateText);

Here is the resulting XML document:

<!DOCTYPE typeName PUBLIC "publicId" "systemId">
<!--this document contains UN events-->
<events>
    <event occurrence="year">
        UNICEF, Executive Board, annual session
        <date>4-8 June, 2001</date>
    </event>
</events>

As usual, to actually store the document in the database you need to use the appendChild() or insertBefore() method:

rootLibrary.appendChild(eventsDocument);
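If you want to experiment with these DOM calls outside XML Store, the sketch below builds the same events document with the JDK's own DOM implementation and prints it with a DOM Load and Save serializer. It is an approximation under stated assumptions: nothing is stored in a database, the DOCTYPE is omitted to keep the output short, and the class name BuildAndSerializeSketch is hypothetical.

```java
import org.w3c.dom.Comment;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical demo class; builds the document in memory with JDK classes only.
final class BuildAndSerializeSketch {

    /** Builds the events document and returns it serialized as a string. */
    static String buildEvents() {
        DOMImplementation impl;
        try {
            impl = DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM LS implementation available", e);
        }

        // No namespace URI and, unlike the article's example, no DOCTYPE.
        Document doc = impl.createDocument(null, "events", null);
        Element root = doc.getDocumentElement();

        // Comment placed before the root element.
        Comment comment = doc.createComment("this document contains UN events");
        doc.insertBefore(comment, root);

        Element event = doc.createElement("event");
        event.setAttribute("occurrence", "year");
        event.appendChild(doc.createTextNode("UNICEF, Executive Board, annual session"));
        root.appendChild(event);

        Element date = doc.createElement("date");
        date.appendChild(doc.createTextNode("4-8 June, 2001"));
        event.appendChild(date);

        // Serialize with DOM Load and Save, leaving out the XML declaration.
        LSSerializer serializer = ((DOMImplementationLS) impl).createLSSerializer();
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) {
        System.out.println(buildEvents());
    }
}
```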

Executing Queries

You can execute an XQuery query using the executeXQuery(String query) method on the XhiveNodeIf interface. It returns an iterator that represents the result sequence. Each element of the result is an instance of XhiveXQueryValueIf. In XML Store for Java 5 (JDK 1.5 and later), all iterators and results are also typed using the generics syntax (for example, executeXQuery(String) returns an Iterator<XhiveXQueryValueIf>).

XhiveNodeIf lc = ... ;
Iterator result = lc.executeXQuery("doc('doc')//item");
while (result.hasNext()) {
    XhiveXQueryValueIf value = (XhiveXQueryValueIf)result.next();
    // We know this query will only return nodes.
    Node node = value.asNode();
    // Do something with the node ...
}

Within the query, the context item (accessible via .) is initially bound to the node the query was executed on. Example (Java 1.5 syntax):

XhiveNodeIf node = ...;
Iterator<XhiveXQueryValueIf> result =
    node.executeXQuery("./author/first, ./author/last, ./contents");
// A plain Iterator cannot drive a for-each loop, so iterate explicitly.
while (result.hasNext()) {
    XhiveXQueryValueIf value = result.next();
    // do something with the value ...
}
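The JDK does not ship an XQuery engine, so the following sketch swaps in XPath 1.0 (javax.xml.xpath) to show the same idea of a context item: a path that starts with . is evaluated relative to the node you pass in. The sample XML, the class name ContextItemSketch, and the authorNames() helper are made up for illustration and are not part of XML Store.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; XPath 1.0 stands in for XQuery here.
final class ContextItemSketch {

    static final String SAMPLE =
            "<library>"
            + "<book><author><first>Ada</first><last>Lovelace</last></author></book>"
            + "<book><author><first>Alan</first><last>Turing</last></author></book>"
            + "</library>";

    /** Evaluates a relative path against the first book and joins the text values. */
    static String authorNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Choose the context node first ...
            Node book = (Node) xpath.evaluate("/library/book[1]", doc, XPathConstants.NODE);

            // ... then evaluate paths starting at "." relative to that node.
            NodeList parts = (NodeList) xpath.evaluate(
                    "./author/first | ./author/last", book, XPathConstants.NODESET);

            StringBuilder names = new StringBuilder();
            for (int i = 0; i < parts.getLength(); i++) {
                if (i > 0) {
                    names.append(' ');
                }
                names.append(parts.item(i).getTextContent());
            }
            return names.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the first book is the context, so the second author is not matched.
        System.out.println(authorNames(SAMPLE)); // prints "Ada Lovelace"
    }
}
```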

If you only want to display the result, you can use the toString() method on the values returned, regardless of their type:

XhiveLibraryChildIf lc = ... ;
String query = ... ;
Iterator result = lc.executeXQuery(query);
while (result.hasNext()) {
    System.out.println(result.next().toString());
}

If the query uses node constructors, any nodes created are created in a temporary document. If desired, these nodes can be inserted into another document using the DOM importNode() method.

If you want to insert the nodes into a particular document, you can specify an owner document for new nodes in the call. This is more efficient than creating a temporary document and importing its nodes into the destination document.

XhiveLibraryChildIf lc = ... ;
XhiveDocumentIf doc = ... ; // Create new nodes in this document
Iterator result = lc.executeXQuery("<count>{count(//item)}</count>", doc);
// We know this query will only return a single node.
XhiveXQueryValueIf value = (XhiveXQueryValueIf)result.next();
Node node = value.asNode();
// Append it to the document element of destination document
doc.getDocumentElement().appendChild(node);

The query result is evaluated lazily, that is, each time you call next() on the result iterator. Take care not to call result.next() after modifying the searched documents or libraries in the same session; doing so may produce undefined results. If you want to use the query output to modify the searched documents, use the xhive:force() function or the update syntax.
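A general Java way to stay clear of this pitfall is to read the whole iterator into a list before changing anything, and then work from the list. The sketch below shows that pattern with an ordinary in-memory list standing in for a query result. It is not a statement about how xhive:force() works internally, and the class name DrainBeforeModifySketch and the drain() helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: materialize lazily produced results before modifying the source.
final class DrainBeforeModifySketch {

    /** Consumes the iterator completely and returns its values as a list. */
    static <T> List<T> drain(Iterator<? extends T> results) {
        List<T> all = new ArrayList<>();
        while (results.hasNext()) {
            all.add(results.next());
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>(Arrays.asList("a", "b", "c"));

        // Step 1: read every result while the source is still unchanged.
        List<String> snapshot = drain(items.iterator());

        // Step 2: modify the source; the iterator is no longer in use.
        for (String value : snapshot) {
            items.remove(value);
        }
        System.out.println(items.size()); // prints 0
    }
}
```

The trade-off is memory: the full result sequence is held at once, which matters for very large results.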

Full Text Searching

X-Hive extends XQuery with a full text search function, which can be used to search for terms within a text string. In general, you can think of "terms" as words. For example, the string "yadda yadda yadda" contains three terms, each with the value "yadda". In X-Hive, terms are the basic units for full text indexing and searching. This differs from, for instance, the contains function in XQuery, which treats the text as a single monolithic string. The full text search function has a number of advantages. It:

Treats the input string as a list of terms rather than a list of characters, as the contains function does. This makes the use of indexes more practical: the contains function currently uses no indexes, whereas the full text search can.
Allows usage of wildcards and prefixes.
Allows you to search for exact or sloppy phrases.
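The difference between term matching and substring matching is easy to see in a few lines of plain Java. The sketch below is only an illustration: the tokenizer, which lowercases the text and splits on anything that is not a letter or digit, is an assumption and not the analyzer X-Hive uses, and the class name TermMatchSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of term-based versus character-based matching.
final class TermMatchSketch {

    /** Splits text into lowercase terms on any run of non-letter, non-digit characters. */
    static List<String> terms(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    /** True if the text contains the given word as a whole term. */
    static boolean hasTerm(String text, String term) {
        return terms(text).contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(terms("yadda yadda yadda").size());  // prints 3
        // Substring matching finds "cat" inside "cathedral" ...
        System.out.println("The cathedral".contains("cat"));    // prints true
        // ... while term matching does not, because "cat" is not a whole term.
        System.out.println(hasTerm("The cathedral", "cat"));    // prints false
    }
}
```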

The full text search function is declared as follows:

xhive:fts(node(s), querystring, options)

The first argument of the function should be a (set of) node(s). The second argument is expected to be a query string. If a set of nodes is provided as the first argument, the full text search is executed on all the nodes. The options argument is optional and (if present) should be a string literal containing a semicolon-separated list of options.

The function returns a Boolean value: true if the text matches the query, false otherwise.

One available option is include-attrs. With this option, when you execute the function on elements, the attribute values of those elements (and their descendants) are evaluated along with the other text nodes. Note that if you want to use the include-attrs option in combination with full text indexes, you must set the FTI_INCLUDE_ATTRIBUTES option on that index. Conversely, if you do not use include-attrs, that option must not be set on the index.

Full Text Search Query Syntax

The syntax of full text search queries is as follows:

Query ::= Clause ( [ Conjunction ] Clause ) *

Conjunction ::= 'AND' | 'OR' | '||' | '&&'

Clause ::= [ Modifier ] BasicClause [ Boost ]

Modifier ::= '-' | '+' | '!' | 'NOT'

BasicClause ::= ( TermQuery | Phrase | '(' Query ')' )

TermQuery ::= ( Term | WildCardTerm | PrefixQuery ) [ Fuzzy ] 

PrefixQuery ::= Term '*'

Phrase ::= '"' Term * '"' [ SlopFactor ]

Fuzzy ::= '~' 

SlopFactor ::= '~' DecimalDigit+

Boost ::= '^' DecimalDigit+ '.' DecimalDigit+

Term ::= <a-word-or-token-to-match>

WildCardTerm ::= <a-word-or-token-to-match-with-wildcards>

The following characters are reserved and need to be escaped with a backslash (\) if used without any special meaning: +, -, !, (, ), :, ^, [, ], ", {, }, ~, *, ?
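When a search term comes from user input, you will probably want to escape these characters before building the query string. Here is a minimal sketch of such a helper. It escapes exactly the characters listed above and nothing else (whether the backslash itself also needs escaping is not stated here, so it is left alone). The class FtsEscapeSketch and its escape() method are hypothetical and not part of the XML Store API.

```java
// Hypothetical helper for escaping reserved full text search characters.
final class FtsEscapeSketch {

    // The reserved characters listed in the text: + - ! ( ) : ^ [ ] " { } ~ * ?
    private static final String RESERVED = "+-!():^[]\"{}~*?";

    /** Prefixes every reserved character in the term with a backslash. */
    static String escape(String term) {
        StringBuilder out = new StringBuilder(term.length() + 8);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (RESERVED.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("C++ (draft)?")); // prints C\+\+ \(draft\)\?
    }
}
```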

Conclusion

You can use XML Store in various architectures such as stand-alone, client-server, and SOA. XML Store runs on any Java-supported platform (JDK1.4 or higher) including Sun Solaris, HP UX, Linux, Windows, and Macintosh. To ensure smooth and rapid integration with existing applications and systems, XML Store provides:

An interface to relational databases
Bridges to XML editors and full-text search engines
Support for J2EE and WebDAV

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
