XMLFoundation

Brian Aberle

2 Jul 2002

Obtaining data marked up in XML creates the need for Application Layer tools to easily and efficiently work with XML data.

Download source files - 1.09 Mb

The Need for Objects from XML

Obtaining data marked up in XML creates the need for Application Layer tools to easily and efficiently work with XML data. The brute force approach is to parse the XML into a DOM tree and traverse the tree to gather the data required by application/object variables. This approach requires volumes of “simple” source code just to move data from XML into Structured Objects, which makes it a poor approach with respect to both implementation time and long term maintenance.

Alternatively, the OO approach generalizes this process into reusable functionality that enables objects to serialize to and from XML directly. Many implementations along this line of thought have been published (ASP, Visual Basic, More VB, Java, More Java, IBMJava, Perl, Python, Delphi, PHP, and about 30 more). It's clear that software developers in every language have a similar need. They must either:

Write their own Object-XML tools,
Find some production quality framework ready to use or
Use brute force and hire a team of maintenance programmers.
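
As a rough illustration of the OO approach described above, the sketch below lets an object declare once which XML tag feeds which member, so that a single generic routine does the copying. Everything in it is hypothetical and written for this illustration: `elementText`, `XmlMapped`, `mapMember` and `Employee` are not the XMLFoundation API, the toy extractor only handles flat, attribute-free elements, and it uses `std::string` and `std::map` for brevity.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Toy extractor (hypothetical): returns the text between <tag> and </tag>,
// or "" if the element is absent. It is NOT a real XML parser.
std::string elementText(const std::string& xml, const std::string& tag) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    const std::string::size_type start = xml.find(open);
    if (start == std::string::npos) return "";
    const std::string::size_type from = start + open.size();
    const std::string::size_type end = xml.find(close, from);
    if (end == std::string::npos) return "";
    return xml.substr(from, end - from);
}

// Hypothetical base class: a derived object registers its tag-to-member
// mapping once, and fromXml() fills every registered member.
class XmlMapped {
public:
    XmlMapped() = default;
    XmlMapped(const XmlMapped&) = delete;             // the maps hold pointers
    XmlMapped& operator=(const XmlMapped&) = delete;  // into this object
    virtual ~XmlMapped() {}

    void fromXml(const std::string& xml) {
        for (auto& entry : strings_)
            *entry.second = elementText(xml, entry.first);
        for (auto& entry : longs_)
            *entry.second = std::strtol(elementText(xml, entry.first).c_str(), nullptr, 10);
    }

protected:
    void mapMember(const std::string& tag, std::string* member) {
        assert(member != nullptr);
        strings_[tag] = member;
    }
    void mapMember(const std::string& tag, long* member) {
        assert(member != nullptr);
        longs_[tag] = member;
    }

private:
    std::map<std::string, std::string*> strings_;
    std::map<std::string, long*> longs_;
};

// The application object contains no parsing or tree-walking code of its own.
struct Employee : XmlMapped {
    std::string name;
    long id = 0;
    Employee() {
        mapMember("Name", &name);
        mapMember("EmployeeID", &id);
    }
};
```

With this shape, supporting a new field means adding one `mapMember` line rather than another block of tree-traversal code, which is the maintenance argument the article is making. Serializing back to XML could reuse the same mapping but is left out of the sketch.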

If your application does not use XML

This foundation may still be an excellent tool to ensure performance and portability and to simplify development. The foundation includes a wide variety of general utility classes (list, stack, hash, tree, sort, string, file-structure, exception, compression, ciphering, uuencode + more) that are all portable, comment-documented, optimized, and easy to use. The string class, for example, is designed to be easily used with XML strings but works in any application. FiveLoaves is an example of an application that makes extensive use of the XMLFoundation, yet version 1.0 does nothing with XML. If (most likely when) FiveLoaves ever needs XML support, it will be a simple enhancement and a natural step forward - not a retrofit.

Something Old and Something New: Old Problem - New Data Format

“Objects from data” is not a new concept. Programmers have been doing that for years, even before they were called objects. For a while the industry experimented with Object Databases because they made the job of building the application much simpler. Object Databases were a good idea, but not practical on a large collection of information - because performance is usability. The database stayed fast and relational while Object Databases went the way of other ‘cool’ technologies that were never embraced. Programmers added volumes of simple logic to move data from column-row output into usable memory structures (objects). Basically, programmers were hand coding what the Object database could not do fast enough. A variety of tools and development frameworks became available to developers of all languages. These tools eased the mundane task of moving row/column data into structures (objects) that are necessary in the application layer.

Native Component Object Support

CORBA and COM objects have the same problem as all other objects in all other development languages. This framework has support for both types of distributed object architectures. A native XML accessor added to every object opens doors to a whole new dimension of development technique. The Object technology you choose becomes a secondary issue with respect to inter-operability since you have an alternate interface of direct XML no matter which technology you choose.

The Need for Speed

XML's beautifully readable verbosity comes with a price. Compare these protocols displaying the same information marked up in different ways:

123456789112345678921234567893123456789412345678  Measure
.........0.........0.........0.........0........ 
<Name>Brian</Name><EmployeeID>11260</EmployeeID>  XML
Brian,11260                                       Comma Separated
##Brian##                                         Packed Binary 
123456789112345678921234567893123456789412345678 
.........0.........0.........0.........0........

In the XML line, everything other than the two data values (Brian and 11260) is markup added around them. The data itself measures in at 10 bytes but can be represented in as few as 7 bytes (the 5-byte name plus a 2-byte packed employee ID).

Markup type	Message size
XML	48 bytes
Positional Comma Separated	11 bytes
Packed Binary	9 bytes (the first ## is the name length, the second ## is a packed employeeID)
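
The sizes in the table can be checked with a small sketch that builds the same record in each of the three forms. The function names are made up for this illustration, and the packed layout is an assumption: a 2-byte name length, the name bytes, then a 2-byte employee ID, all big-endian. The article only states that each ## pair is a length or a packed ID, so the byte order in particular is a guess.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// XML form: each value is wrapped in an opening and a closing tag.
std::string asXml(const std::string& name, std::uint16_t id) {
    return "<Name>" + name + "</Name><EmployeeID>" + std::to_string(id) + "</EmployeeID>";
}

// Positional comma-separated form: field order carries the meaning.
std::string asCsv(const std::string& name, std::uint16_t id) {
    return name + "," + std::to_string(id);
}

// Packed binary form (assumed layout): 2-byte length, name, 2-byte ID.
std::string asPacked(const std::string& name, std::uint16_t id) {
    assert(name.size() <= 0xFFFF);  // the length must fit in two bytes
    const std::uint16_t len = static_cast<std::uint16_t>(name.size());
    std::string out;
    out.push_back(static_cast<char>(len >> 8));
    out.push_back(static_cast<char>(len & 0xFF));
    out += name;
    out.push_back(static_cast<char>(id >> 8));
    out.push_back(static_cast<char>(id & 0xFF));
    return out;
}
```

For the name Brian and ID 11260, calling `size()` on the three results gives 48, 11 and 9 bytes, matching the table.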

Generally it takes quite a bit more space to use XML over other object transport protocols. That means:

The data took longer to assemble.
It took longer to get to you because more data was sent.
It's going to take longer to process more information.

Time is relative, but central to technology. Time spent in the lower layers gets magnified exponentially in the upper layers. What we see as upper layers today will be buried by future advancements until the end of time. Someone once looked at the upper layers in 1984 and said "640k ought to be enough for anybody"; the foundation was laid, and the building began. By 1992 the foundation had been rebuilt. It was "New Technology" then. So the more appropriate question is "Where do you want to be tomorrow?". Mozart fans who watch technology commercials might understand. What we call broadband today, we'll call a bottleneck tomorrow. It's just a matter of time.

Implementation Response: Work everywhere.

Since "Objects To/From XML" is such a general task, the implementation uses are nearly countless. The framework is designed to work in as many environments as possible.

Application General: XML used for object data transport is well formed. This works well for real-time information servers and Client/Server (aka B2C) implementations, as well as B2B. In situations like B2B, where you do want to validate the XML against a DTD, validation is done as a preliminary step: it happens before the inbound XML document ever makes it into the application layer where objects are created, rather than during object factorization. Many good validating XML parsers are available for free. UBT uses the Xerces parser for these "high-level" tasks.

Performance: The framework must meet the performance requirements of its most demanding implementation; that is the lowest common denominator for every other use. That ideal affects numerous design decisions at every level of the framework, from algorithm selection to memory management. Careful memory usage becomes a greater issue with XML. A single 'extra' or 'temp' copy of the data results in a server that has twice the memory requirements, purely because of software design.

Size: Smaller is always better, but in embedded systems small is often a forefront requirement. The XMLFoundation is "tight" code. Consider FiveLoaves: the application uses the foundation extensively, putting nearly every utility and algorithm to use, and it compiles into a 350 KB executable (one-fourth the capacity of a floppy disk - uncompressed). Many internet applications install or update through internet connections; for them, extra size is extra wait time for the user or customer. In the Internet Age, the company with the best service is often the company with the best technology.

International: UBT will release a Unicode implementation based on IBM's ICU, the same (free) library the Java VM links with to provide Unicode support. It will remain a separate library build so that the non-Asian/Arabic implementations are not forced to incur the overhead of translation and double memory consumption.

Portability

There are many considerations to true portability. The purpose of this framework is to address all of them without compromising performance.

Java applications are easier to port across hardware platforms because only a single build need be managed. Within the scope of Java, portability also includes support of existing development technologies like CORBA, JavaScript and JSP. Not all Application Server technologies are created equal. EJBs may not load native libraries, but their containers may if they are implemented to do so. In some implementations, Beans may indirectly 'use' objects that have native implementations.

C++ has been maturing for about 15 years and is now supported almost everywhere. It is certainly everywhere Java is, since the JVM is written in C/C++. Some C++ compilers still have weak support for late-coming additions to C++ such as templates. For this reason, the C++ framework is strictly split into two layers. The lower layer is void. Template classes are not used by the framework itself, only by the application, so their inclusion is optional. If your compiler does not support templates, don't use the template classes. The "ObjectsFromXML" example included with the framework shows both ways.
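
One way to picture the two-layer split is sketched below. It assumes that "the lower layer is void" means the core containers store untyped `void*` items, with an optional thin template wrapper adding type safety on top. The class names `VoidList` and `TypedList` are invented for this illustration and are not the framework's own classes. In keeping with the portability rules, the lower layer uses no templates, STL, namespaces of its own, or iostreams.

```cpp
#include <cassert>
#include <cstddef>

// Lower layer (hypothetical): an untyped singly linked list of void*.
// It does not own the items, only the nodes that point at them.
class VoidList {
public:
    VoidList() : head_(0), tail_(0), count_(0) {}
    ~VoidList() {
        while (head_) {
            Node* next = head_->next;
            delete head_;
            head_ = next;
        }
    }
    void addLast(void* item) {
        Node* n = new Node;
        n->item = item;
        n->next = 0;
        if (tail_) tail_->next = n; else head_ = n;
        tail_ = n;
        ++count_;
    }
    void* at(std::size_t index) const {
        assert(index < count_);  // caller must pass a valid index
        Node* n = head_;
        while (index--) n = n->next;
        return n->item;
    }
    std::size_t size() const { return count_; }

private:
    struct Node { void* item; Node* next; };
    VoidList(const VoidList&);             // non-copyable, declared but
    VoidList& operator=(const VoidList&);  // not defined (older-compiler idiom)
    Node* head_;
    Node* tail_;
    std::size_t count_;
};

// Upper layer (optional): a thin typed wrapper. Applications whose compiler
// lacks template support can skip it and use VoidList with their own casts.
template <class T>
class TypedList {
public:
    void addLast(T* item) { list_.addLast(item); }
    T* at(std::size_t index) const { return static_cast<T*>(list_.at(index)); }
    std::size_t size() const { return list_.size(); }

private:
    VoidList list_;
};
```

Because the wrapper only forwards calls and casts pointers, all of the real logic lives in the non-template layer and is compiled once regardless of how many element types the application uses.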

Porting C++ is more complex than porting Java. Since C++ compiles directly to a machine binary, the source code must be passed through a compiler on each target system that you want to support. So, for a C++ consumer considering portability: no STL is used, no namespaces are used, no iostreams are used, and system includes have been minimized to only standard "K&R C" includes. The framework has been used on a variety of platforms including HPUX, AIX, SunOS, Windows 3.1/95/98/ME/NT/2K, Linux and even proprietary systems. It has been verified with many compilers including CC5.0, Xlc, gcc (2.95 & 3.0), IntelC++5.0, KAIc++4.0f, ForteC++ (6.2 & 7), and Visual C++ 6 SP5.

Like Java consumers, C++ consumers must also consider support for Application/Object technologies like CORBA and COM. The framework ships with a CORBA example that has been ported to IONA/Orbix, BEA/WLE-ObjectBroker, and Inprise/Borland Visibroker.

A Firm Foundation

This source code has been the basis of several large, enterprise-scale customer projects done by UBT. It is also the foundation for FiveLoaves, DesignerXSL, DesignerXML, TransactXML and several of our own internal projects. It has been a full-time work in progress since January 1998. Several of the basic utilities, like the list, date from 1992 and have been tried and tested over the years on many software projects.

Where's the source of XML for the objects?

The purpose of this framework is to provide object serialization for XML. The source of XML can be the state of another object, a disk file, or a memory buffer. Many products like Oracle and SQL Server are beginning to support XML as a way to describe the data that they contain; those products will provide an acceptable source of "Dynamic XML" to assign the state of your objects. You can also use an XML server like UBT's TransactXML Server, or any other server or application that can produce a well formed XML document. One of the example programs, "ObjectsFromXML", shows how to use an alternating data source of either a disk file or a Dynamic XML document created by UBT's TransactXML Server.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
