The Need for Objects from XML
Obtaining data marked up in XML creates the need for Application Layer tools
that work with XML data easily and efficiently. The brute force approach is to
parse the XML into a DOM tree and traverse the tree to gather the data required
by application/object variables. This approach produces volumes of simple source
code whose only job is to move data from XML into structured objects, which is
a poor choice with respect to both implementation time and long-term
maintenance.
Alternatively, the OO approach generalizes this process into reusable
functionality that enables objects to serialize to and from XML directly. Many
implementations along this line of thought have been published (ASP, Visual
Basic, More VB, Java, More Java, IBMJava, Perl, Python, Delphi, PHP, and about
30 more). It's clear that software developers in every language have a similar
need. They must either:
- Write their own Object-XML tools,
- Find a production-quality framework that is ready to use, or
- Use brute force and hire a team of maintenance programmers.
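As a rough illustration of what "objects that serialize to and from XML directly" can look like, here is a minimal C++ sketch. The Employee class and the elementText helper are hypothetical names invented for this illustration; they are not the XMLFoundation API, and the tag lookup is a naive string search rather than a real XML parser.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of any framework): returns the text between
// <tag> and </tag>, or an empty string if the element is missing.
// Deliberately naive: no attributes, entities, or nested same-named tags.
static std::string elementText(const std::string& xml, const std::string& tag) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    const std::string::size_type start = xml.find(open);
    if (start == std::string::npos) return "";
    const std::string::size_type from = start + open.size();
    const std::string::size_type end = xml.find(close, from);
    if (end == std::string::npos) return "";
    return xml.substr(from, end - from);
}

// Hypothetical application object that owns its XML mapping, so callers
// never walk a DOM tree by hand.
struct Employee {
    std::string name;
    int employeeID = 0;

    // Object state -> XML.
    std::string toXML() const {
        return "<Name>" + name + "</Name>" +
               "<EmployeeID>" + std::to_string(employeeID) + "</EmployeeID>";
    }

    // XML -> object state. Returns false if the EmployeeID element is absent.
    // std::stoi throws if the element text is not a number.
    bool fromXML(const std::string& xml) {
        const std::string id = elementText(xml, "EmployeeID");
        if (id.empty()) return false;
        name = elementText(xml, "Name");
        employeeID = std::stoi(id);
        return true;
    }
};
```

The point of the design is that the mapping between member variables and element names lives in one place, next to the object, instead of being repeated in every piece of code that reads or writes the data.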
If your application does not use XML
This foundation may still be an excellent tool for ensuring performance and
portability and for simplifying development. The foundation
includes a wide variety of general utility classes (list, stack, hash, tree,
sort, string, file-structure, exception, compression, ciphering, uuencode
+ more) that are all portable, comment-documented, optimized, and easy to
use. The string class for example is designed to be easily used with XML
strings but works in any application. FiveLoaves
is an example of an application that makes extensive use of the XMLFoundation,
but version 1.0 does nothing with XML. If (most likely when) FiveLoaves
ever needs any XML support it will be a simple enhancement, and a natural step
forward - not a retrofit.
Something Old and Something New: Old Problem - New Data Format
Objects from data is not a new concept. Programmers have been doing that for years, even before they were called
objects. For a while the industry
experimented with Object Databases because they made the job of building the
application much simpler. Object
Databases were a good idea, but not practical on a large collection of
information - because performance is usability. The database stayed fast and relational while Object Databases
went the way of other cool technologies that were never embraced.
Programmers added
volumes of simple logic to move data from column-row output into usable memory structures (objects).
Basically, programmers were hand coding what
the Object database could not do fast enough. A variety of tools and development frameworks became available to
developers of all languages. These
tools eased the mundane task of moving row/column data into structures (objects)
that are necessary in the application layer.
Native Component Object Support
CORBA
and COM
objects have the same problem as all other objects in all other development
languages. This framework has
support for both types of distributed object architectures. A native XML
accessor added to every object opens the door to a whole new dimension of
development technique. The object technology you choose becomes a secondary
issue with respect to interoperability, since you have an alternate interface of
direct XML no matter which technology you choose.
XML's beautifully readable verbosity comes with a price.
Compare these protocols
displaying the same information marked up in different ways:
123456789112345678921234567893123456789412345678  Measure
.........0.........0.........0.........0........
<Name>Brian</Name><EmployeeID>11260</EmployeeID>  XML
Brian,11260                                       Comma Separated
##Brian##                                         Packed Binary
123456789112345678921234567893123456789412345678
.........0.........0.........0.........0........
In the XML line, everything other than the two data values (Brian and 11260) is
markup added around the data.
The data measures in at 10 bytes as text but can be represented in as few as
7 bytes (the 5-byte name plus a 2-byte packed employeeID).
Marked up type | Message Size
XML | 48 bytes
Positional Comma Separated | 11 bytes
Packed Binary | 9 bytes (## is the name length, ## is a packed employeeID)
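The sizes in the table above can be checked with a few lines of C++. This is only a sketch of the arithmetic: the function names are made up for the illustration, and the packed layout is assumed to be a 2-byte name length, the name bytes, and a 2-byte binary employeeID, which is one reading of the "##Brian##" row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bytes needed for the XML form: <Name>..</Name><EmployeeID>..</EmployeeID>
std::size_t xmlSize(const std::string& name, std::uint16_t id) {
    const std::string msg = "<Name>" + name + "</Name><EmployeeID>" +
                            std::to_string(static_cast<unsigned>(id)) +
                            "</EmployeeID>";
    return msg.size();
}

// Bytes needed for the positional comma separated form: name,id
std::size_t csvSize(const std::string& name, std::uint16_t id) {
    const std::string msg =
        name + "," + std::to_string(static_cast<unsigned>(id));
    return msg.size();
}

// Bytes needed for the assumed packed binary form:
// 2-byte name length + name bytes + 2-byte binary ID.
// The ID value itself does not affect the size, so it is left unnamed.
std::size_t packedSize(const std::string& name, std::uint16_t) {
    return sizeof(std::uint16_t) + name.size() + sizeof(std::uint16_t);
}
```

For the name "Brian" and the ID 11260 these functions return 48, 11 and 9, matching the table. The XML form is more than five times the size of the packed form for the same two values.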
Generally it takes quite a bit more space to use XML than
other object transport protocols. That means:
- The data took longer to assemble.
- It took longer to get to you because more data was sent.
- It's going to take longer to process more information.
Time is relative, but central
to technology. Time in the lower layers gets magnified
exponentially in the upper layers. What we see as upper layers today will
be buried by future advancements until the end of time. Someone once
looked at the upper layers in 1984 and said
"640k ought to be enough for anybody", the foundation was laid, and
the building began. By 1992 the foundation had been rebuilt. It was "New
Technology" then. So the more appropriate question is "Where do you want to be
tomorrow?". Mozart fans who watch technology commercials might
understand. What we call broadband today, we'll call a bottleneck tomorrow.
It's just a matter of time.
Implementation Response:
Work everywhere.
Since "Objects To/From XML" is such a general
task, the implementation uses are nearly countless. The framework is
designed to work in as many environments as possible.
Application General: XML used for object data transport is well
formed. This works well for real-time information servers and Client/Server
(aka B2C) implementations, as well as B2B. In situations like B2B, where you do
want to validate the XML against a DTD, validation is done as a preliminary
step, before the inbound XML document ever makes it into the application layer
where objects are created, rather than during object factorization. Many good
validating XML parsers are available for free. UBT uses the Xerces
parser for these "high-level" tasks. (History
on Xerces)
Performance: The lowest common denominator in
application technology requirements is to meet the performance requirements of
the most demanding implementation. That ideal affects numerous design
decisions at every level of the framework from algorithm selections to memory
management. Care for ideal memory usage becomes a greater issue with
XML. The end result of a single 'extra' or 'temp' copy of the data will be
a server that has twice the memory requirements due to software design.
Size: Smaller is always better, but in embedded systems
small is often a forefront requirement. The XMLFoundation is
"tight" code. Consider FiveLoaves: that application uses the
foundation extensively, putting nearly every utility and algorithm to use, and it
compiles into a 350 KB
executable (one-fourth the capacity of a floppy disk, uncompressed). Many internet
applications install or update through internet connections; for them, extra size
is extra wait time for the user or customer. In the Internet Age, the
company with the best service is often the company with the best technology.
International: UBT will release a Unicode implementation based on IBM's
ICU, the same (free) library the Java VM links
with to provide Unicode support. It will remain a separate library build so
that non-Asian/Arabic implementations are not forced to incur the overhead of
translation and double memory consumption.
Portability
There are many considerations to true portability. The
purpose of this framework is to address all of them without compromising
performance.
Java applications are easier to port across hardware platforms because only a
single build need be managed. Within the scope of Java,
portability also includes support of existing development technologies like
CORBA, JavaScript and JSP. Not all Application
Server technologies are created equal. EJBs
may not load native libraries, but their containers may if they are implemented
to do so. In some implementations Beans may indirectly 'use' objects that
have native implementations.
C++ has been maturing for about 15 years and is now supported almost
everywhere. It's certainly everywhere Java is, since the JVM is
written in C/C++. Some C++ compilers still have weak support for late-coming
additions to C++ such as templates. For this reason, the C++ framework is
strictly split into two layers. The lower layer is void (it uses no templates).
Template classes are not used by the framework itself, only by the application,
so their inclusion is optional. If your compiler does not
support templates, don't use the template classes. The "ObjectsFromXML"
example included with the framework shows both ways.
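One common way to arrange such a two-layer split is sketched below. It assumes that "void" means a lower layer built on untyped void pointers, which the text does not spell out, and the class names VoidStack and TypedStack are hypothetical, not classes from the framework.

```cpp
#include <cassert>
#include <cstddef>

// Lower layer: no templates, stores untyped pointers only.
// It compiles on any C++ compiler, including ones with weak template support.
class VoidStack {
public:
    VoidStack() : count_(0) {}

    // Returns false when the fixed-size stack is full.
    bool push(void* item) {
        if (count_ == kCapacity) return false;
        items_[count_++] = item;
        return true;
    }

    // Returns a null pointer when the stack is empty.
    void* pop() { return count_ > 0 ? items_[--count_] : nullptr; }

    std::size_t size() const { return count_; }

private:
    static const std::size_t kCapacity = 16;
    void* items_[kCapacity];
    std::size_t count_;
};

// Optional upper layer: a thin typed wrapper used only by the application.
// It adds type safety but no storage logic of its own.
template <class T>
class TypedStack {
public:
    bool push(T* item) { return impl_.push(item); }
    T* pop() { return static_cast<T*>(impl_.pop()); }
    std::size_t size() const { return impl_.size(); }

private:
    VoidStack impl_;
};
```

An application on a compiler without template support would use VoidStack directly and cast the results itself; everyone else can use TypedStack<T> and let the compiler check the types.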
Porting C++ is more complex than porting Java. Since
C++ compiles to a direct machine binary, the source code must be passed through a
compiler on each target system that you want to support. So, for a C++
consumer considering portability:
- No STL is used.
- No namespaces are used.
- System includes have been minimized to only standard "K&R C" includes.
- No iostreams are used.
The framework has been used on a variety of platforms including
HPUX, AIX, SunOS, Windows 3.1/95/98/ME/NT/2K, Linux and even proprietary systems.
It's been verified with many compilers including CC5.0, Xlc, gcc (2.95
& 3.0), IntelC++5.0, KAIc++4.0f, ForteC++ (6.2 & 7), and Visual C++6.sp5.
Like Java consumers, C++ consumers must also consider support for
Application/Object technologies like CORBA and COM. The framework ships
with a CORBA example that has been ported to IONA/Orbix, BEA/WLE-ObjectBroker,
and Inprise/Borland Visibroker.
A Firm Foundation
This source code has been the basis of several large
customer enterprise scale jobs done by UBT. It is also the foundation for FiveLoaves,
DesignerXSL, DesignerXML,
TransactXML and
several of our own internal projects. It's been a full-time work in progress
since January 1998. Several of the basic utilities like the list are from
1992 - they have been tried and tested over the years on many software projects.
Where's the source of XML for the objects?
The purpose of this framework is to provide object
serialization for XML. The source of XML can be the state of another
object, a disk file, or a memory buffer. Many products like Oracle and SQL
Server are beginning to support XML as a way to describe the data that they
contain; those products will provide an acceptable source of "Dynamic
XML" to assign the state of your objects. You can also use an XML
server like UBT's TransactXML Server or any other server or application that can
produce a well-formed XML document. One of the example programs, "ObjectsFromXML",
shows how to use an alternating data source of either a disk file or a Dynamic
XML document created from UBT's TransactXML Server.
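The idea of interchangeable XML sources can be sketched in C++ as two small loaders that fill the same string, which is then handed to whatever turns XML into objects. The function names xmlFromBuffer and xmlFromFile are hypothetical and are not taken from the framework or from the "ObjectsFromXML" example. C stdio is used here only to stay close to the framework's stated avoidance of iostreams.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical loader: copies `length` bytes from a memory buffer into `out`.
// Returns false if the buffer pointer is null.
bool xmlFromBuffer(const char* data, std::size_t length, std::string& out) {
    if (data == nullptr) return false;
    out.assign(data, length);
    return true;
}

// Hypothetical loader: reads a whole disk file into `out`.
// Returns false if the file cannot be opened or a read error occurs.
// `out` is left untouched when the file cannot be opened.
bool xmlFromFile(const char* path, std::string& out) {
    std::FILE* fp = std::fopen(path, "rb");
    if (fp == nullptr) return false;
    out.clear();
    char chunk[4096];
    std::size_t n = 0;
    while ((n = std::fread(chunk, 1, sizeof chunk, fp)) > 0) {
        out.append(chunk, n);
    }
    const bool ok = std::ferror(fp) == 0;
    std::fclose(fp);
    return ok;
}
```

Because both loaders produce the same kind of result, the code that builds objects from the XML does not need to know whether the document came from disk, from memory, or from a server response that was first read into a buffer.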