Browsing XML/XSL/XSD with HTA/Scripting Runtime

Dmitry Khudorozhkov


7 Jun 2006


This article presents an XML/XSL/XSD browsing and validation tool, a good example of how technologies such as Shell scripting, the Scripting Runtime, and HTA can be combined to help programmers rapidly develop powerful script-based applications.

Download browser HTA - 18.2 Kb

What is it about

The initial idea of this article was to give you a simple example of using the Scripting Runtime Library: "click here and here, blah-blah-blah, thank you". I started writing because my colleagues and I needed our scripts to be file-system-aware. This ability proved very useful for software prototyping and for building small utilities; of course, it shouldn't be used on the web, for security and privacy reasons we'll discuss later.

Back to the tool. Before writing it, I had been deep in XML for several months, so what you see here is an XML/XSLT viewer/browser, packaged as an HTML application. It helped me a lot when I was learning XML/XSL, and now it helps <some other people> quickly check and debug large numbers of XSL templates; I hope it helps you too.

Of course, this little browser (I'll call it "Xbrowser" further on) is in no way a replacement for any enterprise-grade development tool. It is just:

an interactive learning tool that illustrates the basics of XML handling with JScript/MSXML - for beginners in XML development; and, maybe -
an example of using the Microsoft Scripting Runtime Object Library - for developers of Office tools and solutions; and, of course -
a simple utility for validating XML documents (for well-formedness or against a schema) and viewing XSLT output.

The last-minute additions to this article were XML/XSL transformation and XML/XSD validation tools, which use most of the techniques described here.

Requirements

For this piece of code to work properly, you'll need the following packages:

Common Dialog ActiveX Control - provides a standard set of dialog boxes for operations such as opening and saving files, setting print options, and selecting colors and fonts. It is shipped with MS Visual Basic and the MS Office 2000/XP products, or can be downloaded from the Microsoft website.
Scripting Runtime Object Library is a Microsoft file system management solution, designed for use with scripting languages; it is an integral part of Microsoft Office 2000/XP. This library is also available for download at the Microsoft website.
Microsoft XML Core Services and/or SDK (versions 3.0 or, preferably, 4.0). Can be downloaded from the Microsoft website.
For installing some of the previously mentioned packages, you'll probably need a CAB extraction utility. You can download it from the Microsoft site.

How stuff works

If you look inside the attached archive, you'll see that the "Xbrowser" is no more than an HTML form. Let's see how to use it, and how the code works behind the scenes, step by step.

Folder browsing

Step 1: choose a folder where your XML files are located.

This part uses the Shell object, specifically its BrowseForFolder method.

JavaScript

function BrowseFolder()
{
 // Accessing the Shell:
 var objShell = new ActiveXObject("Shell.Application");

 // Calling the browser dialog:
 var objFolder = objShell.BrowseForFolder(0, "Select a folder:", 0);

 if (objFolder == null) return "";

 // Accessing the folder through FolderItem object:
 var objFolderItem = objFolder.Items().Item();
 var objPath = objFolderItem.Path;

 var foldername = objPath;
 if (foldername.substr(foldername.length-1, 1) != "\\")
     foldername = foldername + "\\";

 // foldername is the actual folder name.

 ...
}
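The trailing-backslash check at the end of BrowseFolder() matters because the rest of the script builds full paths by plain string concatenation (for example, curXMLfolder + xmlselection.value). Here is the same check isolated as a pure function. It is a minimal sketch: the name withTrailingBackslash is hypothetical and not part of the HTA, and it sticks to ES3 features so it should run under JScript as well as in a modern engine.

```javascript
// Hypothetical helper (not in the original HTA): appends a backslash to a
// folder path unless it already ends with one, so that
// folder + fileName always yields a valid full path.
function withTrailingBackslash(path)
{
  if (path.length === 0) return path;   // nothing selected: leave empty
  var last = path.charAt(path.length - 1);
  return (last === "\\") ? path : path + "\\";
}
```

A drive root such as C:\ typically already ends with a backslash, which is why the code checks first instead of always appending one.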

File browsing and enumeration.

Step 2: choose a file.

Two interesting things here:

The Scripting.FileSystemObject is the main point of access to the file system. In short:

`FileSystemObject` contains: `Drives` property (a collection of all drives in the system); `GetDrive` method (access a particular drive); `GetFolder` method (access a particular folder); `GetFile` method (access a particular file).
`Drives` collection contains: `Item` property (used to access a drive); `Count` property (number of drives in the system).
`Folders` collection (returned by `Folder.SubFolders`) contains: `Item` property (used to access a folder); `Count` property (number of folders in the collection); `Add` method (create a new folder).
`Folder` object contains: `SubFolders` collection (subfolders of a folder, including those with hidden and system file attributes set); `Files` collection (all files in a folder).
`Files` collection (returned by `Folder.Files`) contains: `Item` property (used to access a file); `Count` property (number of files in the folder).
`File` object contains: `Name` property (file name); `Size` property (file size); `DateCreated` property (file creation date and time).

The FSO has lots of collections, methods, and properties; I've just pointed out the most commonly used ones.

The Enumerator object is a simple iterator, used to cycle through the collection of objects:

Enumerator object contains:

item method (returns a reference to the current object in a collection).
atEnd method (returns true if the iterator has reached the end of the collection).
moveFirst method (iterates to the first object in a collection).
moveNext method (iterates to the next object in a collection).

JavaScript

var fc = new Enumerator(colFiles);
for (; !fc.atEnd(); fc.moveNext())
{
    var objFile = fc.item();
    ...
}
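To experiment with the Enumerator loop pattern outside Windows Script Host, you can emulate its four methods over a plain array. The sketch below is a stand-in under that assumption (ArrayEnumerator and collectNames are hypothetical names), not the real JScript Enumerator, but it follows the same contract: atEnd() turns true only after moveNext() has stepped past the last item, and item() then returns undefined. That is also why a test such as if (fc.atEnd()) placed inside the loop body never fires.

```javascript
// Hypothetical stand-in for JScript's Enumerator: walks a plain array
// with the same atEnd/item/moveFirst/moveNext methods.
function ArrayEnumerator(items)
{
  this.items = items;
  this.index = 0;
}
ArrayEnumerator.prototype.atEnd = function () { return this.index >= this.items.length; };
ArrayEnumerator.prototype.item = function () { return this.atEnd() ? undefined : this.items[this.index]; };
ArrayEnumerator.prototype.moveFirst = function () { this.index = 0; };
ArrayEnumerator.prototype.moveNext = function () { if (!this.atEnd()) this.index++; };

// Same loop shape as in the article, collecting each object's Name:
function collectNames(files)
{
  var names = [];
  for (var fc = new ArrayEnumerator(files); !fc.atEnd(); fc.moveNext())
    names.push(fc.item().Name);
  return names;
}
```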

Actual code:

JavaScript

// Here goes the Scripting Runtime, FileSystemObject object:
var objFSO = new ActiveXObject("Scripting.FileSystemObject");
// Accessing the folder:
var objFolder = objFSO.GetFolder(curXMLfolder);
// Accessing the files:
var colFiles = objFolder.Files;

var xmlcount = 0, xslcount = 0;
xmlsel = "";   // reset before rebuilding the list

// Cycling through the files, one by one:
var fc = new Enumerator(colFiles);
for (; !fc.atEnd(); fc.moveNext())
{
  var objFile = fc.item();
  var ftext = objFile.Name.toLowerCase();

  // Checking the extension (including the dot, so "fooxml" doesn't match):
  if ((ftext.substr(ftext.length-4, 4)==".xml") || 
      (ftext.substr(ftext.length-4, 4)==".rdf"))
  {
    xmlcount = xmlcount + 1;
    // Opening the <SELECT> tag if any XML files exist:
    if (xmlcount == 1) 
      xmlsel="<SELECT id='xmlselection' onchange='refresh()'>";

    // Adding an option (value quoted, so names with spaces survive):
    xmlsel=xmlsel+"<OPTION value='"+ftext+"'>"+
           ftext+"</OPTION>";
  }
}

// Closing the tag after the loop (fc.atEnd() is never true inside the loop body):
if (xmlcount > 0) xmlsel=xmlsel+"</SELECT>";
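The same filtering and markup building can be expressed as a pure function over an array of file names, which is easier to test than the FSO loop. This is a sketch with hypothetical names (hasExtension, buildSelect, escapeHtml): it matches extensions including the dot, so a file called fooxml is not picked up, escapes and quotes the option values, and closes the <SELECT> tag after the loop.

```javascript
// Hypothetical helpers mirroring the enumeration loop above.

// True if the name ends with "." + one of the given extensions (case-insensitive).
function hasExtension(name, exts)
{
  var lower = name.toLowerCase();
  for (var i = 0; i < exts.length; i++)
  {
    var suffix = "." + exts[i];
    if (lower.length > suffix.length &&
        lower.substr(lower.length - suffix.length) === suffix)
      return true;
  }
  return false;
}

// Minimal escaping for text placed inside single-quoted attributes and elements.
function escapeHtml(s)
{
  return s.replace(/&/g, "&amp;").replace(/'/g, "&#39;")
          .replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Builds the <SELECT> markup, or "" when no file matches.
function buildSelect(names, exts, id)
{
  var options = "";
  for (var i = 0; i < names.length; i++)
  {
    if (!hasExtension(names[i], exts)) continue;
    var safe = escapeHtml(names[i]);
    options += "<OPTION value='" + safe + "'>" + safe + "</OPTION>";
  }
  if (options === "") return "";
  return "<SELECT id='" + id + "' onchange='refresh()'>" + options + "</SELECT>";
}
```

For example, buildSelect(["Stock.XML", "notes.txt", "b.rdf"], ["xml", "rdf"], "xmlselection") produces a list with two options.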

Loading XML from a file.

This is the MSXML's part:

JavaScript

// Creating the new empty DOM tree:
var xml = new ActiveXObject("MSXML2.DOMDOCUMENT");
// No asynchronous load:
xml.async = false;
// Loading the file from disk:
xml.load(curXMLfolder + xmlselection.value);

Doing the same for the stylesheet:

JavaScript

// Creating the new empty DOM tree:
var xsl = new ActiveXObject("MSXML2.DOMDOCUMENT");
// No asynchronous load:
xsl.async = false;
// Loading the file from disk:
xsl.load(curXSLfolder + xslselection.value);

Loading XML from a string.
Loading XML data from a string is a bit different from loading a file. There are no files and no paths; all you have to do is build a string containing your XML, then parse it with a single call to the loadXML method:
JavaScript
```
// Defining a string - default stylesheet:
var defsheet="<?xml version=\"1.0\"?>";
...
defsheet += "</xsl:stylesheet>";

// String -> DOM:
if(!defSheetCache)
{
 defSheetCache = new ActiveXObject("MSXML2.DOMDocument");
 defSheetCache.async = false;
 defSheetCache.resolveExternals = false;
 defSheetCache.loadXML(defsheet);
}
```
Here, loadXML is used to load a default stylesheet (hard-coded in a string), which is applied when no XSL files are found in the corresponding folder.

Document validation.

Step 3: review the validation result.

The actual validation takes place immediately after the XML document has finished loading:

JavaScript

...
xml.load(curXMLfolder + xmlselection.value);
// Document is already validated;
// 'xml.parseError.errorCode' contains error code, if any.
...

So all you have to do is check:

JavaScript

if (xml.parseError.errorCode != 0)
{
   // Handle error
}
else
{
   // Proceed - XML is ok.
}
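When the check fails, the parseError object also tells you where the problem is: MSXML's IXMLDOMParseError exposes errorCode, reason, line, linepos, srcText, filepos and url. The helper below is a hypothetical sketch (describeParseError is not part of the HTA) that turns those properties into a one-line status message. Because it only reads properties, it can be tested with a plain object.

```javascript
// Hypothetical helper: formats an MSXML parseError object (or any object
// with the same property names) as a one-line message.
function describeParseError(err)
{
  if (err.errorCode === 0) return "No parse errors.";

  var msg = "Error " + err.errorCode;
  // line is 0 when the error is not tied to a position (e.g. file not found):
  if (err.line > 0) msg += " at line " + err.line + ", column " + err.linepos;

  // reason usually ends with a line break; strip trailing whitespace:
  msg += ": " + String(err.reason).replace(/\s+$/, "");

  // srcText holds the offending line of source, when available:
  if (err.srcText) msg += " [" + String(err.srcText).replace(/^\s+|\s+$/g, "") + "]";
  return msg;
}
```

In the HTA it would be called as describeParseError(xml.parseError).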

Validating an XML document against an arbitrary XSD schema.

To validate your XML document against a schema, the script does the following:

JavaScript

...

if(xslFile.substr(xslFile.length - 3, 3) == "xsd")
{
 // 1. Loading XSD schema into the DOMDocument:
 var schemaSource = new ActiveXObject("MSXML2.DOMDocument.4.0");
 schemaSource.async = false;  // load() must finish before we inspect the result
 if(!schemaSource.load(curXSLfolder + xslFile))
 {
   xslErrorCache = schemaSource.parseError.errorCode + 
                   ": " + schemaSource.parseError.reason;
   passedXSL.innerHTML = "... Schema is corrupt ...";
   result.innerHTML = "";
   return;
 }

 // 2. Extracting the targetNamespace
 // attribute from the schema:
 schemaSource.setProperty("SelectionLanguage", "XPath");
 schemaSource.setProperty("SelectionNamespaces", 
   "xmlns:xs='http://www.w3.org/2001/XMLSchema'");

 var tnsattr = schemaSource.selectSingleNode("/*[local-name()" + 
               "='schema']/@targetNamespace");
 var nsuri = tnsattr ? tnsattr.nodeValue : "";

 // 3. Creating/purifying the schema cache:
 if(!schemaCache)
   schemaCache = new ActiveXObject("Msxml2.XMLSchemaCache.4.0");
 else
 {
   // Always remove entry 0: the collection shrinks after every removal.
   while(schemaCache.length > 0)
   {
     schemaCache.remove(schemaCache.namespaceURI(0));
   }
 }

 // 4. Adding schema to the schema cache:
 schemaCache.add(nsuri, schemaSource);

 ...

 // 5. Binding schema cache to an empty DOMDocument:
 var xmlSource = new ActiveXObject("MSXML2.DOMDocument.4.0");
 xmlSource.schemas = schemaCache;
 xmlSource.async = false;

 // 6. Loading the document:
 if(!xmlSource.load(curXMLfolder + xmlFile))
 {
   xslErrorCache = xmlSource.parseError.reason;
   passedXSL.innerHTML = 
     "... XML document doesn't conform to schema ...";
   result.innerHTML = "";
   return;
 }
 else
 {
   result.innerHTML = xml.transformNode(xsl.documentElement);
   passedXSL.innerHTML = 
     "... XML document conforms to schema ...";
 }
}

Please note: this validation procedure requires MSXML 4.0.
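Clearing a collection that shrinks as you remove from it is a classic trap: an index loop that increments i after every remove() skips every other entry. The sketch below uses a hypothetical mock (MockSchemaCache) that exposes only the three members the cache-clearing step relies on, length, namespaceURI(index) and remove(uri), to show that always removing entry 0 empties the collection. clearSchemaCache is likewise a hypothetical name.

```javascript
// Hypothetical mock of the schema cache members used in step 3.
function MockSchemaCache(uris)
{
  this.uris = uris.slice();
  this.length = this.uris.length;
}
MockSchemaCache.prototype.namespaceURI = function (i) { return this.uris[i]; };
MockSchemaCache.prototype.remove = function (uri)
{
  for (var i = 0; i < this.uris.length; i++)
  {
    if (this.uris[i] === uri)
    {
      this.uris.splice(i, 1);
      break;
    }
  }
  this.length = this.uris.length;   // the collection shrinks, like the real one
};

// Removes every entry by always taking the first one.
function clearSchemaCache(cache)
{
  while (cache.length > 0)
    cache.remove(cache.namespaceURI(0));
}
```

Against a real Msxml2.XMLSchemaCache.4.0 object, clearSchemaCache(schemaCache) relies on the same three members.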

Transforming the XML with an XSL stylesheet.
Once the XML and XSL files are loaded into DOM trees, transforming the XML data with a stylesheet takes a single call:
JavaScript
```
resultCache = xml.transformNode(xsl.documentElement);
```

Reconnecting CSS.

One problem arises with resultCache: if the XSLT stylesheet generates embedded style (<STYLE>) blocks, IE strips them from the resulting HTML when we display it through result.innerHTML. The solution is to extract the style definitions from the result and add them to the browser's own document:

JavaScript

// Match <style ...>...</style> regardless of case or attributes:
var styleMatch = /<style\b[^>]*>([\s\S]*?)<\/style>/i.exec(resultCache);
if (styleMatch)
{
  var elem = document.createStyleSheet();
  elem.cssText = trim(styleMatch[1]);
  elem.title = "user_styles";
}
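A plain indexOf("<style>") search finds only the first block and misses tags written in upper case or with attributes such as type="text/css". A hypothetical helper (extractStyles) that collects every block with a case-insensitive regular expression could look like the following. It avoids String.prototype.trim, which older JScript versions lack.

```javascript
// Hypothetical helper: returns the contents of every <style> block in an
// HTML string, trimmed and joined by newlines ("" when there are none).
function extractStyles(html)
{
  var re = /<style\b[^>]*>([\s\S]*?)<\/style\s*>/gi;
  var parts = [];
  var m;
  while ((m = re.exec(html)) !== null)
    parts.push(m[1].replace(/^\s+|\s+$/g, ""));
  return parts.join("\n");
}
```

The result can be assigned to cssText of a sheet created with document.createStyleSheet(), as in the snippet above.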

To avoid style conflicts, we must "collect garbage" immediately before every transformation:

JavaScript

// ...somewhere at the beginning of 'refresh()':

// Clear styles generated by previous XSLT stylesheets, if any.
// Walk document.styleSheets (not the <STYLE> elements) so that the
// index always refers to the same collection:
for(var k = 0; k < document.styleSheets.length; k++)
{
  var stl = document.styleSheets[k];
  if (stl.title == "user_styles")
  {
    // removeRule is a method call; each removal shifts the rest down:
    while (stl.rules.length > 0)
      stl.removeRule(0);
  }
}

Saving the result of the transformation to a file.

Step 4: save the XSLT output to a file.

Two points of interest here: the "Save" dialog, and the file creation process itself. For the "Save" dialog to work, you must register and obtain a design-time license for the following ActiveX component:

XML

<object id="cmdlg"
        classid="clsid:F9043C85-F6F2-101A-A3C9-08002B2F49FB"
        codebase="http://activex.microsoft.com/controls/vb6/comdlg32.cab">
</object>

Then, you can use it:

JavaScript

function fileSave()
{
  cmdlg.CancelError = false;
  cmdlg.FileName = "";   // so that Cancel returns an empty name
  cmdlg.FilterIndex = 1;
  cmdlg.DialogTitle = "Save file as";
  cmdlg.Filter = "HTML file (*.html)|*.html|XML file (*.xml)|*.xml";

  // Calling the dialog:
  cmdlg.ShowSave();

  return cmdlg.FileName;
}

To save the XSLT output, we simply take resultCache and write it to a file. There is no need to check for errors here, because the "Save..." button is not shown unless both documents (XML and XSL) have passed validation.

JavaScript

function Save()
{
  // Asking for file name:
  var filename = fileSave();

  if (filename != "")
  {
    // Creating the file:
    var objFSO = new ActiveXObject("Scripting.FileSystemObject");
    var objFile = objFSO.CreateTextFile(filename);

    // Writing the XSLT output:
    objFile.Write(resultCache);
    objFile.Close();
  }
}

All inclusive

As a test bed for the presented utility, the accompanying Zip file contains the NASDAQ historical price data of Sun Microsystems Inc., along with three XSLT stylesheets I've written:

Plain. This one shows the original XML data, using IE's color scheme.
Table. A simple example of an XSL transformation. The green/red rows, which show an increase/decrease in the stock price, illustrate the <xsl:choose> element.
Bar graph. A more complex stylesheet. It features "for-next"-style loops (implemented with recursive calls to named templates), finding the maximum in a series of values (using the <xsl:sort> element), and, of course, an algorithm for building stylish bar charts.
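The arithmetic behind a bar chart is simple once the maximum is known: each value is scaled against it so that the largest value gets the full width. The bar-graph stylesheet does this in XSLT; the sketch below shows the same idea in plain JavaScript, assuming linear scaling to a fixed maximum width in pixels. barWidths is a hypothetical name, and the exact scaling used in bargraph.xsl is not reproduced here.

```javascript
// Hypothetical sketch of bar-chart scaling: find the maximum, then scale
// every value linearly so that the maximum maps to maxWidth.
function barWidths(values, maxWidth)
{
  var max = 0;
  for (var i = 0; i < values.length; i++)
    if (values[i] > max) max = values[i];

  var widths = [];
  for (var j = 0; j < values.length; j++)
    widths.push(max > 0 ? Math.round(values[j] * maxWidth / max) : 0);
  return widths;
}
```

For example, barWidths([50, 100, 25], 200) gives widths of 100, 200 and 50 pixels.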

Make it quick

The last little things included are:

transfrm.js script - a simple WSH tool for applying XSLT stylesheets to XML documents. The script works in two modes:
- Batch mode is ideal for performing numerous XSL transformations in one go. The script takes a single file name as an argument:
```
transfrm.js batch.list
```
  The file, passed as an argument, may contain an arbitrary number of lines (one transformation per line) in the following format:
```
<xml_file_name>,<xsl_file_name>,<result_file_name>
```
  Sample batch list:
```
stock.xml,plain.xsl,result1.html
stock.xml,table.xsl,result2.html
stock.xml,bargraph.xsl,result3.html
```
  Aside from the resulting files, the script creates, or appends to, the "transfrm.log" file, which contains a transformation log.
- In the single transformation mode, the script accepts three arguments:
```
transfrm.js input.xml input.xsl output_file.html
```
  The "transfrm.log" file will be populated with the information on the last transformation.
validate.js script - a WSH tool for validating XML documents against arbitrary XSD schemas. It also works in two modes:
- Single validation mode:
```
validate.js input.xml input.xsd
```
  As a result, you'll see a message box stating whether input.xml conforms to input.xsd.
- Batch mode:
```
validate.js batch.file
```
  The batch file contains the list of input XML files and XSD schemas:
```
<xml_file_1_name>,<xsd_file_1_name>
<xml_file_2_name>,<xsd_file_2_name>
```
  The script creates, or appends to, the "validate.log" file with the details of the last validation.
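Both batch formats are one job per line with comma-separated fields: three fields for transfrm.js and two for validate.js. The parser below is a hypothetical sketch, not the scripts' actual code; it skips blank lines and reports lines with the wrong number of fields instead of silently ignoring them. File names containing commas are not supported, since the format has no way to escape them. It splits on a plain "\n" string rather than a regular expression, because older JScript versions drop empty items when splitting on a regex.

```javascript
// Hypothetical parser for the batch-list formats:
//   transfrm.js: <xml_file>,<xsl_file>,<result_file>  (fieldCount = 3)
//   validate.js: <xml_file>,<xsd_file>                (fieldCount = 2)
function trimStr(s) { return s.replace(/^\s+|\s+$/g, ""); }

function parseBatchList(text, fieldCount)
{
  var jobs = [], errors = [];
  var lines = text.replace(/\r\n/g, "\n").split("\n");
  for (var i = 0; i < lines.length; i++)
  {
    var line = trimStr(lines[i]);
    if (line === "") continue;               // skip blank lines
    var fields = line.split(",");
    for (var k = 0; k < fields.length; k++)
      fields[k] = trimStr(fields[k]);
    if (fields.length !== fieldCount)
      errors.push("line " + (i + 1) + ": expected " + fieldCount +
                  " fields, got " + fields.length);
    else
      jobs.push(fields);
  }
  return { jobs: jobs, errors: errors };
}
```

Each entry of jobs is an array such as ["stock.xml", "plain.xsl", "result1.html"], ready to be handed to the transformation or validation step.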

Security

Looking at the previous sections, one can guess that a script writing arbitrary data to arbitrary files can be a big source of headaches and security problems. Moreover, HTML applications are not subject to IE's security restrictions (see the appropriate introduction), so third-party (or simply buggy) scripts that use the FileSystemObject can be a major security threat.

This dictates two primary uses of the Scripting Runtime: "local" (non-web) utilities, and server-side scripting. As MSDN says, "because the use of the FSO on the client side raises serious security issues about providing potentially unwelcome access to a client's local file system, this documentation assumes the use of the FSO object model to create scripts executed by Internet Web pages on the server side. Since the server side is used, the Internet Explorer default security settings do not allow the client-side use of the FileSystemObject object. Overriding those defaults could subject a local computer to unwelcome access to the file system, which could result in total destruction of the file system's integrity, causing loss of data, or worse."

Sometimes the only option is to turn stand-alone HTAs into corporate web pages (simply by renaming the .hta file to .html and removing the HTA:APPLICATION tag), and thus use the FSO on the client side. This raises problems with component licensing and execution permissions; furthermore, you must be sure that your intranet is extremely secure. In this case (i.e., if you're returning to the "ordinary web"), to protect yourself from unexpected behavior while still using the power of advanced scriptable objects (like Scripting.FileSystemObject or MSXML2.DOMDocument), please consider the following:

Never allow non-secure and unsigned ActiveX components to run without your explicit approval. Set the "Initialize and script ActiveX controls not marked as safe" option in IE's Security tab to "prompt".
Never allow Java components to be downloaded and run without your explicit approval. Set the Java Virtual Machine security level in IE's Security tab to "Medium" or "High".
Scripts downloaded from home or corporate intranets are usually trustworthy; so you may wish to set the security level of "Local Intranet" to "Medium" or even "Low", while setting "Internet" security level to "High".
Add the servers where your scripts reside to the "Trusted sites" zone.

Please note that these are just basic rules; you should probably consult IT professionals to bring your intranet security up to the appropriate level.

Alternatives

Other downsides of the "Xbrowser" are: it is strictly IE-bound, and it depends too much on external code libraries. This is the flipside of the power that the IE engine provides; however, dependencies can be reduced in a number of ways.

XML/XSL/XPath processing:
XML for <SCRIPT> - a cross-platform, standards-compliant XML parser written in JavaScript. Pros: W3C DOM (Level 2) and SAX parsers are included, together with an XPath processor; (almost) perfect for cross-browser work. Cons: if you need a schema/DTD-aware parser, this is not the right choice.

Sarissa - not a parser, but a JavaScript wrapper for native XML APIs. DOM operations, XSL transformations, and XPath queries can be performed; all popular browsers are supported. This can (also) help you build cross-browser XML solutions.
I/O:
Unfortunately (or fortunately?), there is no standard alternative to the Scripting Runtime Object Library for file I/O. Other browsers (Mozilla Firefox, Opera, etc.) do not support ActiveX objects such as Scripting.FileSystemObject, which are Microsoft extensions outside the ECMAScript standard (Microsoft's JScript is an implementation of that standard), so ordinary web scripts running in those browsers have no such route to tear your file system apart.

History

November 10th, 2005 - initial release.
March 15th, 2006 - serious CSS issue fixed, words on common controls licensing added.
April 25th, 2006 - validation of an XML document against an XSD schema is now available; minor improvements and bug-fixes.
May 15th, 2006 - minor optimizations and bug-fixes.
May 25th, 2006 - transformation and validation tools rewritten.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

