Using XPath to Navigate the File System

11 Aug 2012
How to implement an XPathNavigator for the file system

Introduction

This article has two goals:

  1. It shows how to write your own XPathNavigator implementation and use it to evaluate XPath expressions and apply XSLT transformations to structures that were never intended to be used this way.
  2. It presents an alternative way to work with files and folders that some people may find useful.

What is XPathNavigator

XPathNavigator is an abstract class in the System.Xml.XPath namespace that implements the XPath document model and provides a means of navigating through XML nodes and evaluating XPath expressions. Unlike XmlNode or XNode, an XPathNavigator is a cursor: it may point to any node in the XML tree and can be moved to another node. XPathNavigator is also accepted as input by XslCompiledTransform, so any XPathNavigator implementation can be transformed with an XSLT stylesheet.
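To make the cursor idea concrete, here is a minimal sketch that walks the children of a document element with a single navigator object. It uses the built-in XPathDocument rather than anything from this article, and the class name CursorDemo and the sample XML are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class CursorDemo
{
    // Collects the names of the document element's children by moving one cursor.
    public static List<string> ChildNames(string xml)
    {
        var names = new List<string>();
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();

        nav.MoveToFirstChild();          // from the root node to the document element
        if (nav.MoveToFirstChild())      // down to its first child, if there is one
        {
            do
            {
                names.Add(nav.Name);
            }
            while (nav.MoveToNext());    // same object, now pointing at the next sibling
        }
        return names;
    }

    public static void Main()
    {
        // Sample XML is made up for the demonstration.
        Console.WriteLine(string.Join(", ", ChildNames("<dir><a/><b/><c/></dir>")));
        // prints "a, b, c"
    }
}
```

Note that the Move* methods return false instead of throwing when the move is impossible, which is why the loop can simply run until MoveToNext fails.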

XPathNavigator implementations exist for all XML models in .NET, including XmlDocument and LINQ to XML. Generally, a navigator can be created for any class that implements the IXPathNavigable interface, which contains a single method, CreateNavigator. XmlNode and XPathDocument (a special fast model that only provides read-only access through XPathNavigator) both implement IXPathNavigable. This is not always the case, however: the newest XML library, LINQ to XML, does not implement the interface and instead creates navigators through the CreateNavigator extension method defined in System.Xml.XPath.Extensions.
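The two ways of obtaining a navigator can be compared side by side. The sketch below is only an illustration: the class name NavigatorCreationDemo and the sample XML are assumptions, while CreateNavigator and Evaluate are the standard framework calls.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class NavigatorCreationDemo
{
    // XmlDocument implements IXPathNavigable, so CreateNavigator is an instance method.
    public static double CountViaXmlDocument(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XPathNavigator nav = doc.CreateNavigator();
        return (double)nav.Evaluate("count(//file)");
    }

    // XDocument does not implement IXPathNavigable; CreateNavigator here is the
    // extension method from System.Xml.XPath.Extensions.
    public static double CountViaXDocument(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        XPathNavigator nav = doc.CreateNavigator();
        return (double)nav.Evaluate("count(//file)");
    }

    public static void Main()
    {
        // Sample XML is made up for the demonstration.
        const string xml = "<dir><file/><file/><sub><file/></sub></dir>";
        Console.WriteLine(CountViaXmlDocument(xml)); // prints 3
        Console.WriteLine(CountViaXDocument(xml));   // prints 3
    }
}
```

Either way, the calling code only sees the abstract XPathNavigator type, which is what makes it possible to substitute a custom implementation.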

It's worth mentioning that none of the system-integrated implementations of the XPathNavigator are public.

How to implement XPathNavigator

XPathNavigator contains 116 public members, 112 of which can be overridden. The good news is that only 20 of them are abstract, i.e., must be implemented.

Here they are: 

Properties

C#
XmlNameTable NameTable
XPathNodeType NodeType
string LocalName
string Name
string NamespaceURI
string Prefix
string BaseURI
bool IsEmptyElement

Methods

C#
XPathNavigator Clone()
bool MoveToFirstAttribute()
bool MoveToNextAttribute()
bool MoveToFirstNamespace(XPathNamespaceScope namespaceScope)
bool MoveToNextNamespace(XPathNamespaceScope namespaceScope)
bool MoveToNext()
bool MoveToPrevious()
bool MoveToFirstChild()
bool MoveToParent()
bool MoveTo(XPathNavigator other)
bool MoveToId(string id)
bool IsSamePosition(XPathNavigator other)

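Implementing these 20 members is enough for a complete, working read-only XPathNavigator implementation. (The compiler will also ask for the Value property, which XPathNavigator inherits as abstract from its base class System.Xml.XPath.XPathItem; it is overridden in the implementation below.) If you want to support modification of the model, you need to override some more members, at least those that throw NotSupportedException by default: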

C#
void SetValue(string value)
XmlWriter PrependChild()
XmlWriter AppendChild()
XmlWriter InsertAfter()
XmlWriter InsertBefore()
XmlWriter CreateAttributes()
XmlWriter ReplaceRange(XPathNavigator lastSiblingToReplace)
void DeleteRange(XPathNavigator lastSiblingToDelete)

As you can see, in order to implement a writable XPathNavigator you must unfortunately implement XmlWriter too.

In this article we will only need read-only functionality. 

File system as XML

The XPath model can actually be applied not only to XML but to any tree-like structure, and a file system, which consists of files and directories, is one of them. In our model we will treat files and directories as XML elements. In this simple implementation I used file names as the names of the XML elements. This is convenient because XPath expressions then look very similar to file paths or local URIs. For the same reason I didn't use any namespace or prefix for file nodes: with prefixes, the XPath expressions would become more cumbersome.

I rendered file properties, such as full name, extension, and file attributes as XML attributes. For the sake of simplicity, there won't be any text node in this implementation and no part of a file's content will be included in the tree.
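To visualize this mapping, the sketch below builds an eager XML snapshot of a directory with LINQ to XML. This is not how the article's navigator works (it walks the file system lazily and builds no document), and the class name FileTreeSnapshot is hypothetical. The attribute names FullName, Extension and Length are taken from the XPath examples later in the article, and XmlConvert.EncodeLocalName is used because not every file name is a valid XML name.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class FileTreeSnapshot
{
    // Converts a file or directory (recursively) into an element named after it.
    // Directories the process cannot read will throw; a real tool would handle that.
    public static XElement ToElement(FileSystemInfo info)
    {
        var element = new XElement(
            XmlConvert.EncodeLocalName(info.Name),
            new XAttribute("FullName", info.FullName),
            new XAttribute("Extension", info.Extension));

        var file = info as FileInfo;
        if (file != null)
            element.Add(new XAttribute("Length", file.Length));

        var dir = info as DirectoryInfo;
        if (dir != null)
            element.Add(dir.GetFileSystemInfos()
                .OrderBy(i => i.Name, StringComparer.Ordinal)
                .Select(i => ToElement(i)));

        return element;
    }

    public static void Main()
    {
        // A throwaway directory keeps the demo independent of the machine's layout.
        var root = Directory.CreateDirectory(
            Path.Combine(Path.GetTempPath(), "xfiles-demo-" + Guid.NewGuid().ToString("N")));
        try
        {
            File.WriteAllText(Path.Combine(root.FullName, "read me.txt"), "hello");
            Console.WriteLine(ToElement(root));
        }
        finally
        {
            root.Delete(true);
        }
    }
}
```

In this sketch the file "read me.txt" becomes an element named read_x0020_me.txt, so an XPath expression has to use the encoded form of any name that contains characters not allowed in XML names.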

Implementing XPathNavigator for file system

As I mentioned above, XPathNavigator is a cursor that can be moved around the tree and can point to any of its nodes: elements, which are files and folders in our case, and attributes. For this reason we will delegate all calls in the XPathNavigator to an internal instance of a helper class. There will be one such class per type of node.

Since we decided not to deal with namespaces we will not need to implement some members. So the following getters will just return String.Empty:

C#
public override string NamespaceURI { get { return String.Empty; } }
public override string Prefix  { get { return String.Empty; } }
public override string BaseURI  { get { return String.Empty; } }

The following methods will return false.

C#
public override bool MoveToFirstNamespace(XPathNamespaceScope namespaceScope) { return false; }
public override bool MoveToNextNamespace(XPathNamespaceScope namespaceScope) { return false; }

The property Name will return the same value as LocalName:

C#
public override string Name { get { return LocalName; } }

We will not define the IDs for our nodes, so MoveToId will also return false:

C#
public override bool MoveToId(string id) { return false; }

Some of you might feel uneasy looking at the NameTable property, since it returns an instance of the abstract class XmlNameTable. This property is called during XSL transformation and is intended to improve performance by letting strings be compared by reference rather than by value. The framework does provide a public implementation, System.Xml.NameTable, but fortunately you can simply return null from the property.

C#
public override XmlNameTable NameTable { get { return null; } }
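For readers who want to see what a name table actually does, here is a small sketch using System.Xml.NameTable, the public XmlNameTable implementation that ships with the framework. It is not part of the article's code; the class name NameTableDemo is made up. The point is that two equal strings added to the table come back as the same object, so they can be compared by reference.

```csharp
using System;
using System.Xml;

public static class NameTableDemo
{
    // Returns true when the table hands back one shared instance for equal strings.
    public static bool ReturnsSameInstance(string name)
    {
        var table = new NameTable();
        string first = table.Add(name);

        // Build a distinct string object with the same characters.
        string copy = new string(name.ToCharArray());
        string second = table.Add(copy);

        return ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine(ReturnsSameInstance("Length")); // prints True
    }
}
```

A navigator that prefers not to return null could, as an alternative, keep one NameTable instance and return it from the NameTable property.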

Wow! We have already implemented 8 out of 20 mandatory members! The remaining 12 will be delegated to the aforementioned internal instance. You can consider it the navigator's State object. For the methods whose names start with Move and that return a Boolean, the state object uses a different signature: each method returns a new state instance linked to the node the navigator should point at after the move. If the navigator can't be moved to the requested position, the state method returns null, and the corresponding XPathNavigator method returns false.

See the definition of the abstract class representing the internal state below:

C#
/// <summary>
/// Abstract class that defines behaviour of the XPath navigator positioned on a specific item
/// </summary>
internal abstract class XPathItem
{
    public abstract string Name { get; }
    public abstract XPathItem MoveToFirstAttribute();
    public abstract XPathItem MoveToNextAttribute();
    public abstract XPathItem MoveToFirstChild();
    public abstract XPathItem MoveToNext();
    public abstract XPathItem MoveToPrevious();
    public abstract XPathNodeType NodeType { get; }
    public virtual string Value { get { return string.Empty; } }
    public abstract bool IsEmptyElement { get; }
    public abstract XPathItem MoveToParent();
    public abstract XPathItem MoveToAttribute(string name);
    public abstract bool IsSamePosition(XPathItem item);
}

Now we can implement the rest of our XPathNavigator.

C#
/// <summary>
/// An <see cref="XPathNavigator"/> implementation,
/// that allows navigation through local file system as if it were XML, 
/// with files and folders representing XML elements
/// </summary>
public class FileSystemXPathNavigator : XPathNavigator
{
    private XPathItem _item;

    /// <summary>
    /// Constructs a new instance of the file system XPath navigator rooted at the specified file or folder
    /// </summary>
    /// <param name="fileName">A path to the local file system item, 
    /// which will be the root element in the traversed XPath document model</param>
    public FileSystemXPathNavigator(string fileName)
    {
        _item = RootItem.CreateItem(fileName);
    }

    private FileSystemXPathNavigator(XPathItem item)
    {
        _item = item;
    }

    public override XPathNavigator Clone()
    {
        return new FileSystemXPathNavigator(_item);
    }

    public override bool IsEmptyElement
    {
        get { return _item.IsEmptyElement ; }
    }

    public override bool IsSamePosition(XPathNavigator other)
    {
        var o = other as FileSystemXPathNavigator;
        return o != null && o._item.IsSamePosition(_item);
    }

    public override string LocalName
    {
        get { return _item.Name; }
    }

    public override bool MoveTo(XPathNavigator other)
    {
        var o = other as FileSystemXPathNavigator;
        if (o != null)
        {
            _item = o._item;
            return true;
        }
        return false;
    }

    public override bool MoveToFirstAttribute()
    {
        return MoveToItem(_item.MoveToFirstAttribute());
    }

    private bool MoveToItem(XPathItem newItem)
    {
        if (newItem == null) return false;
        _item = newItem;
        return true;
    }

    public override bool MoveToFirstChild()
    {
        return MoveToItem(_item.MoveToFirstChild());
    }

    public override bool MoveToNext()
    {
        return MoveToItem(_item.MoveToNext());
    }

    public override bool MoveToNextAttribute()
    {
        return MoveToItem(_item.MoveToNextAttribute());
    }

    public override bool MoveToParent()
    {
        return MoveToItem(_item.MoveToParent());
    }

    public override bool MoveToPrevious()
    {
        return MoveToItem(_item.MoveToPrevious());
    }

    public override XPathNodeType NodeType
    {
        get { return _item.NodeType; }
    }

    public override string Value
    {
        get { return _item.Value; }
    }

    public override bool MoveToAttribute(string localName, string namespaceURI)
    {
        // Attributes here have no namespace; callers may pass null or "" for "no namespace"
        if (!String.IsNullOrEmpty(namespaceURI)) return false;
        return MoveToItem(_item.MoveToAttribute(localName));
    }

    //more members
}

Implementation of Working Objects

The following class hierarchy carries out the navigation through the file system and represents it as XML. 

class diagram

I don't want to go deep into the implementation details of each class, so I will only cover the most interesting points:

  • The System.IO.FileSystemInfo class is used as the source of the attributes of file system items:
    C#
    protected internal abstract FileSystemInfo FileSystemInfo { get; }
  • File names allow many more special characters than XML element names do, so encoding is required:
    C#
    public override string Name
    {
        get { return XmlConvert.EncodeLocalName(FileSystemInfo.Name); }
    }
  • Two items are considered the same position when their full paths are equal:
    C#
    public virtual string Path
    {
        get { return FileSystemInfo.FullName; }
    }
    public override bool IsSamePosition(XPathItem item)
    {
        var fsi = item as FileSystemItem;
        return fsi != null && Path == fsi.Path;
    }
  • The parent node, which in most cases corresponds to a directory, is responsible for getting the next and previous nodes. For this purpose the list of files in the directory and the current index are stored in each DirectoryItem instance.
  • Attribute items are flexible and can be positioned on any attribute of a file or a directory. The AttributeInfo class is a flyweight for each kind of attribute (by and large, the attributes replicate the properties of the FileSystemInfo class).
  • Flag attributes, such as Hidden, System etc., are considered false by default.
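The sibling navigation described in the list above can be sketched roughly as follows. This is a simplified, hypothetical stand-in and not the article's DirectoryItem: the class name SiblingCursor and its members are assumptions, and here each cursor carries the shared child list plus its own index. It follows the same convention as the state classes, returning a new instance on success and null when the move is impossible.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical, simplified stand-in for the article's item classes.
public sealed class SiblingCursor
{
    private readonly FileSystemInfo[] _siblings; // children of the parent, read once
    private readonly int _index;                 // position of this item among them

    private SiblingCursor(FileSystemInfo[] siblings, int index)
    {
        _siblings = siblings;
        _index = index;
    }

    // Returns a cursor on the first child, or null for an empty directory.
    public static SiblingCursor FirstChild(DirectoryInfo parent)
    {
        var children = parent.GetFileSystemInfos()
            .OrderBy(i => i.Name, StringComparer.OrdinalIgnoreCase)
            .ToArray();
        return children.Length == 0 ? null : new SiblingCursor(children, 0);
    }

    public FileSystemInfo Current
    {
        get { return _siblings[_index]; }
    }

    public SiblingCursor MoveToNext()
    {
        return _index + 1 < _siblings.Length
            ? new SiblingCursor(_siblings, _index + 1)
            : null;
    }

    public SiblingCursor MoveToPrevious()
    {
        return _index > 0 ? new SiblingCursor(_siblings, _index - 1) : null;
    }

    // Same rule as in the article: equal full paths mean the same position.
    public bool IsSamePosition(SiblingCursor other)
    {
        return other != null && Current.FullName == other.Current.FullName;
    }
}
```

Because the cursors are immutable and share one array, cloning a navigator that points at such an item is cheap, and the directory is listed only once per parent.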

Console Utility

I also created a console utility, xfiles, that uses FileSystemXPathNavigator to evaluate XPath expressions and perform XSLT transformations on the file system. If executed with the /? argument, it shows the following information:

Evaluates XPath expression and performs XSLT trasformation on the local file system
Copyright: Boris Dongarov, 2012

Usage:
xfiles.exe [path] -xpath:<xpath-expression>
  or
xfiles.exe [path] -xsl:<xslt-file>
where
path - a path to a file or a folder that will be a root of the XPath model (the current directory by default)
-xpath - an XPath expression to be applied to the file system
-xsl - a file name of the XSLT stylesheet to be applied to the file system
If neither option is specified, the XML will be rendered to the standard output

Some examples:

xfiles.exe

Shows the content of the current directory as XML.

xfiles.exe "C:\Program Files" -xpath:sum(//*[@Extension='.exe']/@Length)

Shows the total size of all executable files in the Program Files folder.

xfiles.exe "C:\Program Files" -xpath://*[@Temporary]/@FullName > tempfiles.txt

Writes the list of temporary files in the Program Files folder to the specified file.
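How the -xpath option is implemented inside xfiles is not shown here, but evaluating an expression against any XPathNavigator could look roughly like the sketch below. The class name XPathRunner is hypothetical, and the sample XML merely imitates the shape of the file system model so that the code can run on its own with a built-in XPathDocument navigator.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml.XPath;

public static class XPathRunner
{
    // Evaluates an expression and flattens the result into printable lines.
    public static List<string> Run(XPathNavigator navigator, string expression)
    {
        var lines = new List<string>();
        object result = navigator.Evaluate(expression);

        // Evaluate returns a node iterator for node sets,
        // and a double, string or bool for the other XPath types.
        var nodes = result as XPathNodeIterator;
        if (nodes != null)
        {
            while (nodes.MoveNext())
                lines.Add(nodes.Current.Value);
        }
        else
        {
            lines.Add(Convert.ToString(result, CultureInfo.InvariantCulture));
        }
        return lines;
    }

    public static void Main()
    {
        // Made-up stand-in for a folder with three files.
        const string xml =
            "<bin><a.exe Extension='.exe' Length='100'/>" +
            "<b.exe Extension='.exe' Length='250'/>" +
            "<c.txt Extension='.txt' Length='7'/></bin>";
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();

        foreach (string line in Run(nav, "sum(//*[@Extension='.exe']/@Length)"))
            Console.WriteLine(line); // prints 350
    }
}
```

Since Run only depends on the abstract XPathNavigator type, a file system navigator could presumably be passed to it in exactly the same way.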

xfiles C:\WINDOWS\SYSTEM32 -xsl:dirstructure.xslt > system32.htm

Transforms the content of the system directory to HTML using a custom XSLT stylesheet.
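Applying a stylesheet works the same way for every navigator, because XslCompiledTransform accepts any IXPathNavigable as input and XPathNavigator implements that interface. The following is a sketch under assumptions and not the code of xfiles: the class name XsltRunner, the stylesheet and the sample XML are made up, and a built-in XPathDocument navigator stands in for the file system one.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public static class XsltRunner
{
    // Applies a stylesheet (given as text) to whatever the navigator exposes.
    public static string Transform(XPathNavigator input, string stylesheet)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader reader = XmlReader.Create(new StringReader(stylesheet)))
            xslt.Load(reader);

        using (var output = new StringWriter())
        {
            xslt.Transform(input, null, output); // no XSLT arguments needed here
            return output.ToString();
        }
    }

    public static void Main()
    {
        // Made-up stylesheet: lists the name of every element that has an Extension attribute.
        const string stylesheet =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
            "<xsl:output method='text'/>" +
            "<xsl:template match='/'>" +
            "<xsl:for-each select='//*[@Extension]'><xsl:value-of select='name()'/>;</xsl:for-each>" +
            "</xsl:template></xsl:stylesheet>";
        const string xml =
            "<bin><a.exe Extension='.exe'/><b.exe Extension='.exe'/><c.txt Extension='.txt'/></bin>";

        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        Console.WriteLine(Transform(nav, stylesheet)); // prints a.exe;b.exe;c.txt;
    }
}
```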

Conclusion

I hope this article showed you the power of XPathNavigator and will help you implement your own. You may also find the xfiles utility, and the approach to working with files and directories that it provides, useful. All the source code is attached to this article.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)