PowerShell and XML

27 Feb 2010 · CPOL · 6 min read
Introduction to how easy PowerShell makes working with XML programmatically

Introduction

XML is everywhere. As a result, many of us find the need to work with XML and traditional text editors don't cut it. Some editors provide decent formatting (like Notepad++), but they don't provide a mechanism to examine the underlying data in the XML programmatically.

PowerShell makes handling XML very simple. It converts XML elements to properties on .NET objects without the need to write any parsing code. So all you need is PowerShell installed on your machine and you're ready to go!

A Quick Example

Here's a quick example to show how PowerShell maps the XML elements and attributes to object properties. It assumes the file.xml file exists and contains the text below:

XML
<!-- file.xml -->
<employees>
	<employee id="101">
		<name>Frankie Johnny</name>
		<age>36</age>
	</employee>
	<employee id="102">
		<name>Elvis Presley</name>
		<age>79</age>
	</employee>
	<employee id="301">
		<name>Ella Fitzgerald</name>
		<age>102</age>
	</employee>
</employees>

The session below loads file.xml into an XmlDocument object and reads its nodes and properties:

PS C:\> $xml = [xml](get-content file.xml)
PS C:\> $xml

#comment                                                    employees
--------                                                    ---------
 file.xml                                                   employees

PS C:\> $xml.employees

employee
--------
{Frankie Johnny, Elvis Presley, Ella Fitzgerald}

PS C:\> $xml.employees.employee

id                                      name                                    age
--                                      ----                                    ---
101                                     Frankie Johnny                          36
102                                     Elvis Presley                           79
301                                     Ella Fitzgerald                         102

PS C:\> $xml.employees.employee[0].name
Frankie Johnny
PS C:\> $xml.employees.employee[1].age
79
PS C:\>

The cmdlet get-content is equivalent to cat in UNIX and returns the contents of the file as lines of text.

The square brackets in '[xml]' in front of the get-content cmdlet denote a type. In this case, the text returned from get-content file.xml is cast to an XmlDocument object. Once you have an XmlDocument object, PowerShell's built-in support for XML kicks in. The individual XmlElement objects present their child nodes and attributes as properties, so the element name is the property name. Here the root element is <employees>, so it is accessed as a property on the $xml variable. And try this: type '$xml.em' and then hit TAB. That's right, tab completion of the element names. Makes it a little easier for ya.
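To try the cast without creating a file first, here is a minimal sketch that parses the same data from an inline string. The variable names ($text, $doc, $first) are only for illustration and are not part of the session above.

```powershell
# Inline copy of file.xml so the sketch runs without any file on disk.
$text = @'
<employees>
  <employee id="101"><name>Frankie Johnny</name><age>36</age></employee>
  <employee id="102"><name>Elvis Presley</name><age>79</age></employee>
  <employee id="301"><name>Ella Fitzgerald</name><age>102</age></employee>
</employees>
'@

# The [xml] cast parses the string into a System.Xml.XmlDocument.
$doc = [xml]$text
$doc.GetType().FullName

# Child elements and attributes both surface as properties.
$first = $doc.employees.employee[0]
$first.name                      # child element
$first.id                        # attribute
$doc.employees.employee.Count    # number of <employee> elements
```

Both the child element and the attribute come back as strings, which matters for the comparisons later in the article.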

But Wait, There's More

You still have access to the methods of the underlying XmlDocument and XmlElement objects too. Execute the command '$xml | gm' (gm is the alias for get-member) to list the methods and properties available on the node. This means you can call SelectNodes() and SelectSingleNode() with XPath query syntax. Here's an example that calls SelectNodes() on the XmlDocument.

PS C:\> $xml = [xml](Get-Content file.xml)
PS C:\> $xml.SelectNodes("/employees/employee")

id                                      name                                    age
--                                      ----                                    ---
101                                     Frankie Johnny                          36
102                                     Elvis Presley                           79
301                                     Ella Fitzgerald                         102
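XPath can also do the filtering itself. The sketch below is one way to use predicates with SelectSingleNode() and SelectNodes(); the inline document and the variable names are assumptions made so that it runs on its own.

```powershell
# Inline copy of file.xml so the sketch is self-contained.
$doc = [xml]@'
<employees>
  <employee id="101"><name>Frankie Johnny</name><age>36</age></employee>
  <employee id="102"><name>Elvis Presley</name><age>79</age></employee>
  <employee id="301"><name>Ella Fitzgerald</name><age>102</age></employee>
</employees>
'@

# Predicate on an attribute: returns the first matching node (or $null).
$elvis = $doc.SelectSingleNode('/employees/employee[@id="102"]')
$elvis.name

# Predicate on a child element: XPath compares <age> as a number here.
$older = $doc.SelectNodes('/employees/employee[age > 50]')
$older.Count
$older | ForEach-Object { $_.name }
```

Because the comparison happens inside XPath, no [int] cast is needed for the age filter in this form.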

By passing those results through the PowerShell pipeline, you can feed them into other commands like select-object, where-object and foreach-object to prune the values or execute commands based on them, which makes for a very expressive and powerful scripting experience. Note: the examples below use gc, the alias for get-content. Many commonly used PowerShell cmdlets have short aliases, often built from the initials of the full name.

In the 4 examples below, we use where-object and foreach-object to check properties of the employee nodes so we only return those that match certain criteria or manipulate them in some way. The first example looks like it should return the 2 employees with age over 50, but it doesn't. The reason is that the properties of the XML objects are always strings, so PowerShell does a lexicographical comparison: "102" sorts before "50". That is easily solved by casting the value to an int using the type cast operator [int], much in the same way we cast the string output of get-content to an XML document with [xml].

The 2nd command shows that casting to an int properly returns the employees older than 50. The 3rd example shows that you can call the methods on the string objects returned. So let's say the first digit of your employee id indicates a department; you can easily use the StartsWith() method on the String object. What is $_? The '$_' is a variable that represents the current pipeline object. It is used so that you can manipulate the current object being passed to your function/script block. The where-object cmdlet takes a script block (denoted by the curly brackets { }) and executes the PowerShell script inside. If the expression evaluates to true, the object is passed down the pipeline to the next command (which in this case is the default output command), and when the where-object script block evaluates to false, the object is "dropped".

If you wanted to manipulate the XML node values somehow, you would use foreach-object, which also takes a script block. In fact, the 4th command uses foreach to build a concatenated string from each employee node's values.

PS C:\> $xml = [xml](gc file.xml)
PS C:\> $xml.employees.employee | where { $_.age -gt 50 }

id                                      name                                    age
--                                      ----                                    ---
102                                     Elvis Presley                           79

PS C:\> $xml.employees.employee | where { [int]$_.age -gt 50 }

id                                      name                                    age
--                                      ----                                    ---
102                                     Elvis Presley                           79
301                                     Ella Fitzgerald                         102

PS C:\> $xml.employees.employee | where { $_.id.startsWith("1") }

id                                      name                                    age
--                                      ----                                    ---
101                                     Frankie Johnny                          36
102                                     Elvis Presley                           79

PS C:\> $xml.employees.employee | foreach { $_.id + ":" + $_.name + ":" + $_.age }
101:Frankie Johnny:36
102:Elvis Presley:79
301:Ella Fitzgerald:102
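The string-versus-number trap from the first command can be reproduced without any XML at all. This is a small sketch with literal values; the variable names are only illustrative.

```powershell
# The left operand decides the comparison type. A string on the left means
# the right side is converted to a string and compared as text.
$asText   = '102' -gt 50        # "102" vs "50": '1' sorts before '5'
$asNumber = [int]'102' -gt 50   # numeric comparison after the cast
"Text comparison:    $asText"
"Numeric comparison: $asNumber"

# Sorting has the same trap, and the same fix.
$ages = '36', '79', '102'
$sortedAsText    = $ages | Sort-Object
$sortedAsNumbers = $ages | Sort-Object { [int]$_ }
"Sorted as text:    $($sortedAsText -join ', ')"
"Sorted as numbers: $($sortedAsNumbers -join ', ')"
```

The same cast applies anywhere an XML value is used numerically, for example in sort-object or measure-object.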

A Really Simple Silly Example

RSS feeds are easy to access and they are basically just XML. So let's see how PowerShell deals with RSS feeds. There are already articles on the InterTube that showcase how easy it is to do this, but I thought I'd include it as well. Grab an RSS feed from a favorite website and see if there are any articles that match some criteria. I'm going to grab TMZ's RSS feed and check for articles with my first name "scott". First I want to show how to grab the XML RSS document and display the articles using 1 or more of their properties. This is done by calling the DownloadString() method of the .NET class System.Net.WebClient with the feed URL. Very easy to do.

Next I'm going to select articles with the name 'scott' in the title. I will use PowerShell's -imatch regex operator. It does a case-insensitive search for the target pattern anywhere in the input string. Note: Your results may differ, as the articles on the TMZ website change from day to day. Also I'm using 'ft', which is an alias for 'format-table' and lets me select which properties I want to display.

PS C:\powershell> $url = "http://www.tmz.com/rss.xml"
PS C:\powershell> $feed=[xml](new-object system.net.webclient).downloadstring($url)
PS C:\powershell> $feed.rss.channel.item | format-table title,link

title                                                       link
-----                                                       ----
Report: Marie Osmond's Son Commits Suicide                  http://www.tmz.com/2010/02/27/marie-osmond-son-commits-s...
Scotty Lago Conspiracy Theory -- Up in Smoke                http://www.tmz.com/2010/02/27/scotty-lago-michael-phelps...
Brittany Murphy -- 109 Mystery Pills                        http://www.tmz.com/2010/02/27/brittany-murphy-prescripti...
Liev Schreiber to the Rescue!                               http://www.tmz.com/2010/02/27/liev-schreiber-broadway-au...
Scotty Lago's Olympic Conspiracy Theory                     http://www.tmz.com/2010/02/27/scotty-lago-olympics-vanco...
Britney Spears -- The Blonde Is Back                        http://www.tmz.com/2010/02/27/britney-spears-blonde-hair...
Former 'Idol' Elliott Yamin in Chile During Quake           http://www.tmz.com/2010/02/27/american-idol-elliott-yami...
'Pants on the Ground' Guy -- King of Vegas                  http://www.tmz.com/2010/02/27/pants-on-the-ground-guy-la...
Reality to Nas -- 'Memba Me?                                http://www.tmz.com/2010/02/27/nas-federal-tax-bill-lien-...
Nic Cage's Manager -- Get Me Outta This Suit!               http://www.tmz.com/2010/02/27/nic-cage-manager-sam-levin...
Audrina Patridge:  Bad Acting Got Me Towed                  http://www.tmz.com/2010/02/27/audrina-patridge-tow-car-t...
TMZ's Bangin' Backside Contest -- Bootyfull!                http://www.tmz.com/2010/02/27/tmzs-bangin-backside-conte...
Conan the Barbarian -- One Hairy Situation                  http://www.tmz.com/2010/02/27/conan-the-barbarian-in-ano...
Carol Brady vs. Mrs. C: Who'd You Rather?                   http://www.tmz.com/2010/02/27/carol-brady-vs-mrs-c-whod-...
Nicole Richie & the Chocolate Factory                       http://www.tmz.com/2010/02/27/nicole-richie-and-the-choc...
Guess Who This Guy Turned Into!                             http://www.tmz.com/2010/02/27/guess-who-this-guy-turned-...
Jon Cryer Alleged Hit - Mexico Gang Connection?             http://www.tmz.com/2010/02/27/jon-cryer-alleged-hit-mexi...
Avril & Deryck's Divorce Takes a Backseat                   http://www.tmz.com/2010/02/27/avril-lavigne-and-derycks-...
World to Joanna Krupa's Mom -- Thank You!                   http://www.tmz.com/2010/02/27/joanna-krupa-dancing-with-...
K-Fed Confused by Green Mystery Substance                   http://www.tmz.com/2010/02/27/k-fed-kevin-federline-shop...

PS C:\powershell> $feed.rss.channel.item | where { $_.title -imatch "scott" } | ft title,link

title                                                       link
-----                                                       ----
Scotty Lago Conspiracy Theory -- Up in Smoke                http://www.tmz.com/2010/02/27/scotty-lago-michael-phelps...
Scotty Lago's Olympic Conspiracy Theory                     http://www.tmz.com/2010/02/27/scotty-lago-olympics-vanco...
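Since the live feed changes daily, the filtering step can also be tried offline. The sketch below uses a tiny made-up feed (the titles and example.com links are hypothetical) in place of the downloaded string.

```powershell
# Hypothetical miniature RSS document standing in for the downloaded feed.
$feed = [xml]@'
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>Scotty wins gold</title><link>http://example.com/1</link></item>
    <item><title>Weather update</title><link>http://example.com/2</link></item>
    <item><title>Interview with SCOTT</title><link>http://example.com/3</link></item>
  </channel>
</rss>
'@

# -imatch is a case-insensitive regex match anywhere in the title.
$hits = $feed.rss.channel.item | Where-Object { $_.title -imatch 'scott' }
$hits | ForEach-Object { $_.title + ' -> ' + $_.link }
```

Swapping the here-string for the DownloadString() call gives the same pipeline against a real feed.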

Small Print and Gotchas

The "item" Property Issue

Every XmlElement exposes a parameterized "Item" property (the .NET indexer), which is what makes hashtable-like access such as '$feed["rss"]' possible. Since RSS docs have <item> nodes, PowerShell complains when it tries to display an element that contains them. So you must work around that by either changing the name (annoying, I know) or writing your code to step past the parent and go straight to the "item" nodes, as in the examples above. Below is an example of the error you see when displaying the channel element that contains the "item" nodes.

PS C:\powershell> $feed.rss.channel
format-default : The member "Item" is already present.
    + CategoryInfo          : NotSpecified: (:) [format-default], ExtendedTypeSystemException
    + FullyQualifiedErrorId : 
    AlreadyPresentPSMemberInfoInternalCollectionAdd,Microsoft.PowerShell.Commands.FormatDefa
   ultCommand
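One possible way around this is to never send the channel element itself to the formatter and to reach the nodes through XPath instead. This is a sketch under that assumption, using a made-up inline feed; whether the error appears at all may depend on your PowerShell version.

```powershell
# Hypothetical inline feed; only the structure matters here.
$feed = [xml]'<rss><channel><title>Example feed</title><item><title>First</title></item><item><title>Second</title></item></channel></rss>'

# Keep the channel in a variable rather than letting it print.
$channel = $feed.SelectSingleNode('/rss/channel')

# Read individual values and the item nodes explicitly.
$channelTitle = $channel.SelectSingleNode('title').InnerText
$items = $channel.SelectNodes('item')

"Channel: $channelTitle"
"Items:   $($items.Count)"
$items | ForEach-Object { $_.SelectSingleNode('title').InnerText }
```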

Hidden Methods

Some members of the underlying XmlDocument and XmlElement classes are not listed by get-member, but you can easily look them up in the .NET API docs. Properties defined on the class can always be reached through their method form (get_Property()), which also works when a child element has the same name as the property. The recommendation is to know the methods on the underlying XmlElement instances or have the API docs handy.
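A small sketch of the method form, using an element whose child happens to share a name with a .NET property (the inline document is an assumption for illustration):

```powershell
# <employee> has a <name> child, which collides with XmlElement's own Name property.
$doc = [xml]'<employees><employee id="101"><name>Frankie Johnny</name></employee></employees>'
$e = $doc.employees.employee

$viaProperty = $e.name            # the child element's text
$viaGetter   = $e.get_Name()      # the element's own tag name
$innerXml    = $e.get_InnerXml()  # markup of the children

"Property syntax: $viaProperty"
"Getter method:   $viaGetter"
"Inner XML:       $innerXml"
```

The property syntax prefers the child element, so the getter is the unambiguous way to ask for the .NET property.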

Updating XML

There are no dedicated cmdlets for editing XmlElements in PowerShell. To change a document, you call the methods of the underlying XmlElement/XmlDocument objects (CreateElement(), AppendChild(), Save() and so on) as you would in C#. Understanding the methods on the XML .NET classes (XmlElement, XmlDocument) helps here.
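As a rough sketch of what that looks like, the following adds an employee with the standard System.Xml methods and saves the result. The new employee's values and the output file name in the temp folder are made up for the example.

```powershell
$doc = [xml]'<employees><employee id="101"><name>Frankie Johnny</name><age>36</age></employee></employees>'

# Build a new <employee> element. Nodes must be created by the owning document.
$new = $doc.CreateElement('employee')
$new.SetAttribute('id', '401')

$name = $doc.CreateElement('name')
$name.InnerText = 'Hypothetical Person'
[void]$new.AppendChild($name)     # [void] suppresses the returned node

# Attach the new element under the root.
[void]$doc.DocumentElement.AppendChild($new)

# Write the document out (hypothetical file name in the temp folder).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'employees-updated.xml'
$doc.Save($path)

$doc.SelectNodes('/employees/employee').Count
$doc.OuterXml
```

CreateElement() only creates the node; it does not appear in the document until AppendChild() attaches it somewhere.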

Other Resources

Articles describing the "item" property issue better and also some examples of how to modify an XML document in PowerShell and save it back:

Next Steps

At my current job, I'm working with Lucene to build a search index for our music catalog which includes albums, tracks and artists. Lucene is an open-source library for Information Retrieval. It's widely used and has both a Java version and .NET port. We use Java here at work but I've been writing some scripts around Lucene.NET in PowerShell. The Lucene.NET port is several versions behind the Java version unfortunately but still very useful (latest is 2.4). In my next article, I plan on combining the robust XML handling of PowerShell to seamlessly integrate with Lucene.NET so that it's extremely easy to grab XML docs/RSS feeds from the web and index them for fast retrieval.

Conclusion

Hopefully this article has shown how easy it is to work with XML using Windows PowerShell. In addition to the robust XML handling capabilities, I hope I've conveyed how expressive you can be when you combine the pipeline architecture with XML to create very flexible and useful scripts/tools in PowerShell.

This is my first posting to CodeProject. I hope you find it useful and I welcome any and all feedback. Thanks!

History

  • 27th February, 2010: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)