XPS (XML Paper Specification) is a fixed page format specification that is a useful alternative to PDF. Just as PDF is a 'cut-down' version of PostScript, XPS is a reduced schema version of XAML specifically for fixed page layout. With XPS being XML based, it should be a great format for generating your own documents. Unfortunately, there seems to be little available describing this format in a way that's useful for actual implementation when you want to do just that. I'm hoping to help fill in some of those gaps with a (short) series of articles.
My introduction to XPS began with mocking up documents using Word and then "printing" them using the XPS Printer driver provided by .NET v3, afterwards examining the XPS documents to learn how they are structured and how to manipulate them. Apparently, if you have MS Office 2007 and get an optional update, you can also do a "Save As" to produce an XPS document.
I found that those XPS files produced by Word and the "XPS Printer" often included a large number of unnecessary artifacts (especially if it's a file you've edited several times, changed the fonts, etc.). This particular tool cleans out a large number of those artifacts, eliminates some duplicates, and does a few other tweaks that help to reduce the overall size of the XPS file, although in most cases, only by a few KB. Stepping through what it does also serves as a useful introduction to XPS files.
If you're planning on doing your own XPS output, then mocking up your intended format and using this tool to clean up the result is a really handy way to start.
Originally, I had pursued XPS purely as a proof of concept for a billing system. However, when it became apparent that commercial systems for PDF / PostScript production were going to be in the "insanely expensive" price bracket, my "proof of concept" became the actual production system.
This particular part of the project came out of the necessity of cleaning up marketing materials ready for their inclusion into the customer's bill. This CodeProject article is derived from that work.
The OOXML organisation used by XPS files includes a large number of cross references between different parts (files) and within the individual files. I won't go into whether OOXML is a good or a bad thing; there's already enough argument about that. However, just to add to the confusion, the OOXML "spec" has been slightly tweaked for XPS.
In the case of XPS files, the internal structure can be thought of as having three tiers (please note that this is not the official explanation, but it works for me). At the root, there's the XPS file itself. Next, there are the individual documents carried within that. Finally, there are the individual pages for each document. At each of these tiers may be held references to other parts and also resources of various types. All of this is a gross over-simplification, of course, but you get the idea.
Many of the parts (files) within the OOXML structure can be given different names, rather than the ones used by the "XPS Printer" or that are shown in the sample files, so long as all of the cross references line up.
Within each file, the various parts generally don't have to be in any specific order. Order only becomes important for specific aspects of page layout; otherwise, just do whatever is most convenient for processing.
After "printing" a sample XPS file yourself (and renaming it to a .zip file so you can open it), you would probably see the structure described below.
At each tier within the XPS document, you can find three different folders, although they don't have to be present at each tier:
Folder Name | Description
_rels | Contains files that describe the relationships the files at this tier have with other parts within the XPS file.
Metadata | Holds metadata files related to this tier, for instance thumbnail images of the document or the PrintTicket files.
Resources | Contains the resources (e.g., fonts and images) used by this tier of the XPS file.
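You don't actually have to rename the file to look inside, since an XPS file is an ordinary zip archive. Here's a minimal sketch (a C# 9 top-level program) that lists its parts with the `System.IO.Compression` classes built into .NET, rather than the SharpZipLib library this tool uses. `ListParts` is a hypothetical helper name, and the in-memory archive simply stands in for a real .xps file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Lists the part names in an XPS package. Names are sorted ordinally so
// that parts in the same folder are listed together; folder entries are skipped.
static List<string> ListParts(Stream xpsStream)
{
    using var archive = new ZipArchive(xpsStream, ZipArchiveMode.Read, leaveOpen: true);
    return archive.Entries
        .Select(e => e.FullName)
        .Where(n => !n.EndsWith("/"))
        .OrderBy(n => n, StringComparer.Ordinal)
        .ToList();
}

// Usage: a tiny in-memory stand-in for a real .xps file. With a real file,
// pass File.OpenRead("sample.xps") instead (hypothetical file name).
var demo = new MemoryStream();
using (var zip = new ZipArchive(demo, ZipArchiveMode.Create, leaveOpen: true))
{
    foreach (var part in new[] { "_rels/.rels", "[Content_Types].xml",
                                 "FixedDocumentSequence.fdseq",
                                 "Documents/1/FixedDocument.fdoc" })
        zip.CreateEntry(part);
}
demo.Position = 0;
foreach (var entryName in ListParts(demo))
    Console.WriteLine(entryName);
```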
Root Tier (XPS File)
At the root tier, there will be two files:
File | Description
[Content_Types].xml | Maps each file extension used within this XPS document to its content type.
FixedDocumentSequence.fdseq | Lists the actual documents contained within the XPS file, in effect pointing to the next tier in the hierarchy.
[Content_Types].xml would normally look something like this:
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default ContentType="application/vnd.openxmlformats-package.relationships+xml"
Extension="rels" />
<Default ContentType="application/vnd.ms-package.xps-fixeddocumentsequence+xml"
Extension="fdseq" />
<Default ContentType="application/vnd.ms-package.xps-fixeddocument+xml"
Extension="fdoc" />
<Default ContentType="application/vnd.ms-printing.printticket+xml"
Extension="xml" />
<Default ContentType="image/jpeg" Extension="JPG" />
<Default ContentType="application/vnd.ms-package.xps-fixedpage+xml"
Extension="fpage" />
<Default ContentType="application/vnd.ms-package.obfuscated-opentype"
Extension="odttf" />
</Types>
Note the schema namespace declaration in the root Types element and the "rels" extension declaration; these are specific to OOXML. Next, there are the "fdseq", "fdoc", and "fpage" extensions, which all declare parts of the XPS structure. Then there's "odttf" for obfuscated OpenType font files; more on these in another article. Unfortunately, the generic "xml" extension is used for the metadata PrintTicket files, so a .xml extension on its own doesn't tell you that a file is a PrintTicket. And finally, there are the image files: "JPG" here, plus "PNG" if your document contains PNG images; you may also see others depending on what's sitting in your original source document. You can assume "JPG" is always going to be present, because the metadata thumbnail image generated by the XPS printer driver is always a small JPEG image.
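Looking up a part's content type then comes down to matching its extension against the `Default` elements. A minimal LINQ to XML sketch follows. `LoadContentTypes` is a hypothetical name; it assumes extensions should be matched case-insensitively (note the upper-case "JPG" above), and it ignores `Override` elements, which OPC also allows but which don't appear in the listing above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Reads the Default elements of [Content_Types].xml into an
// extension -> content type map. Extensions are matched case-insensitively
// (an assumption) and Override elements are not handled.
static Dictionary<string, string> LoadContentTypes(string xml)
{
    XNamespace ct = "http://schemas.openxmlformats.org/package/2006/content-types";
    return XDocument.Parse(xml).Root
        .Elements(ct + "Default")
        .ToDictionary(d => (string)d.Attribute("Extension"),
                      d => (string)d.Attribute("ContentType"),
                      StringComparer.OrdinalIgnoreCase);
}

string contentTypesXml =
    "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">" +
    "<Default ContentType=\"image/jpeg\" Extension=\"JPG\" />" +
    "<Default ContentType=\"application/vnd.ms-package.xps-fixedpage+xml\" Extension=\"fpage\" />" +
    "</Types>";

var types = LoadContentTypes(contentTypesXml);
Console.WriteLine(types["jpg"]);   // image/jpeg, despite the case difference
```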
FixedDocumentSequence.fdseq is normally very simple. Not just its name, but also its extension tells us that it is a fixed document sequence file. For an XPS printer driver generated document, it should always look like this:
<FixedDocumentSequence xmlns="http://schemas.microsoft.com/xps/2005/06">
<DocumentReference Source="/Documents/1/FixedDocument.fdoc" />
</FixedDocumentSequence>
Now we know where to go to find the first part of our document. However, we should have a look within the _rels folder first.
The first file is called .rels; this is the package-level relationships file, describing the relationships of the package as a whole. It would normally look like this:
<?xml version="1.0" encoding="utf-8"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Target="/FixedDocumentSequence.fdseq" Id="R0"
Type="http://schemas.microsoft.com/xps/2005/06/fixedrepresentation"/>
<Relationship Target="/Documents/1/Metadata/Page1_Thumbnail.JPG" Id="R1"
Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/thumbnail"/>
</Relationships>
You can see that it identifies the FixedDocumentSequence.fdseq file in the root tier and assigns it an arbitrary ID of R0. It also identifies the metadata thumbnail image, which will be the thumbnail image for the entire XPS file itself.
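Because the Ids are arbitrary, code that reads a .rels file should look relationships up by their `Type`. Here's a sketch of that with LINQ to XML, using a hypothetical `FindTargets` helper and the .rels content shown above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Returns the Target of every Relationship in a .rels document whose Type
// matches relType. Relationships are found by Type, not by their arbitrary Id.
static List<string> FindTargets(string relsXml, string relType)
{
    XNamespace r = "http://schemas.openxmlformats.org/package/2006/relationships";
    return XDocument.Parse(relsXml).Root
        .Elements(r + "Relationship")
        .Where(rel => (string)rel.Attribute("Type") == relType)
        .Select(rel => (string)rel.Attribute("Target"))
        .ToList();
}

const string ThumbnailType =
    "http://schemas.openxmlformats.org/package/2006/relationships/metadata/thumbnail";

string rootRels =
    "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">" +
    "<Relationship Target=\"/FixedDocumentSequence.fdseq\" Id=\"R0\"" +
    " Type=\"http://schemas.microsoft.com/xps/2005/06/fixedrepresentation\"/>" +
    "<Relationship Target=\"/Documents/1/Metadata/Page1_Thumbnail.JPG\" Id=\"R1\"" +
    " Type=\"" + ThumbnailType + "\"/>" +
    "</Relationships>";

foreach (var target in FindTargets(rootRels, ThumbnailType))
    Console.WriteLine(target);   // /Documents/1/Metadata/Page1_Thumbnail.JPG
```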
Also, in the _rels folder is FixedDocumentSequence.fdseq.rels - it should be fairly obvious what this is the relationships file for:
<?xml version="1.0" encoding="utf-8"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Target="/Metadata/Job_PT.xml" Id="R0"
Type="http://schemas.microsoft.com/xps/2005/06/printticket"/>
</Relationships>
Here, the only relationship described is to the metadata PrintTicket file. PrintTickets will also be described in another article. This file will often be the only file in the root tier Metadata folder.
Document Tier
Also in the root there will be the Documents folder. This folder will contain the actual document within the XPS file. When using the .NET v3 XPS Printer Driver, this document (in its own subfolder) is always named "1", although the document can actually have any name. Normally, resources such as fonts and images used within the document will be contained at this tier under Resources.
Under the "1" folder will be FixedDocument.fdoc referred to in FixedDocumentSequence.fdseq above. This file lists out the pages in the order they are to be displayed or printed.
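A FixedDocument lists its pages as `PageContent` elements, each with a `Source` attribute. The sketch below (LINQ to XML, with a hypothetical `GetPageParts` helper; the page names in the sample are illustrative, not copied from printer output) reads them in order and resolves each `Source` against the .fdoc's own location, so both absolute references, like the one in FixedDocumentSequence.fdseq, and simple relative ones are handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Lists a FixedDocument's pages in display/print order. Each PageContent
// Source is resolved against the .fdoc's own part name, so absolute
// ("/Documents/1/Pages/1.fpage") and simple relative ("Pages/2.fpage")
// references both work; "../" segments are not handled in this sketch.
static List<string> GetPageParts(string fdocXml, string fdocPartName)
{
    XNamespace xps = "http://schemas.microsoft.com/xps/2005/06";
    string folder = fdocPartName.Substring(0, fdocPartName.LastIndexOf('/') + 1);
    return XDocument.Parse(fdocXml).Root
        .Elements(xps + "PageContent")
        .Select(pc => (string)pc.Attribute("Source"))
        .Select(src => src.StartsWith("/") ? src : folder + src)
        .ToList();
}

string sampleFdoc =
    "<FixedDocument xmlns=\"http://schemas.microsoft.com/xps/2005/06\">" +
    "<PageContent Source=\"/Documents/1/Pages/1.fpage\" />" +
    "<PageContent Source=\"Pages/2.fpage\" />" +
    "</FixedDocument>";

foreach (var pagePart in GetPageParts(sampleFdoc, "/Documents/1/FixedDocument.fdoc"))
    Console.WriteLine(pagePart);
```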
Page Tier
Finally, each document subfolder will contain a Pages subfolder, and each Pages subfolder has the individual page files. There will also be another _rels folder at this level containing a .rels file corresponding to each .fpage file.
If you open up each page file, you'll see quite plainly how XPS is a restricted subset of XAML, with all the Path and Glyphs elements. Don't be surprised, though, to see the different parts of the page layout seemingly scattered about within the file. Elements are painted in the order they appear, so only when one element must appear behind another (a z-order issue) does their order matter; otherwise, the XPS Printer Driver pumps out the various elements of the page in the order that suits it.
<FixedPage Width="816" Height="1056"
xmlns="http://schemas.microsoft.com/xps/2005/06" xml:lang="und">
<Glyphs Fill="#ff000000"
FontUri="/Documents/1/Resources/Fonts/87850AD7-9FD8-4CF2-9ED3-D635DE0AC70C.odttf"
FontRenderingEmSize="22.5173" StyleSimulations="None"
OriginX="105.6" OriginY="106.88"
Indices="44;81;87;85,42;82,52;71,55;88,55;70,45;87,34;76,27;82,51;81"
UnicodeString="Introduction" />
<Path Data="F1 M 105.6,109.28 L 228.16,109.28 228.16,111.52 105.6,111.52 z"
Fill="#ff000000"/>
<Glyphs Fill="#ff000000"
FontUri="/Documents/1/Resources/Fonts/87850AD7-9FD8-4CF2-9ED3-D635DE0AC70C.odttf"
FontRenderingEmSize="22.5173" StyleSimulations="None"
OriginX="228.16" OriginY="106.88"
Indices="3" UnicodeString=" " />
<Glyphs Fill="#ff000000"
FontUri="/Documents/1/Resources/Fonts/BCA29EFB-F86B-4B42-A6B7-754D68DD5A3A.odttf"
FontRenderingEmSize="15.0115" StyleSimulations="None"
OriginX="105.6" OriginY="132.96"
Indices="59,71;51;54;3,34;11,34;59,71;48,91;47,57; ... ;82,48;3"
UnicodeString="XPS (XML Paper Specification) is a ... useful alternative to " />
...
FixedPage is the root element for all pages. There can be a lot of other elements contained within a FixedPage element, but the XPS printer driver typically leaves us with just Path (graphics) and Glyphs (text) elements.
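Since the driver writes elements in whatever order suits it, reading a page's text back in file order can give you a jumble. Here's a rough sketch, with a hypothetical `ReadText` helper, that sorts `Glyphs` runs by `OriginY` and then `OriginX` before joining their `UnicodeString`s. It ignores `RenderTransform`s, `Canvas` offsets and rotated text, and the sample page is stripped down to just the attributes the sketch reads, so treat it as a starting point only.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Rebuilds a FixedPage's text in reading order: Glyphs runs sharing exactly
// the same OriginY form one line (top to bottom), ordered by OriginX within it.
static string ReadText(string fpageXml)
{
    XNamespace xps = "http://schemas.microsoft.com/xps/2005/06";

    static double Coord(XElement glyphs, string name) =>
        double.Parse((string)glyphs.Attribute(name), CultureInfo.InvariantCulture);

    var lines = XDocument.Parse(fpageXml)
        .Descendants(xps + "Glyphs")
        .GroupBy(g => Coord(g, "OriginY"))
        .OrderBy(line => line.Key)
        .Select(line => string.Concat(line
            .OrderBy(g => Coord(g, "OriginX"))
            .Select(g => (string)g.Attribute("UnicodeString"))));
    return string.Join("\n", lines);
}

string samplePage =
    "<FixedPage xmlns=\"http://schemas.microsoft.com/xps/2005/06\">" +
    "<Glyphs OriginX=\"105.6\" OriginY=\"132.96\" UnicodeString=\"second line\" />" +
    "<Glyphs OriginX=\"228.16\" OriginY=\"106.88\" UnicodeString=\" heading\" />" +
    "<Glyphs OriginX=\"105.6\" OriginY=\"106.88\" UnicodeString=\"Intro\" />" +
    "</FixedPage>";

Console.WriteLine(ReadText(samplePage));
```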
When it comes to the actual output of Glyphs, it's the Indices that are used in preference to the UnicodeString. I've occasionally found that this has led to some interesting output. The Indices attribute is a list of all the glyphs to be used. If it is present, then there must be a corresponding character in the UnicodeString for each Indices entry. Each entry in the list comprises a glyph index, optionally followed by a comma and an advance width, and entries are separated by semicolons. There is actually a lot more that could be present in Indices, but this is about the limit of what you'll see being pumped out by the XPS printer driver. If you want justified, centered, or right-aligned text, then the Indices attribute is essential; take it out and you end up with simple left-aligned text with no special tricks. That said, there is a special trick to outputting right-aligned text without having to delve into the font files, which I'll cover in another article.
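Pulling the simple form of `Indices` apart is straightforward. This sketch, with a hypothetical `ParseIndices` helper, assumes only the `glyph[,advance]` entries described above; cluster mappings and glyph offsets, which the spec also allows, aren't handled.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Splits an Indices attribute of the simple "glyph[,advance];glyph..." form.
// AdvanceWidth is null when omitted (the font's own advance then applies).
static List<(int GlyphIndex, double? AdvanceWidth)> ParseIndices(string indices)
{
    var entries = new List<(int GlyphIndex, double? AdvanceWidth)>();
    foreach (string entry in indices.Split(';', StringSplitOptions.RemoveEmptyEntries))
    {
        string[] fields = entry.Split(',');
        int glyph = int.Parse(fields[0], CultureInfo.InvariantCulture);
        double? advance = fields.Length > 1 && fields[1].Length > 0
            ? double.Parse(fields[1], CultureInfo.InvariantCulture)
            : (double?)null;
        entries.Add((glyph, advance));
    }
    return entries;
}

// The first five entries of the "Introduction" run in the page extract above.
foreach (var (g, adv) in ParseIndices("44;81;87;85,42;82,52"))
    Console.WriteLine($"glyph {g}, advance {adv?.ToString() ?? "(font default)"}");
```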
In the FixedPage extract above, you can see some of the redundant artifacts that can be "cleaned" out. Within the Data attribute of the Path element, the spaces after the "M" and "L" are not needed, nor is the space before the terminating "z". The Glyphs element that has a UnicodeString of " " is completely unnecessary, and the trailing space at the end of the UnicodeString (and Indices) attribute in the next Glyphs element can also be eliminated. These may not seem like much, but a heavily edited Word document will tend to have a large number of such artifacts that end up in the corresponding XPS; get rid of these, and you can quite often get rid of some of the embedded font files as well, which gives a much bigger reduction in file size.
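For comparison, the same clean-ups are easy to sketch in plain C# (the tool itself does them in XSLT, listed later). `CleanPathData` and `TrimTrailingSpace` are hypothetical names, and treating glyph index 3 as the space glyph is an assumption based on the fonts in the extract above; it won't hold for every font.

```csharp
using System;

// Strips redundant whitespace from Path geometry data, mirroring the
// CleanPath template: doubled spaces, the space after " M" and " L",
// and the space before "z". A leading "M " at the very start is left alone.
static string CleanPathData(string data)
{
    while (data.Contains("  "))
        data = data.Replace("  ", " ");
    return data.Replace(" M ", " M").Replace(" L ", " L").Replace(" z", "z");
}

// Drops one trailing space from UnicodeString together with the matching
// trailing Indices entry. Assumes glyph index 3 is the space glyph; if the
// last entry isn't glyph 3, both are left alone so they stay in step.
static (string Unicode, string Indices) TrimTrailingSpace(string unicode, string indices)
{
    if (!unicode.EndsWith(' '))
        return (unicode, indices);
    if (indices.Length == 0)                         // no Indices attribute
        return (unicode.Substring(0, unicode.Length - 1), indices);

    int last = indices.LastIndexOf(';');
    string lastEntry = indices.Substring(last + 1);
    if (lastEntry != "3" && !lastEntry.StartsWith("3,", StringComparison.Ordinal))
        return (unicode, indices);
    return (unicode.Substring(0, unicode.Length - 1),
            last >= 0 ? indices.Substring(0, last) : "");
}

Console.WriteLine(CleanPathData(
    "F1 M 105.6,109.28 L 228.16,109.28 228.16,111.52 105.6,111.52 z"));
// F1 M105.6,109.28 L228.16,109.28 228.16,111.52 105.6,111.52z

var (text, glyphs) = TrimTrailingSpace("useful alternative to ", "82,48;3");
Console.WriteLine($"\"{text}\" / \"{glyphs}\"");
```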
Other redundant artefacts can be identified by comparing all of the files within the XPS file against each other, looking for duplicates and keeping a record of those that are found. Later, the files that refer to the duplicate copies can have those references altered to point to the original.
Speaking of the obfuscated font files, these are really subsets of the full font file, containing only the characters needed for your document. This can get interesting when you want to output some XPS programmatically (without using the .NET XPS methods) and find that some of your characters have mysteriously disappeared.
This is a simple console application designed to be executed from your command line. Pass it the name of the XPS file you want cleaned. It will describe the steps it's going through as it progresses, and then finally, leave you with an output file with "-clean" appended to the filename.
Please read "Other Stuff" at the bottom of this article as you will need to get the ICSharpCode zip library to make this all work and I haven't put its DLL into the Zip.
It should be trivial to convert this simple application into a service or DLL.
This code should really be thought of as an XML pipeline, and in fact much of its operation could be changed to pass the constituent documents through as streams from one step to the next rather than using the intermediate files as I have here. However, I've structured it this way so that you can comment out the code that deletes the intermediate files and then go in and have a look inside them.
Also, having cleaned out a lot of the unnecessary artifacts, the resulting parts that make up the "cleaned" version of the files tend to make more sense.
First of all, the application loads up the four XSLTs that do most of the actual work.
XslCompiledTransform cleanupXSLT = new XslCompiledTransform();
cleanupXSLT.Load("Resources\\XPSCleaner.xsl");
Console.WriteLine("Cleanup XSLT Loaded.");
XslCompiledTransform relsXSLT = new XslCompiledTransform();
relsXSLT.Load("Resources\\XPSRels.xsl");
Console.WriteLine("Resource Relationships XSLT Loaded.");
XslCompiledTransform relRefsXSLT = new XslCompiledTransform();
relRefsXSLT.Load("Resources\\XPSRelRefs.xsl");
Console.WriteLine("Resource Relationship Listing XSLT Loaded.");
XslCompiledTransform referencesXSLT = new XslCompiledTransform();
referencesXSLT.Load("Resources\\XPSReferences.xsl");
Console.WriteLine("References XSLT Loaded.");
Next, the original XPS file is opened up and each file is compared with every other file of the same type and size in an effort to identify duplicates. These duplicates will be dumped as the cleaned version of the XPS is built up, and any references to them in other files will also be altered. This code isn't that elegant, but it does the job.
foreach (ZipEntry ze1 in zf)
{
    // Skip folders and anything already identified as a duplicate
    string ze1NewName = ze1.Name.Replace("Documents/1/", "Documents/2/");
    if (!ze1.IsFile || dupFiles.ContainsKey(ze1NewName))
        continue;

    foreach (ZipEntry ze2 in zf)
    {
        string ze2NewName = ze2.Name.Replace("Documents/1/", "Documents/2/");

        // Only compare distinct entries with the same extension and size
        if (!ze2.IsFile || ze1NewName == ze2NewName ||
            Path.GetExtension(ze1NewName) != Path.GetExtension(ze2NewName) ||
            ze1.Size != ze2.Size)
            continue;

        bool isEqual = true;
        byte[] buffer1 = new byte[4096];
        byte[] buffer2 = new byte[4096];
        int sourceBytes1;
        int sourceBytes2;

        using (Stream zs1 = zf.GetInputStream(ze1))
        using (Stream zs2 = zf.GetInputStream(ze2))
        {
            do
            {
                sourceBytes1 = zs1.Read(buffer1, 0, buffer1.Length);
                sourceBytes2 = zs2.Read(buffer2, 0, buffer2.Length);

                // A different read count is treated as a mismatch (conservative)
                if (sourceBytes1 != sourceBytes2)
                {
                    isEqual = false;
                    break;
                }

                // Compare only the bytes actually read in this pass
                for (int i = 0; i < sourceBytes1; i++)
                {
                    if (buffer1[i] != buffer2[i])
                    {
                        isEqual = false;
                        break;
                    }
                }
            } while (sourceBytes1 > 0 && isEqual);
        }

        if (isEqual && !dupFiles.ContainsKey(ze2NewName))
        {
            // Record ze2 as a duplicate of ze1 (the 'original')
            dupFiles.Add(ze2NewName, ze1NewName);
        }
    }
}
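A different technique from the pairwise comparison above is to hash each entry once and group entries by extension, size and hash. This sketch uses `System.IO.Compression` and SHA-256, not SharpZipLib, and isn't what the tool does; `FindDuplicates` is a hypothetical name, and the first entry in each group is treated as the 'original'. An accidental SHA-256 collision is vanishingly unlikely, but you could still byte-compare each group if you want certainty.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Security.Cryptography;

// Maps each duplicate entry's name to the name of the first entry with the
// same extension, uncompressed size and SHA-256 hash (the 'original').
static Dictionary<string, string> FindDuplicates(ZipArchive archive)
{
    var duplicates = new Dictionary<string, string>();
    var groups = archive.Entries
        .Where(e => !e.FullName.EndsWith("/"))                 // skip folders
        .GroupBy(e => (Path.GetExtension(e.FullName), e.Length, Hash(e)));
    foreach (var group in groups)
    {
        string original = group.First().FullName;
        foreach (var copy in group.Skip(1))
            duplicates[copy.FullName] = original;
    }
    return duplicates;
}

// SHA-256 of an entry's uncompressed content, as a hex string.
static string Hash(ZipArchiveEntry entry)
{
    using var stream = entry.Open();
    using var sha = SHA256.Create();
    return BitConverter.ToString(sha.ComputeHash(stream));
}

// Writes a text entry into an archive being created (demo only).
static void AddEntry(ZipArchive zip, string name, string content)
{
    using var writer = new StreamWriter(zip.CreateEntry(name).Open());
    writer.Write(content);
}

var buffer = new MemoryStream();
using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
{
    AddEntry(zip, "Documents/1/Resources/Fonts/a.odttf", "same bytes");
    AddEntry(zip, "Documents/1/Resources/Fonts/b.odttf", "same bytes");
    AddEntry(zip, "Documents/1/Resources/Fonts/c.odttf", "different!");
}
buffer.Position = 0;
using var reader = new ZipArchive(buffer, ZipArchiveMode.Read);
foreach (var (copyName, originalName) in FindDuplicates(reader))
    Console.WriteLine($"{copyName} duplicates {originalName}");
```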
Then the actual cleaning phase begins. Each file in the XPS that isn't some kind of resource or metadata file is processed in turn and put into the output XPS file once it's been worked on. One common change that's applied is to 'move' all the files and references from document '1' to document '2'. Doing this sort of thing makes it a lot easier to merge an XPS file produced by the XPS printer driver with another one later on.
The page files (.fpage) are passed through the cleanup XSLT to remove the redundant references and do some of the other tweaks; their corresponding .rels files are then regenerated from the 'cleaned' page files. Each regenerated .rels file is in turn processed to build up a list of the resources and metadata actually used.
string entryFileName = CopyAndCleanFile(baseFileName, processingFileName,
cleanupXSLT, zf, ze, s);
string relsFileName = baseFileName + "Rels." +
Path.GetFileName(processingFileName);
string processingRelsFileName =
processingFileName.Replace("Documents/1/Pages/",
"Documents/2/Pages/_rels/") + ".rels";
relsXSLT.Transform(entryFileName, relsFileName);
Console.WriteLine("{0} has been generated.", processingRelsFileName);
File.Delete(entryFileName);
ReplaceReferencesToDuplicates(relsFileName, dupFiles);
AddZipEntry(processingRelsFileName, s, relsFileName);
relRefsList = IdentifyRels(relRefsList, relRefsXSLT, relsFileName);
Console.WriteLine("{0} Resources have been listed.",
processingRelsFileName);
File.Delete(relsFileName);
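The code above builds the page's .rels name with a hard-coded `Replace`. The general OPC convention is that the relationships for a part `folder/name.ext` live in `folder/_rels/name.ext.rels`, which is also why the root tier's _rels folder holds .rels and FixedDocumentSequence.fdseq.rels. A small helper, `GetRelsPartName` (a hypothetical name), captures that for any part:

```csharp
using System;

// OPC convention: the relationships for "folder/name.ext" are stored in
// "folder/_rels/name.ext.rels". An empty name (the package root itself)
// gives the root "_rels/.rels".
static string GetRelsPartName(string partName)
{
    int slash = partName.LastIndexOf('/');
    string folder = partName.Substring(0, slash + 1);   // "" when there's no folder
    string file = partName.Substring(slash + 1);
    return folder + "_rels/" + file + ".rels";
}

Console.WriteLine(GetRelsPartName("Documents/2/Pages/1.fpage"));
// Documents/2/Pages/_rels/1.fpage.rels
Console.WriteLine(GetRelsPartName("FixedDocumentSequence.fdseq"));
// _rels/FixedDocumentSequence.fdseq.rels
```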
Below is the XSLT that does most of this cleanup work on the page file itself. The existing XPS methods in .NET 3 are focused on the simple generation of XPS output; actually manipulating it requires switching to something like XSLT.
Just a note: these XSLTs are specifically set up to accommodate a namespace behaviour of the Microsoft XSLT processors that dates back at least to MSXML 3 (strictly speaking, it follows from how XSLT treats literal result elements in general). Within each template, each element being created must have the correct namespace declared (unless it's being created inside another element that already declares it); the MS XSLT processor will discard the declaration when it realises it doesn't need it. If you don't have a namespace declaration, the processor will insert an empty namespace declaration (xmlns="") in your element, which really tends to screw things up quite nicely.
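The behaviour is easy to reproduce with `XslCompiledTransform`. In this sketch (the stylesheet, element names and `urn:example` namespace are all made up for the demonstration), a named template creates a `Child` element under a parent that has a default namespace. Without its own declaration, `Child` is in no namespace, so the output has to 'undeclare' the parent's default namespace with `xmlns=""`.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Compiles an XSLT held in a string and runs it over a dummy input document.
static string Run(string xslt)
{
    var transform = new XslCompiledTransform();
    using (var xslReader = XmlReader.Create(new StringReader(xslt)))
        transform.Load(xslReader);
    var output = new StringWriter();
    using (var input = XmlReader.Create(new StringReader("<input/>")))
        transform.Transform(input, null, output);
    return output.ToString();
}

// The Child element is created in a separate named template; {0} is where
// a namespace declaration on Child can be switched on or off.
const string Stylesheet =
    "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
    "<xsl:template match=\"/\"><Root xmlns=\"urn:example\">" +
    "<xsl:call-template name=\"child\"/></Root></xsl:template>" +
    "<xsl:template name=\"child\"><Child{0}/></xsl:template>" +
    "</xsl:stylesheet>";

string withoutNs = Run(string.Format(Stylesheet, ""));
string withNs = Run(string.Format(Stylesheet, " xmlns=\"urn:example\""));
Console.WriteLine(withoutNs);   // Child ends up carrying xmlns=""
Console.WriteLine(withNs);      // Child stays in urn:example, no xmlns=""
```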
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:x="http://schemas.microsoft.com/xps/2005/06"
exclude-result-prefixes="x">
<xsl:output indent="yes" method="xml"
encoding="utf-8" omit-xml-declaration="yes"/>
<xsl:template match="/">
<!-- Process the document element (FixedPage) -->
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:template match="x:Glyphs">
<!-- Keep a Glyphs element only if its UnicodeString contains something other than whitespace -->
<xsl:if test="string-length(normalize-space(@UnicodeString)) > 0">
<Glyphs xmlns="http://schemas.microsoft.com/xps/2005/06">
<xsl:apply-templates select="@*"/>
</Glyphs>
</xsl:if>
</xsl:template>
<xsl:template match="*">
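<!-- Recreate any other element in the XPS namespace, then copy its attributes and either its child elements or its text -->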
<xsl:element name="{name(.)}"
namespace="http://schemas.microsoft.com/xps/2005/06">
<xsl:apply-templates select="@*"/>
<xsl:choose>
<xsl:when test="count(*) > 0">
<xsl:apply-templates select="*"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:element>
</xsl:template>
<xsl:template match="@Data[name(..) = 'Path']">
<!-- Path geometry: strip the redundant whitespace -->
<xsl:attribute name="Data">
<xsl:call-template name="CleanPath">
<xsl:with-param name="pathData" select="."/>
</xsl:call-template>
</xsl:attribute>
</xsl:template>
<xsl:template match="@UnicodeString">
<!-- UnicodeString: strip a trailing space -->
<xsl:attribute name="UnicodeString">
<xsl:call-template name="CleanUnicodeString">
<xsl:with-param name="unicodeString" select="."/>
</xsl:call-template>
</xsl:attribute>
</xsl:template>
<xsl:template match="@Indices">
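<!-- Indices: reverse the string, strip the leading space-glyph entries (glyph index 3), then reverse it back, which removes the trailing spaces -->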
<xsl:attribute name="Indices">
<xsl:call-template name="StringReverse">
<xsl:with-param name="string">
<xsl:call-template name="CleanIndices">
<xsl:with-param name="indices">
<xsl:call-template name="StringReverse">
<xsl:with-param name="string" select="."/>
</xsl:call-template>
</xsl:with-param>
</xsl:call-template>
</xsl:with-param>
</xsl:call-template>
</xsl:attribute>
</xsl:template>
<xsl:template match="@*">
<!-- Any other attribute: copy it, rewriting paths that point into document 1 -->
<xsl:attribute name="{name(.)}">
<xsl:choose>
<xsl:when test="starts-with(., '/Documents/1/Resources/Fonts/')">
<!-- Fonts move up to the root /Resources/Fonts folder -->
<xsl:value-of select="substring-after(., '/Documents/1')"/>
</xsl:when>
<xsl:when test="starts-with(., '/Documents/1')">
<!-- Everything else in document 1 moves to document 2 -->
<xsl:value-of select="concat('/Documents/2',
substring-after(., '/Documents/1'))"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:attribute>
</xsl:template>
<xsl:template name="CleanPath">
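<!-- Collapse double spaces and drop the spaces after M and L and before z, one occurrence per recursive call -->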
<xsl:param name="pathData" select="''"/>
<xsl:choose>
<xsl:when test="contains($pathData, '  ')">
<xsl:call-template name="CleanPath">
<xsl:with-param name="pathData"
select="concat(substring-before($pathData, '  '), ' ',
substring-after($pathData, '  '))"/>
</xsl:call-template>
</xsl:when>
<xsl:when test="contains($pathData, ' M ')">
<xsl:call-template name="CleanPath">
<xsl:with-param name="pathData"
select="concat(substring-before($pathData, ' M '), ' M',
substring-after($pathData, ' M '))"/>
</xsl:call-template>
</xsl:when>
<xsl:when test="contains($pathData, ' L ')">
<xsl:call-template name="CleanPath">
<xsl:with-param name="pathData"
select="concat(substring-before($pathData, ' L '), ' L',
substring-after($pathData, ' L '))"/>
</xsl:call-template>
</xsl:when>
<xsl:when test="contains($pathData, ' z')">
<xsl:call-template name="CleanPath">
<xsl:with-param name="pathData"
select="concat(substring-before($pathData, ' z'), 'z',
substring-after($pathData, ' z'))"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$pathData"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template name="CleanUnicodeString">
<!-- Strip a single trailing space, if there is one -->
<xsl:param name="unicodeString" select="''"/>
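<xsl:choose>
<xsl:when test="substring($unicodeString, string-length($unicodeString), 1) = ' '">
<xsl:value-of select="substring($unicodeString, 1,
string-length($unicodeString) - 1)"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$unicodeString"/>
</xsl:otherwise>
</xsl:choose>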
</xsl:template>
<xsl:template name="CleanIndices">
<!-- Works on the reversed Indices string: strip leading space-glyph entries -->
<xsl:param name="indices" select="''"/>
<xsl:choose>
<xsl:when test="starts-with($indices, '3;')">
<!-- A bare space glyph entry ("3;") -->
<xsl:call-template name="CleanIndices">
<xsl:with-param name="indices"
select="substring-after($indices, '3;')"/>
</xsl:call-template>
</xsl:when>
<xsl:when test="contains($indices, ',') and
not(contains(substring-before($indices, ','), ';')) and
starts-with(substring-after($indices, ','), '3;') ">
<!-- A space glyph entry with an advance width: once reversed, the width comes first, then ",3;" -->
<xsl:call-template name="CleanIndices">
<xsl:with-param name="indices"
select="substring-after($indices, '3;')"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$indices"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template name="StringReverse">
<!-- Reverse a string by recursively reversing and swapping its two halves -->
<xsl:param name="string"/>
<xsl:variable name="len" select="string-length($string)"/>
<xsl:choose>
<xsl:when test="$len &lt; 2">
<xsl:value-of select="$string"/>
</xsl:when>
<xsl:otherwise>
<xsl:call-template name="StringReverse">
<xsl:with-param name="string"
select="substring($string, $len div 2 + 1, $len div 2)"/>
</xsl:call-template>
<xsl:call-template name="StringReverse">
<xsl:with-param name="string"
select="substring($string, 1, $len div 2)"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
The above XSLT is primarily focused on identifying and eliminating redundant whitespace. Occasionally this means a particular font file is no longer needed at all, and it's in this situation that we can really reduce the size of the XPS file.
I could have added a call to the XSLT document() function to read in the list of duplicate files (formatted as XML) and use it during processing. However, that would require changing how the XslCompiledTransform is loaded (document() is disabled by default through XsltSettings, because it's a potential security risk), as well as substantial changes to the XSLT itself so that it could identify the references to the 'duplicates' and replace them with references to the 'originals'. I opted for a simpler solution, from a coding perspective: a line-by-line search and replace on the output from the above XSLT.
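`ReplaceReferencesToDuplicates` itself isn't listed in this article. Here's one hypothetical way a line-by-line search and replace like that could look. It assumes `dupFiles` maps part names without a leading slash (as in the duplicate-detection loop) to the original's name, and that references appear as double-quoted absolute paths; matching the quotes stops one name from being replaced inside a longer one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Rewrites quoted references to duplicate parts so they point at the original.
// Assumes dupFiles keys/values have no leading '/' and that references are
// double-quoted absolute attribute values such as Target="/Documents/2/...".
static IEnumerable<string> ReplaceDuplicateReferences(
    IEnumerable<string> lines, IReadOnlyDictionary<string, string> dupFiles)
{
    foreach (string line in lines)
    {
        string updated = line;
        foreach (var pair in dupFiles)
            updated = updated.Replace("\"/" + pair.Key + "\"", "\"/" + pair.Value + "\"");
        yield return updated;
    }
}

var dupMap = new Dictionary<string, string>
{
    ["Documents/2/Resources/Images/b.png"] = "Documents/2/Resources/Images/a.png"
};
string[] relsLines =
{
    "<Relationship Target=\"/Documents/2/Resources/Images/b.png\" Id=\"R1\"/>",
    "<Relationship Target=\"/Resources/Fonts/A.odttf\" Id=\"R2\"/>"
};
foreach (string outputLine in ReplaceDuplicateReferences(relsLines, dupMap))
    Console.WriteLine(outputLine);
// For a file on disk (path is hypothetical):
// File.WriteAllLines(path, ReplaceDuplicateReferences(File.ReadAllLines(path), dupMap).ToList());
```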
The next XSLT to be run regenerates the .rels file for us from the 'cleaned' fpage file, in effect throwing away the references to now redundant resources and/or metadata.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:x="http://schemas.microsoft.com/xps/2005/06"
exclude-result-prefixes="x">
<xsl:output indent="yes" method="xml"
encoding="utf-8" omit-xml-declaration="yes"/>
<xsl:key name="resourceKey" match="//@*[starts-with(., '/Resources/Fonts/')
or starts-with(., '/Documents/2/Resources/Images/')
or starts-with(., '/Documents/2/Metadata/')]" use="."/>
<xsl:template match="/">
<Relationships
xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
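<!-- One Relationship per unique font, image or metadata reference (Muenchian grouping on resourceKey) -->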
<xsl:apply-templates
select="//@*[contains(., '/Resources/') or contains(., '/Metadata/')]
[generate-id() = generate-id(key('resourceKey', .))]"/>
<!-- Add the page's PrintTicket relationship -->
<Relationship Type="http://schemas.microsoft.com/xps/2005/06/printticket"
Target="/Documents/2/Metadata/Page1_PT.xml" Id="R0"/>
</Relationships>
</xsl:template>
<xsl:template match="@*">
<!-- A required-resource Relationship for one unique reference -->
<Relationship Type="http://schemas.microsoft.com/xps/2005/06/required-resource"
xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<xsl:attribute name="Target">
<xsl:value-of select="."/>
</xsl:attribute>
<xsl:attribute name="Id">
<xsl:value-of select="concat('R', position())"/>
</xsl:attribute>
</Relationship>
</xsl:template>
</xsl:stylesheet>
Well, it wouldn't be a real project involving XSLT unless the Muenchian method made an appearance now, would it? This XSLT ensures that we have only one Relationship element for each unique resource.
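Outside XSLT, the same 'one Relationship per unique resource' result is a single `Distinct()` in LINQ. Here's a sketch of the equivalent .rels generation, assuming the same path prefixes the `resourceKey` uses. `BuildPageRels` is a hypothetical name, and reserving `R0` for the PrintTicket is a choice made here so that its Id can't clash with the resource Ids `R1`..`Rn`.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Builds a page's .rels: one required-resource Relationship per unique
// font/image/metadata reference on the page (Distinct() standing in for the
// Muenchian grouping), plus the page's PrintTicket relationship.
static XDocument BuildPageRels(XDocument page, string printTicketTarget)
{
    XNamespace r = "http://schemas.openxmlformats.org/package/2006/relationships";
    string[] prefixes = { "/Resources/Fonts/", "/Documents/2/Resources/Images/",
                          "/Documents/2/Metadata/" };
    var targets = page.Descendants().Attributes()
        .Select(a => a.Value)
        .Where(v => prefixes.Any(p => v.StartsWith(p, StringComparison.Ordinal)))
        .Distinct();

    return new XDocument(new XElement(r + "Relationships",
        targets.Select((target, i) => new XElement(r + "Relationship",
            new XAttribute("Type", "http://schemas.microsoft.com/xps/2005/06/required-resource"),
            new XAttribute("Target", target),
            new XAttribute("Id", "R" + (i + 1)))),
        // R0 is reserved for the PrintTicket so it can't clash with R1..Rn.
        new XElement(r + "Relationship",
            new XAttribute("Type", "http://schemas.microsoft.com/xps/2005/06/printticket"),
            new XAttribute("Target", printTicketTarget),
            new XAttribute("Id", "R0"))));
}

var pageXml = XDocument.Parse(
    "<FixedPage xmlns=\"http://schemas.microsoft.com/xps/2005/06\">" +
    "<Glyphs FontUri=\"/Resources/Fonts/A.odttf\" UnicodeString=\"Intro\" />" +
    "<Glyphs FontUri=\"/Resources/Fonts/A.odttf\" UnicodeString=\"duction\" />" +
    "<Glyphs FontUri=\"/Resources/Fonts/B.odttf\" UnicodeString=\"Body\" />" +
    "</FixedPage>");
Console.WriteLine(BuildPageRels(pageXml, "/Documents/2/Metadata/Page1_PT.xml"));
```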
Another XSLT works with all the other files that need their references to various resources adjusted because we're moving everything from "1" to "2".
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:x="http://schemas.microsoft.com/xps/2005/06"
xmlns:r="http://schemas.openxmlformats.org/package/2006/relationships"
exclude-result-prefixes="x r">
<xsl:output indent="yes" method="xml"
encoding="utf-8" omit-xml-declaration="yes"/>
<xsl:template match="/">
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:template match="r:Relationships">
<!-- Copy a Relationships element, dropping any relationship that targets a font -->
<Relationships
xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<xsl:apply-templates select="@*"/>
<xsl:apply-templates select="*[not(contains(@Target, '/Fonts/'))]"/>
</Relationships>
</xsl:template>
<xsl:template match="r:Relationship">
<!-- Copy a Relationship element and its attributes -->
<Relationship
xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<xsl:apply-templates select="@*"/>
</Relationship>
</xsl:template>
<xsl:template match="x:FixedDocument|x:FixedPage|x:FixedDocumentSequence">
<!-- XPS document-level elements: copy attributes and child elements -->
<xsl:element name="{name(.)}"
namespace="http://schemas.microsoft.com/xps/2005/06">
<xsl:apply-templates select="@*"/>
<xsl:apply-templates select="*"/>
</xsl:element>
</xsl:template>
<xsl:template match="*">
<!-- Any other XPS element -->
<xsl:element name="{name(.)}"
namespace="http://schemas.microsoft.com/xps/2005/06">
<xsl:apply-templates select="@*"/>
<xsl:choose>
<xsl:when test="count(*) > 0">
<!-- Has child elements: process them -->
<xsl:apply-templates select="*"/>
</xsl:when>
<xsl:otherwise>
<!-- No child elements: copy its text -->
<xsl:value-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:element>
</xsl:template>
<xsl:template match="@*">
<!-- Copy attributes, rewriting document 1 paths as in the cleanup XSLT -->
<xsl:attribute name="{name(.)}">
<xsl:choose>
<xsl:when test="starts-with(., '/Documents/1/Resources/Fonts/')">
<!-- Fonts move up to the root /Resources/Fonts folder -->
<xsl:value-of select="substring-after(., '/Documents/1')"/>
</xsl:when>
<xsl:when test="starts-with(., '/Documents/1')">
<!-- Everything else in document 1 moves to document 2 -->
<xsl:value-of select="concat('/Documents/2',
substring-after(., '/Documents/1'))"/>
</xsl:when>
<xsl:otherwise>
<!-- Anything else is copied unchanged -->
<xsl:value-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
The final XSLT actually produces text output. This one is designed to read all of the .rels files (those for each page, and the other one in the 'root' _rels folder, plus any others) and simply generate a listing that we can process to determine what resources and metadata files we really need.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:r="http://schemas.openxmlformats.org/package/2006/relationships"
exclude-result-prefixes="r">
<xsl:output indent="yes" method="text"
encoding="utf-8" omit-xml-declaration="yes"/>
<xsl:template match="/">
<!-- Write out each relationship's Target, followed by a space -->
<xsl:for-each select="r:Relationships/r:Relationship/@Target">
<xsl:value-of select="."/>
<xsl:value-of select="' '"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
The output from this last XSLT is the only one we don't pump out to a temporary file. Instead, it's pushed via a stream into a StringBuilder that's later processed into a list.
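`IdentifyRels` isn't listed either, but the idea of running the text-output XSLT into memory can be sketched like this. `ListRelTargets` is a hypothetical name; the XSLT is a trimmed copy of the listing one above, inlined as a string, and splitting on whitespace assumes no `Target` contains a space.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Runs a text-output XSLT over one .rels document in memory (a StringWriter
// is backed by a StringBuilder) and returns the distinct Targets it lists.
static HashSet<string> ListRelTargets(XslCompiledTransform relRefs, string relsXml)
{
    var output = new StringWriter();
    using (var input = XmlReader.Create(new StringReader(relsXml)))
        relRefs.Transform(input, null, output);
    return new HashSet<string>(
        output.ToString().Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
}

// Trimmed copy of the relationship-listing XSLT, inlined as a string.
const string RelRefsXsl =
    "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"" +
    " xmlns:r=\"http://schemas.openxmlformats.org/package/2006/relationships\"" +
    " exclude-result-prefixes=\"r\">" +
    "<xsl:output method=\"text\" encoding=\"utf-8\"/>" +
    "<xsl:template match=\"/\">" +
    "<xsl:for-each select=\"r:Relationships/r:Relationship/@Target\">" +
    "<xsl:value-of select=\".\"/><xsl:value-of select=\"' '\"/>" +
    "</xsl:for-each></xsl:template></xsl:stylesheet>";

var relRefsXslt = new XslCompiledTransform();
using (var xslReader = XmlReader.Create(new StringReader(RelRefsXsl)))
    relRefsXslt.Load(xslReader);

string pageRels =
    "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">" +
    "<Relationship Target=\"/Resources/Fonts/A.odttf\" Id=\"R1\"" +
    " Type=\"http://schemas.microsoft.com/xps/2005/06/required-resource\"/>" +
    "<Relationship Target=\"/Documents/2/Metadata/Page1_PT.xml\" Id=\"R0\"" +
    " Type=\"http://schemas.microsoft.com/xps/2005/06/printticket\"/>" +
    "</Relationships>";

foreach (string target in ListRelTargets(relRefsXslt, pageRels))
    Console.WriteLine(target);
```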
That brings us to stage 3 of processing the original XPS file. In this third pass, the files we worked with in the second stage are skipped (their processed output is already in the new XPS file); instead, the program picks up all the resource and metadata files and, using the above list, puts them into the right places in the new XPS file. Any 'duplicate' files are tossed (ignored), and finally, any other outstanding files are also grabbed at this time.
if (dupFiles.ContainsKey(processingFileName))
continue;
if ((processingFileName.StartsWith("Documents/1/Pages")
&& processingFileName.EndsWith(".fpage")) ||
(processingFileName.EndsWith(".fpage.rels")))
{
// Page files and their .rels were already written out in stage 2
}
else if (processingFileName.StartsWith("Documents/1/Resources/Fonts"))
{
#region Resource files that require 'moving' to the 'root' Resources folder
string newFileName = processingFileName.Replace("Documents/1/", "");
if (relsFileNames.Contains(newFileName))
{
Console.WriteLine("XPS file entry '{0}' moving to {1}",
processingFileName,
processingFileName.Replace("Documents/1/", ""));
CopyZipEntry(ze.Name.Replace("Documents/1/", ""), s, zf, ze);
}
#endregion
}
else if (processingFileName.StartsWith("Documents/1/") ||
processingFileName.Contains("_rels/") ||
processingFileName.EndsWith(".fdseq"))
{
bool bTransformRequired = (processingFileName.EndsWith(".rels") ||
processingFileName.EndsWith(".fdoc") ||
processingFileName.EndsWith(".fdseq"));
if (!bTransformRequired)
{
#region Files that only require 'moving' to document '2'
string newFileName =
processingFileName.Replace("Documents/1/", "Documents/2/");
if (relsFileNames.Contains(newFileName))
{
Console.WriteLine("XPS file entry '{0}' moving to {1}",
processingFileName, newFileName);
CopyZipEntry(ze.Name.Replace("Documents/1/",
"Documents/2/"), s, zf, ze);
}
#endregion
}
}
else
{
#region Files we just put in the same place in the new zip
Console.WriteLine("XPS file entry '{0}' transferred as is", processingFileName);
CopyZipEntry(ze.Name, s, zf, ze);
#endregion
}
With all the files moved into their new places (and redundant/duplicate ones silently dropped), the new XPS is closed and the program has finished 'cleaning up' the original XPS.
This application uses the ICSharpCode SharpZipLib library to do its zip file packing and unpacking (http://www.icsharpcode.net/OpenSource/SharpZipLib/Default.aspx). It's not included in the project, so you'll need to download it separately.
I also used Stylus Studio for the XSLT coding (http://www.stylusstudio.com), although in this particular application the XSLT is pretty trivial.
I also strongly recommend reading the official XPS spec from Microsoft (http://www.microsoft.com/whdc/xps/downloads.mspx) and obtaining the sample XPS documents (http://www.microsoft.com/whdc/XPS/XpsSamples.mspx) from which I gained more than a few insights. Also consult the official team blog (http://blogs.msdn.com/xps/default.aspx) and Feng Yuan's blog (http://blogs.msdn.com/fyuan/default.aspx).
I'd also recommend:
I'll try not to duplicate too much of the work of all these people.
Along with this, I also recommend getting a copy of the IsXPS.exe test tool, which you'll find in the Windows Driver Kit (WDK).
History
- 2008-04-19: First version completed.
- 2008-04-21: Added some more recommended reading.
- 2008-04-28: Added an additional processing stage to identify duplicate files and eliminate them.