For the last 2 months I've been working on Microsoft Word automation with Microsoft.Office.Interop.Word and the Open XML 2.0 SDK. In this post, I'll focus on Content Controls and the Open XML 2.0 SDK, based on the experience I gained in that time.
I'll discuss the points mentioned below:
- Add a Custom XML part to a WordprocessingDocument
- Get a Custom XML part from a WordprocessingDocument
- Each content control contains a unique ID that is assigned by Word upon creation of the content control (the issues this may cause and how they can be handled)
- Convert an in-memory Document to Bytes without saving it to a File
Add Custom XML Part to WordprocessingDocument
- Get the MainDocumentPart:
MainDocumentPart mainPart = doc.MainDocumentPart;
- Define a root element for the Custom XML part:
string customXmlPartNamespace = "http://schemas.microsoft.com/Test.Sample";
string rootNodeName = "TestCoverageRoot";
XName rootName = XName.Get(rootNodeName, customXmlPartNamespace);
XElement rootElement = new XElement(rootName);
- The method displayed in the code snippet below does the rest:
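// Requires: using System.IO; using System.Xml.Linq; using DocumentFormat.OpenXml.Packaging;
// Adds a new Custom XML part to the document and writes rootElement into it.
public static CustomXmlPart AddCustomXmlPart(MainDocumentPart mainPart, XElement rootElement)
{
    CustomXmlPart customXmlPart = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
    // Disposing the writer flushes and closes the part stream, so no explicit Close() is needed.
    using (StreamWriter sw = new StreamWriter(customXmlPart.GetStream()))
    {
        sw.Write(rootElement.ToString());
    }
    return customXmlPart;
}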
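The root element above is empty, so the part initially holds only a single node. As a minimal sketch of how the payload could be filled in before it is written, the helper below builds a root element with child elements in the same namespace and writes it to any writable stream. The class and method names (CustomXmlPayloadSketch, BuildRoot, WriteTo) and the child element names are hypothetical and not part of the Open XML SDK; only System.Xml.Linq and System.IO are used, and the stream of a Custom XML part would be passed as the target.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class CustomXmlPayloadSketch
{
    // Builds a root element in the given namespace with one empty child per item name.
    // The item names are illustrative; real data would come from your business logic.
    public static XElement BuildRoot(string namespaceUri, string rootName, params string[] itemNames)
    {
        XNamespace ns = namespaceUri;
        XElement root = new XElement(ns + rootName);
        foreach (string itemName in itemNames)
        {
            root.Add(new XElement(ns + itemName));
        }
        return root;
    }

    // Writes the element as UTF-8 text (no byte order mark) to a writable stream.
    // leaveOpen is true, so the caller keeps ownership of the stream.
    public static void WriteTo(Stream target, XElement root)
    {
        using (StreamWriter writer = new StreamWriter(target, new UTF8Encoding(false), 1024, true))
        {
            writer.Write(root.ToString());
        }
    }
}
```

Because the children are created in the same namespace as the root, a lookup that checks only the root namespace keeps working after the payload grows.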
Get Custom XML Part from a WordprocessingDocument
The code snippet below assumes that the root namespace is unique for each Custom XML part. Under that assumption, checking the namespace of the root node is enough to find the part:
// Requires: using System.IO; using System.Xml; using DocumentFormat.OpenXml.Packaging;
string namespaceUri = "http://schemas.microsoft.com/Test.Sample";

// Returns the first Custom XML part whose root element is in namespaceUri, or null if none matches.
public static CustomXmlPart GetCustomXmlPart(MainDocumentPart mainPart, string namespaceUri)
{
    foreach (CustomXmlPart part in mainPart.CustomXmlParts)
    {
        using (XmlTextReader reader =
            new XmlTextReader(part.GetStream(FileMode.Open, FileAccess.Read)))
        {
            // Move to the root element, skipping the XML declaration and comments.
            reader.MoveToContent();
            if (reader.NamespaceURI.Equals(namespaceUri))
            {
                return part;
            }
        }
    }
    return null;
}
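The namespace check itself does not depend on the Open XML SDK, so it can be tried out on any stream. The sketch below isolates that step, assuming XmlReader.Create instead of XmlTextReader and treating an empty or malformed part as "no match"; the class and method names (RootNamespaceSketch, RootHasNamespace) are hypothetical.

```csharp
using System.IO;
using System.Xml;

public static class RootNamespaceSketch
{
    // Returns true when the root element of the XML in the stream is in namespaceUri.
    // Empty or malformed XML is reported as false rather than as an exception.
    public static bool RootHasNamespace(Stream xmlStream, string namespaceUri)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(xmlStream))
            {
                // MoveToContent skips the XML declaration, comments and whitespace.
                if (reader.MoveToContent() != XmlNodeType.Element)
                {
                    return false;
                }
                return reader.NamespaceURI == namespaceUri;
            }
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

Only the start of each part is read, so the check stays cheap even when a document carries large Custom XML parts.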
Each Content Control Contains a Unique ID that is Assigned by Word upon Creation of the Content Control
Word assigns every content control a unique ID, so you can associate that Content Control with a Custom XML part and achieve cool functionality that would otherwise be impossible through Custom XML. This works in the vast majority of cases, but occasionally it causes problems. I'll discuss one of the issues I faced and then an approach that worked.
I was implementing a lot of Word automation tasks, e.g. copying and pasting content controls and merging documents that contain content controls, when one of the merge test cases suddenly failed. When I drilled down, I found that the two documents contained different Content Controls with the same IDs. During the merge (the library used Microsoft.Office.Interop.Word 12.0), we inserted the file as shown in the code snippet below:
string fileName = "testFileToInsert.docx";
range.InsertFile(fileName, ref m_Missing, ref m_Missing, ref m_Missing, ref m_Missing);
In this scenario, Word automatically assigns a new ID to any Control in the inserted file whose ID already exists in the target document, so that Control IDs stay unique across the document. As we had Custom XML parts associated with Content Controls in both documents, I could no longer map the data to the Custom XML part. For example, if the duplicate Control IDs are 10 and 20 and Word now assigns 23356 and 45556, there is no way to tell whether 10 corresponds to 23356 or to 45556. Without a mapping from the previous ID to the new ID, I could not extract the information I had in the Custom XML part.
As I could not find any other solution, I decided to use the Tag property of the Content Control. Instead of relying on the Control ID, I assigned a unique GUID to every content control and saved it in the Tag property. The only drawback is that the Tags become visible when Design Mode is activated, either by calling ActiveDocument.ToggleFormsDesign or through “Design Mode” on the Developer tab in Microsoft Word.
As I didn't have any functional limitation in this case (Developer mode was disabled), I proceeded with this solution.
In brief, the solution was:
- Get the Range from the Document where you want to insert the .docx file
- Read the Custom XML part associated with the file to be inserted
- Call the Range.InsertFile method
- From the Custom XML part that you read in the second step, add data, as per your business logic, to the Custom XML part associated with the Document (Range.Document) into which the file was inserted
- As the Tags are unique (GUIDs), any automatic renumbering that Word performs for duplicate IDs does not affect our functionality.
This issue may also appear during Copy/Paste operations, and the approach listed above may work there as well.
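To illustrate why the Tag survives where the ID does not, here is a minimal sketch that reads content controls from raw WordprocessingML with LINQ to XML and maps each Tag value to the control's current ID. It is not the Interop code used in the project; the class and method names (ContentControlTagSketch, MapTagsToIds) are hypothetical, and the input is assumed to be the XML of the document body.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class ContentControlTagSketch
{
    // WordprocessingML main namespace (the "w" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Maps each content control's Tag value to its current ID.
    // Controls without a Tag are skipped; a repeated Tag keeps the last ID seen.
    public static Dictionary<string, string> MapTagsToIds(XElement documentXml)
    {
        var tagToId = new Dictionary<string, string>();
        foreach (XElement sdt in documentXml.Descendants(W + "sdt"))
        {
            XElement properties = sdt.Element(W + "sdtPr");
            if (properties == null)
            {
                continue;
            }
            // Both values live in the w:val attribute of w:tag and w:id.
            string tag = (string)properties.Element(W + "tag")?.Attribute(W + "val");
            string id = (string)properties.Element(W + "id")?.Attribute(W + "val");
            if (!string.IsNullOrEmpty(tag))
            {
                tagToId[tag] = id;
            }
        }
        return tagToId;
    }
}
```

Running the lookup after the insert gives the new ID for every GUID, so data stored against a Tag in the Custom XML part can still be matched to its control.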
Convert in-memory Document to Bytes without Saving to a File
Here I'll describe one approach that worked in my case, where I had to convert an in-memory document to Bytes without saving it to a File. The Document was loaded in another module (process) using Microsoft.Office.Interop.Word 12.0, and from there we had to pass a byte stream without saving the document to a file.
The code snippet below is implemented with Open XML 2.0. I pass the Outer XML of the MainDocumentPart as a string (the XML must contain the pkg:part elements that the XPath query selects), and the method returns the byte array.
// Requires: using System; using System.IO; using System.IO.Packaging; using System.Text;
//           using System.Xml; using System.Xml.Linq; using System.Xml.XPath;
public static byte[] GetDocumentStream(string mainDocumentPartOuterXml)
{
    if (string.IsNullOrEmpty(mainDocumentPartOuterXml))
    {
        return null;
    }
    string packageNodeName = "pkg";
    string packageUri = "http://schemas.microsoft.com/office/2006/xmlPackage";
    XmlNamespaceManager namespaceManager = new XmlNamespaceManager(new NameTable());
    namespaceManager.AddNamespace(packageNodeName, packageUri);
    XPathDocument xpathDocument =
        new XPathDocument(new StringReader(mainDocumentPartOuterXml));
    XPathNavigator navigator = xpathDocument.CreateNavigator();
    XPathNodeIterator iterator = navigator.Select("//pkg:part", namespaceManager);
    using (MemoryStream ms = new MemoryStream())
    {
        using (Package pkg = Package.Open(ms, FileMode.Create))
        {
            while (iterator.MoveNext())
            {
                // Part name and content type are attributes in the package namespace.
                Uri partUri = new Uri(
                    iterator.Current.GetAttribute("name", packageUri), UriKind.Relative);
                if (pkg.PartExists(partUri))
                {
                    pkg.DeletePart(partUri);
                }
                PackagePart part = pkg.CreatePart(
                    partUri, iterator.Current.GetAttribute("contentType", packageUri));
                // The child of pkg:part is either pkg:xmlData or pkg:binaryData.
                XElement elem = XElement.Parse(iterator.Current.InnerXml);
                string elementToWrite = elem.FirstNode.ToString();
                byte[] buffer;
                if (elem.Name.LocalName.Equals("binaryData", StringComparison.OrdinalIgnoreCase))
                {
                    // Binary parts (e.g. images) are stored as Base64 text.
                    buffer = Convert.FromBase64String(elementToWrite);
                }
                else
                {
                    buffer = Encoding.UTF8.GetBytes(elementToWrite);
                }
                using (Stream partStream = part.GetStream())
                {
                    partStream.Write(buffer, 0, buffer.Length);
                }
            }
        }
        // Disposing the package above flushes it into ms; ToArray copies the whole buffer.
        return ms.ToArray();
    }
}
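When the method fails on a particular input, it helps to see which parts the XML string actually contains before a package is built from it. The sketch below lists each pkg:part with its content type and payload kind (xmlData or binaryData) using LINQ to XML only; the class and method names (FlatPackageSketch, DescribeParts) are hypothetical, and the input is assumed to use the same package namespace as the method above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FlatPackageSketch
{
    // Same namespace that GetDocumentStream registers under the "pkg" prefix.
    private static readonly XNamespace Pkg =
        "http://schemas.microsoft.com/office/2006/xmlPackage";

    // Returns one line per pkg:part in the form "name [contentType] payloadKind".
    public static List<string> DescribeParts(string packageXml)
    {
        XDocument document = XDocument.Parse(packageXml);
        var lines = new List<string>();
        foreach (XElement part in document.Descendants(Pkg + "part"))
        {
            string name = (string)part.Attribute(Pkg + "name");
            string contentType = (string)part.Attribute(Pkg + "contentType");
            // The first child element tells whether the payload is XML or Base64 data.
            XElement payload = part.Elements().FirstOrDefault();
            string kind = payload == null ? "empty" : payload.Name.LocalName;
            lines.Add(name + " [" + contentType + "] " + kind);
        }
        return lines;
    }
}
```

An empty result is a quick sign that the string does not contain any pkg:part elements, in which case the method above would return a package with no parts.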
Summary
The solutions I have listed worked in my case; they may or may not work for other functional requirements. There may also be better ways to implement the same things, which I did not find due to lack of time and limited experience in Microsoft Word automation, as I worked for only 2 months with Open XML 2.0 and Microsoft.Office.Interop.Word while migrating an application from Custom XML to Content Controls. I'm providing the references that helped me a lot.
References