Introduction
This simple program shows how to generate a help system from an existing Word document. The program generates HTML files and an XML file that can be added to a Web project.
In this article, I give only the main ideas. For more details, check the source code and the sample Word document.
Who Is This Article For
The article is for developers who would like to start writing programs that automate Microsoft Word.
Microsoft Word 2000
This program relies on the styles and formatting in your Word document, which are later converted into XML files.
I have used the interop DLL of the Word application; reference it in your project (Microsoft Word 11.0 Object Library).
Project in More Detail
Word 2000 has built-in formats and styles (TOC, TOCEntry, Heading, ...). My program relies on these styles to automate the conversion.
So a user who wants to use my program must apply these styles in the document.
This program generates two XML files:
- Table of content
- Document
The Document XML file is then converted to HTML files, and the Table of content XML file can be used as the DataSource of a tree or any other navigation control.
Using the Word Library, Part 1
Add a reference to the Microsoft Word 11.0 Object Library to your project.
Look at WordApp.cs.
Add the namespace alias:
using Word = Microsoft.Office.Interop.Word;
I have used the following field to gain access to the Word document's properties, text, etc.:
Word.ApplicationClass wordApplication;
The path of the document to open is kept in:
String WordFilePath;
To Open Word Document
private Word.Document doc;
private Word.Paragraphs DocParagraphs;
public String WordFilePath;
private Word.InlineShapes Inshapes;
The following code opens the Word document and stores it in the doc object.
wordApplication = new Word.ApplicationClass();
// Placeholder for every optional argument that is left at its default.
object o_nullobject = System.Reflection.Missing.Value;
object o_filePath = WordFilePath;
object o_visible = false;   // Visible argument: do not show the document window
object o_readOnly = true;   // ReadOnly argument: open the document read-only
wordApplication.Visible = false;
doc = wordApplication.Documents.Open(ref o_filePath,
    ref o_nullobject, ref o_readOnly, ref o_nullobject, ref o_nullobject, ref o_nullobject,
    ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject,
    ref o_visible, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject);
Get Inline Shapes
public Word.InlineShapes getInlineDocumentShape()
{
    // Convert floating shapes to inline shapes so they all appear in doc.InlineShapes.
    foreach (Word.Shape W in doc.Shapes)
    {
        W.ConvertToInlineShape();
    }
    Inshapes = doc.InlineShapes;
    return Inshapes;
}
Get Word Paragraphs
public Word.Paragraphs getDocumentParagraphs()
{
    DocParagraphs = doc.Paragraphs;
    return DocParagraphs;
}
Now Converting to XML
See DocumentParser.cs for more details.
The ParsToXml() method writes two files:
- TableOfContent.xml: the table of content
- Document.xml: the Word document's paragraphs and images
public void ParsToXml() {...}
It uses one XmlTextWriter per file:
XmlTextWriter tocWriter;
XmlTextWriter parWriter;
It reads the paragraphs and inline shapes from the Word document:
Word.Paragraphs pars = getDocumentParagraphs();
Word.InlineShapes inShapes = getInlineDocumentShape();
Next, loop over each paragraph to read its style and text.
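The decision made for each paragraph in that loop can be sketched as a small classifier over style names. This is only an illustration: the names ParagraphKind and StyleClassifier are hypothetical, and it assumes the image style is named exactly "ImageStyle" and that heading styles start with "Heading"; the real logic is in DocumentParser.cs.

```csharp
using System;

// Hypothetical names, not taken from DocumentParser.cs.
public enum ParagraphKind { TocEntry, Image, Heading, Other }

public static class StyleClassifier
{
    // Maps a Word style name to the kind of XML node the parser would write.
    public static ParagraphKind Classify(string style)
    {
        if (string.IsNullOrEmpty(style))
            return ParagraphKind.Other;
        // The trailing space matches the "TOC " prefix tested in this article.
        if (style.StartsWith("TOC ", StringComparison.Ordinal))
            return ParagraphKind.TocEntry;
        // Assumption: the image style is named exactly "ImageStyle".
        if (style == "ImageStyle")
            return ParagraphKind.Image;
        // Assumption: heading styles start with "Heading".
        if (style.StartsWith("Heading", StringComparison.Ordinal))
            return ParagraphKind.Heading;
        return ParagraphKind.Other;
    }
}
```

With this sketch, StyleClassifier.Classify("TOC 1") returns ParagraphKind.TocEntry, while a style such as "Normal" falls through to ParagraphKind.Other.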
Paragraph Styles
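Format and Style of Table of Content
Every topic in the table of content has a style whose name starts with [TOC ], so the program tests:
if (style.StartsWith("TOC "))
Example:
[TOC1] 1. Introduction
[TOC2] 1.1 Author
[TOC2] 1.2 About
[TOC3] 1.2.1 About book
{ ... take the difference between the current level and the next level ... }
The difference between the two levels tells the program whether the next topic is nested inside the current one, is a sibling of it, or closes one or more open topics.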
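One way to turn a flat list of TOC entries into nested Topic elements is sketched below. It is not the code from DocumentParser.cs: the TocEntry and TocBuilder names are hypothetical, and the sketch assumes the depth (1 for TOC1, 2 for TOC2, ...) has already been parsed from the style name. It closes open topics whenever the next entry is at the same depth or shallower.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical container for one TOC line.
public class TocEntry
{
    public int Depth;     // 1 for TOC1, 2 for TOC2, ... (assumed to be >= 1)
    public string Level;  // numbering such as "1.2.1"
    public string Name;

    public TocEntry(int depth, string level, string name)
    {
        Depth = depth;
        Level = level;
        Name = name;
    }
}

public static class TocBuilder
{
    public static string Build(IEnumerable<TocEntry> entries)
    {
        StringWriter output = new StringWriter();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("TableOfContent");
            int open = 0; // number of Topic elements currently open
            foreach (TocEntry entry in entries)
            {
                // Close topics until the innermost open one is the parent of this entry.
                while (open > 0 && open >= entry.Depth)
                {
                    writer.WriteEndElement();
                    open--;
                }
                writer.WriteStartElement("Topic");
                writer.WriteAttributeString("level", entry.Level);
                writer.WriteAttributeString("name", entry.Name);
                open++;
            }
            // Close whatever is still open, then the root element.
            while (open > 0)
            {
                writer.WriteEndElement();
                open--;
            }
            writer.WriteEndElement();
        }
        return output.ToString();
    }
}
```

Passing the four entries of the example above (depths 1, 2, 2, 3) yields one Introduction topic that contains Author and About, with About book nested under About.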
A sample XML file of Table of content is as follows:
<TableOfContent>...
<Topic level="4" name="INTERFACE REQUIREMENTS" page="6">
<Topic level="4.1" name="User Interfaces" page="6">
<Topic level="4.1.1" name="Accessibility" page="6" />
<Topic level="4.1.2" name="System messages" page="6" />
<Topic level="4.1.3" name="Paging" page="7" />
<Topic level="4.1.4" name="Data lists and Data grids" page="7" />
</Topic>
<Topic level="4.2" name="Hardware Interfaces" page="8" />
<Topic level="4.3" name="Software Interfaces" page="8">
<Topic level="4.3.1" name="Operating Platform" page="8" />
<Topic level="4.3.2" name="Storage engine" page="8" />
<Topic level="4.3.3" name="External data sources" page="8" />
</Topic>
...</TableOfContent>
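To feed a tree or another navigation control, the Topic elements can be walked recursively. The sketch below is an assumption about how a consumer might read the file, not part of the article's source; TocReader is a hypothetical name, and the attribute names level and name are taken from the sample above.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class TocReader
{
    // Returns one line per Topic, indented two spaces per nesting depth.
    public static List<string> Flatten(XElement root)
    {
        List<string> lines = new List<string>();
        Walk(root, 0, lines);
        return lines;
    }

    private static void Walk(XElement parent, int depth, List<string> lines)
    {
        foreach (XElement topic in parent.Elements("Topic"))
        {
            // Missing attributes are treated as empty strings.
            string level = (string)topic.Attribute("level") ?? "";
            string name = (string)topic.Attribute("name") ?? "";
            lines.Add(new string(' ', depth * 2) + level + " " + name);
            Walk(topic, depth + 1, lines);
        }
    }
}
```

The same recursion can create tree nodes instead of strings; XElement.Load("TableOfContent.xml") would supply the root element when reading from disk.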
If the Style is ImageStyle
Get the inline shapes from the document, select the current one, and copy it to the clipboard as a picture:
InlineShapes inShapes = getInlineDocumentShape();
inShapes[mindex].Select();
wordApplication.Selection.CopyAsPicture();
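The Image element in the generated Document XML appears to hold the picture as Base64 text. How the program gets the picture bytes back from the clipboard is not shown in this article, so the sketch below assumes the bytes are already available; ImageNodeWriter and its parameters are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ImageNodeWriter
{
    // Writes <Image src="...">BASE64</Image> for the given picture bytes.
    // Assumption: imageBytes already holds the encoded picture (for example a JPEG).
    public static string WriteImage(string src, byte[] imageBytes)
    {
        StringWriter output = new StringWriter();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("Image");
            writer.WriteAttributeString("src", src);
            writer.WriteString(Convert.ToBase64String(imageBytes));
            writer.WriteEndElement();
        }
        return output.ToString();
    }
}
```

Convert.FromBase64String reverses the encoding when the HTML converter needs to save the picture to a file.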
To Get Words of Paragraphs
If the style is Heading, get the words of the paragraph:
Word.Words words;
words = pars[index].Range.Words;
Then check whether each word belongs to a list:
for (windex = 1; windex <= words.Count; windex++)
{
    if (words[windex].FormattedText.ListFormat.ListType == Word.WdListType.wdListNoNumbering)
    {
        // Not a list item: get the format of the word with
        // FormatingFunction(words, windex) and write it to an XML Text node.
    }
    else
    {
        // List item: write it in a list node.
    }
}
FormatingFunction returns the format of one word as a comma-separated string:
public String FormatingFunction(Word.Words obj, int index)
{
    // Word collections are 1-based; an index past the end has no format.
    if (index > obj.Count)
    {
        return "";
    }
    String fr = "";
    // Bold and Italic are -1 when the whole word has the format.
    if (obj[index].Bold == -1)
    {
        fr = "Bold";
    }
    if (obj[index].Italic == -1)
    {
        fr += (fr != "" ? "," : "") + "Italic";
    }
    if (obj[index].Underline == Word.WdUnderline.wdUnderlineSingle)
    {
        fr += (fr != "" ? "," : "") + "UnderLine";
    }
    return fr;
}
An excerpt of the resulting Document XML file is as follows:
<Text Format="">is </Text>
<Text Format="Italic">Performance Management System </Text>
<Text Format="">that helps you collect different measures and make faster and smarter
decisions through a set of user friendly customizable dashboards and scorecards
targeted for each and every member of your organization.</Text>
<Text Format="Italic" />
</Paragraph>
</Topic>
<Topic Name="Product Features" Level="2.2">
<Paragraph>
<Text Format="" />
<Image src="2.21">j7B/wBND+VFFAB/Z4/56fpR9g/6aH8qKKAD7AP+eh/KroGABRRQB//Z</Image>
<Paragraph>
<Text Format="">The figure above provides a high level vision of </Text>
<Text Format="Bold">Cub </Text>
<Text Format="">solution. The vision includes the idea of hiding the complexity of
creating ETL (Extract, Transform &amp; Load) processes, a data warehouse and an
OLAP database for analysis from the end user.
</Text>
</Paragraph>
<Paragraph>
<Text Format="" />
<Paragraph>
<Text Format="">It will provide end users with a sub-set of the features offered by
the underlying systems, taking into account the ability to extend this set
in future releases. As well as linking with existing DW and OLAP database
provided as part of an implementation service.
</Text>
</Paragraph>
</Paragraph>
</Paragraph>
</Topic>
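In the excerpt above, a run of words with the same format (for example "Performance Management System ") ends up in a single Text element. One way to get that result, assuming each word has already been paired with the string returned by FormatingFunction, is to merge consecutive words whose format is equal. FormattedWord and RunGrouper are hypothetical names; the article's source may do this differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pair of a word's text and its format string ("", "Bold", "Bold,Italic", ...).
public class FormattedWord
{
    public string Text;
    public string Format;

    public FormattedWord(string text, string format)
    {
        Text = text;
        Format = format;
    }
}

public static class RunGrouper
{
    // Merges consecutive words that share the same format into one run.
    public static List<FormattedWord> Group(IEnumerable<FormattedWord> words)
    {
        List<FormattedWord> runs = new List<FormattedWord>();
        foreach (FormattedWord word in words)
        {
            FormattedWord last = runs.Count > 0 ? runs[runs.Count - 1] : null;
            if (last != null && last.Format == word.Format)
            {
                last.Text += word.Text; // same format: extend the current run
            }
            else
            {
                runs.Add(new FormattedWord(word.Text, word.Format)); // format changed: start a new run
            }
        }
        return runs;
    }
}
```

Each resulting run then maps to one Text element whose Format attribute is the run's format string.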
Converting XML to HTML
I built an HTML converter that turns the XML nodes into HTML; see HtmlConvertor.cs.
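As a rough idea of what such a conversion involves, the sketch below maps one Paragraph element to an HTML p element and wraps each Text element in b, i, or u tags according to its Format attribute. It is a minimal illustration under assumptions, not the code in HtmlConvertor.cs: HtmlSketch is a hypothetical name, and the tag choices are mine.

```csharp
using System;
using System.Net;
using System.Text;
using System.Xml.Linq;

public static class HtmlSketch
{
    // Converts one <Text Format="..."> element to an HTML fragment.
    public static string TextToHtml(XElement text)
    {
        string html = WebUtility.HtmlEncode(text.Value);
        string format = (string)text.Attribute("Format") ?? "";
        // Format is a comma-separated list such as "Bold,Italic".
        foreach (string f in format.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (f == "Bold") html = "<b>" + html + "</b>";
            else if (f == "Italic") html = "<i>" + html + "</i>";
            else if (f == "UnderLine") html = "<u>" + html + "</u>";
        }
        return html;
    }

    // Converts one <Paragraph> element; only its direct Text children are handled here.
    public static string ParagraphToHtml(XElement paragraph)
    {
        StringBuilder sb = new StringBuilder("<p>");
        foreach (XElement text in paragraph.Elements("Text"))
        {
            sb.Append(TextToHtml(text));
        }
        return sb.Append("</p>").ToString();
    }
}
```

Nested Paragraph elements and Image elements, which also appear in the Document XML, are left out of this sketch and would need their own cases.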
Future Plans
I will add more explanation to this article.
Look out for future articles:
- Dynamic Online Flexible GridView
- SpyWare
History
- 26th September, 2007: Initial post