
Help System Automation

26 Sep 2007 · CPOL · 2 min read
Word Document Automation

Introduction

This simple program shows how to generate a help system from an existing Word document. It generates HTML files and XML files that can be added to a Web project.

In this article, I give only the main ideas. For more details, check the source code and the sample Word document.

Who This Article Is For

The article is for developers who want to start writing Microsoft Word automation programs.

Microsoft Word 2000

The program relies on the styles and formatting in your Word document, which it later converts into XML files.

The program uses the Word object library; add a reference to it in your project:

plain
Microsoft Word 11.0 Object Library

Project in More Detail

Word 2000 provides styles such as TOC, TOCEntry and Heading, and the program relies on these styles to recognize the parts of a document.

A document must therefore be formatted with these styles before the program can convert it.

The program generates two XML files:

  1. Table of contents
  2. Document

The Document XML file is then converted to HTML files, and the Table of Contents XML file can be used as the DataSource of a tree view or any other navigation control.

Using the Word Library, Part 1

Add a reference to the Microsoft Word 11.0 Object Library. The code in this part is in WordApp.cs.

Add a namespace alias for the interop assembly:

C#
using Word = Microsoft.Office.Interop.Word;

I have used...

C#
Word.ApplicationClass wordApplication; 

... to gain access to Word document properties and text, etc.

C#
String WordFilePath; // path of the Word document to convert

Opening the Word Document

The class uses these fields:

C#
private Word.Document doc;
private Word.Paragraphs DocParagraphs;
public String WordFilePath;
private Word.InlineShapes Inshapes;

The following code starts Word in the background, opens the document read-only and stores it in the doc object.

C#
wordApplication = new Word.ApplicationClass();
object missing = System.Reflection.Missing.Value; // "use the default" for optional COM arguments
object filePath = WordFilePath;
object readOnly = true;  // 3rd argument: ReadOnly
object visible = false;  // 12th argument: Visible
wordApplication.Visible = false; // make Microsoft Word work in the background
doc = wordApplication.Documents.Open(ref filePath,
    ref missing, ref readOnly, ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing, ref missing, ref missing,
    ref visible, ref missing, ref missing, ref missing, ref missing);

Get Inline Shapes

C#
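public Word.InlineShapes getInlineDocumentShape()
{
    // Convert floating shapes to inline shapes so every picture is reachable
    // through doc.InlineShapes. Take a snapshot first: each conversion moves the
    // shape out of doc.Shapes, which would disturb a live enumeration.
    System.Collections.Generic.List<Word.Shape> floating = new System.Collections.Generic.List<Word.Shape>();
    foreach (Word.Shape shape in doc.Shapes)
    {
        floating.Add(shape);
    }
    foreach (Word.Shape shape in floating)
    {
        shape.ConvertToInlineShape();
    }

    Inshapes = doc.InlineShapes;
    return Inshapes;
}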

Get Word Paragraphs

C#
public Word.Paragraphs getDocumentParagraphs()
{
    DocParagraphs = doc.Paragraphs;
    return DocParagraphs;
}

Converting to XML

See DocumentParser.cs for more details.

C#
using System.Xml;

// DocumentParser writes two files:
//   TableOfContent.xml : the table of contents
//   Document.xml       : the document's paragraphs and images
XmlTextWriter tocWriter; // table of contents writer
XmlTextWriter parWriter; // paragraph writer

public void ParsToXml() { ... }

C#
Word.Paragraphs pars = getDocumentParagraphs();        // the document's paragraphs
Word.InlineShapes inShapes = getInlineDocumentShape(); // the document's images

Next, loop over the paragraphs and read each paragraph's style and text.

Paragraph Styles

  • Heading N: every topic in the document starts with a paragraph in the Heading N style.
    • N = 1: main topic
    • N > 1: subtopic
  • TOC N: every entry in the table of contents has the TOC N style.
    • N = 1: main topic
    • N > 1: subtopic
  • ImageStyle: every image in the document has this style.

    C#
    for (index = 1; index <= pars.Count; index++) // Word collections are 1-based
    {
        style = ((Word.Style)pars[index].get_Style()).NameLocal;
        // ...handle the paragraph according to its style...
    }
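The style checks above can be collected into one helper. The sketch below is hypothetical (StyleClassifier and ParagraphKind are not names from the article's source). It assumes English built-in style names such as "Heading 2" and "TOC 2", which is what `style.StartsWith("TOC ")` tests, plus the custom ImageStyle; on a localized Word, NameLocal returns translated names, so the prefixes would need adjusting.

```csharp
using System;

// Hypothetical classification of a paragraph by its Word style name.
public enum ParagraphKind { Body, Heading, Toc, Image }

public static class StyleClassifier
{
    // Returns the kind of paragraph and, for Heading N and TOC N styles, the
    // level N (1 = main topic, > 1 = subtopic). Level is 0 for other styles.
    // Assumes English style names; NameLocal is localized on non-English Word.
    public static ParagraphKind Classify(string styleName, out int level)
    {
        level = 0;
        if (styleName == "ImageStyle")
            return ParagraphKind.Image;
        if (TryGetLevel(styleName, "Heading ", out level))
            return ParagraphKind.Heading;
        if (TryGetLevel(styleName, "TOC ", out level))
            return ParagraphKind.Toc;
        return ParagraphKind.Body;
    }

    // Parses the positive number that follows a prefix such as "Heading " or "TOC ".
    private static bool TryGetLevel(string styleName, string prefix, out int level)
    {
        level = 0;
        return styleName.StartsWith(prefix, StringComparison.Ordinal)
            && int.TryParse(styleName.Substring(prefix.Length), out level)
            && level >= 1;
    }
}
```

Inside the loop, `StyleClassifier.Classify(style, out level)` would then decide which branch handles the paragraph.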

Format and Style of the Table of Contents

C#
if (style.StartsWith("TOC ")) // a table of contents entry

Every entry in the table of contents has a TOC N style, where N is the entry's level.

Example:

  • [TOC 1] 1.Introduction
  • [TOC 2] 1.1 Author
  • [TOC 2] 1.2 About
  • [TOC 3] 1.2.1 About book

The nesting of the generated Topic elements comes from the difference between the level of the current entry and the level of the next entry.

A sample of the Table of Contents XML file is as follows:

XML
<TableOfContent>...
<Topic level="4" name="INTERFACE REQUIREMENTS" page="6">
<Topic level="4.1" name="User Interfaces" page="6">
<Topic level="4.1.1" name="Accessibility" page="6" />
<Topic level="4.1.2" name="System messages" page="6" />
<Topic level="4.1.3" name="Paging" page="7" />
<Topic level="4.1.4" name="Data lists and Data grids" page="7" />
</Topic>
<Topic level="4.2" name="Hardware Interfaces" page="8" />
<Topic level="4.3" name="Software Interfaces" page="8">
<Topic level="4.3.1" name="Operating Platform" page="8" />
<Topic level="4.3.2" name="Storage engine" page="8" />
<Topic level="4.3.3" name="External data sources" page="8" />
</Topic>

...</TableOfContent>
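One way to produce nesting like the sample's is to keep a stack of the levels of the Topic elements that are still open: before writing an entry, close every open Topic whose level is the same as or deeper than the entry's. This is a minimal sketch, not the article's DocumentParser code; it assumes each TOC paragraph has already been split into a hypothetical TocEntry (depth from the TOC N style, number, name, page) and uses XmlWriter rather than XmlTextWriter.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical parsed TOC paragraph. Depth is N from the "TOC N" style.
public class TocEntry
{
    public int Depth;
    public string Number;
    public string Name;
    public int Page;

    public TocEntry(int depth, string number, string name, int page)
    {
        Depth = depth; Number = number; Name = name; Page = page;
    }
}

public static class TocXmlBuilder
{
    // Writes the entries as nested <Topic> elements under <TableOfContent>.
    // A deeper entry goes inside the previous one; a same-or-shallower entry
    // first closes every open Topic whose depth is >= its own depth.
    public static string Build(IEnumerable<TocEntry> entries)
    {
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        var output = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            var open = new Stack<int>(); // depths of the Topic elements still open
            writer.WriteStartElement("TableOfContent");
            foreach (TocEntry entry in entries)
            {
                while (open.Count > 0 && open.Peek() >= entry.Depth)
                {
                    writer.WriteEndElement(); // close a sibling or a deeper descendant
                    open.Pop();
                }
                writer.WriteStartElement("Topic");
                writer.WriteAttributeString("level", entry.Number);
                writer.WriteAttributeString("name", entry.Name);
                writer.WriteAttributeString("page", entry.Page.ToString());
                open.Push(entry.Depth);
            }
            writer.WriteEndDocument(); // closes every element that is still open
        }
        return output.ToString();
    }
}
```

Topics without children come out as self-closing elements (`<Topic ... />`), because XmlWriter.WriteEndElement emits the short form when no content was written.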

If the Style is ImageStyle

C#
Word.InlineShapes inShapes = getInlineDocumentShape(); // inline shapes (images) of the document
// mindex: index of the current inline shape in the document (1-based)
inShapes[mindex].Select(); // select the picture so it can be copied to the clipboard
wordApplication.Selection.CopyAsPicture();
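In the sample Document XML, the picture data appears to be stored as base64 text inside an Image element. Assuming the picture bytes have already been obtained (for example, in a Windows Forms application, by reading the clipboard image after CopyAsPicture and saving it to a MemoryStream), a minimal sketch of writing that node follows. ImageNodeWriter is a hypothetical name, and src is whatever identifier the caller chooses.

```csharp
using System.IO;
using System.Xml;

public static class ImageNodeWriter
{
    // Hypothetical helper: writes picture bytes as <Image src="...">base64</Image>.
    // How the bytes are obtained (clipboard, file, ...) is left to the caller.
    public static string Write(string src, byte[] pictureBytes)
    {
        var output = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("Image");
            writer.WriteAttributeString("src", src);
            writer.WriteBase64(pictureBytes, 0, pictureBytes.Length); // base64-encoded content
            writer.WriteEndElement();
        }
        return output.ToString();
    }
}
```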

Getting the Words of a Paragraph

If the style is Heading:

C#
Word.Words words;
words = pars[index].Range.Words;//take words of paragraph

Then check whether each word is part of a list, and read its format:

C#
for (windex = 1; windex <= words.Count; windex++)
{
    // Compare with the enum value instead of its string name
    if (words[windex].FormattedText.ListFormat.ListType == Word.WdListType.wdListNoNumbering)
    {
        // Not part of a list: read the word's format and write it to an XML Text node
        String format = FormatingFunction(words, windex);
        // ...
    }
    else
    {
        // ...write it in a list node...
    }
}

// Returns the formats applied to word number 'index' as a comma-separated
// list ("Bold", "Italic", "UnderLine"), or "" when none apply.
public String FormatingFunction(Word.Words obj, int index)
{
    if (index > obj.Count)
    {
        return "";
    }
    String fr = "";
    if (obj[index].Bold == -1)   // Word reports True as -1
    {
        fr = "Bold";
    }
    if (obj[index].Italic == -1)
    {
        fr += (fr != "" ? "," : "") + "Italic";
    }
    if (obj[index].Underline == Word.WdUnderline.wdUnderlineSingle)
    {
        fr += (fr != "" ? "," : "") + "UnderLine";
    }
    return fr;
}
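FormatingFunction mixes the Word calls with the string building. Keeping the string building in a separate, pure function lets it be tested without Word running. The sketch below uses hypothetical names; it relies on Word reporting True as -1 for Range.Bold and Range.Italic, which is what the original comparison against "-1" checks.

```csharp
using System.Collections.Generic;

public static class WordFormat
{
    // Word returns -1 (True) for Bold/Italic when the whole range has the format,
    // 0 when it does not, and wdUndefined (9999999) for a mixed range.
    public const int WordTrue = -1;

    // Builds the Format attribute value from raw values read from a Word range.
    public static string Describe(int bold, int italic, bool singleUnderline)
    {
        var parts = new List<string>();
        if (bold == WordTrue) parts.Add("Bold");
        if (italic == WordTrue) parts.Add("Italic");
        if (singleUnderline) parts.Add("UnderLine");
        return string.Join(",", parts.ToArray());
    }
}
```

Inside the word loop it could be called as `WordFormat.Describe(words[windex].Bold, words[windex].Italic, words[windex].Underline == Word.WdUnderline.wdUnderlineSingle)`.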

A sample of the generated Document XML (an excerpt) is as follows:

XML
...
<Text Format="">is </Text>
<Text Format="Italic">Performance Management System </Text>
<Text Format="">that helps you collect different measures and make faster and smarter
decisions through a set of user friendly customizable dashboards and scorecards
targeted for each and every member of your organization.</Text>
<Text Format="Italic" />
</Paragraph>
</Topic>
<Topic Name="Product Features" Level="2.2">
<Paragraph>
<Text Format="" />
<Image src="2.21">j7B/wBND+VFFAB/Z4/56fpR9g/6aH8qKKAD7AP+eh/KroGABRRQB//Z</Image>
<Paragraph>
<Text Format="">The figure above provides a high level vision of </Text>
<Text Format="Bold">Cub </Text>
<Text Format="">solution. The vision includes the idea of hiding the complexity of
creating ETL (Extract, Transform &amp; Load) processes, a data warehouse and an
OLAP database for analysis from the end user.
</Text>
</Paragraph>
<Paragraph>
<Text Format="" />
<Paragraph>
<Text Format="">It will provide end users with a sub-set of the features offered by
the underlying systems, taking into account the ability to extend this set
in future releases. As well as linking with existing DW and OLAP database
provided as part of an implementation service.
</Text>
</Paragraph>
</Paragraph>
</Paragraph>
</Topic>
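In the sample, a single Text node often holds several consecutive words that share a format. Assuming that is the intent (the merging code is not shown in the article), a minimal sketch of the merge step follows; TextRun and TextRunMerger are hypothetical names. Word's Words collection keeps the trailing space with each word (as in "is "), so the texts are simply concatenated.

```csharp
using System.Collections.Generic;

// Hypothetical (text, format) pair for one word or one merged run.
public class TextRun
{
    public string Text;
    public string Format;

    public TextRun(string text, string format) { Text = text; Format = format; }
}

public static class TextRunMerger
{
    // Merges consecutive words with the same Format into one run, so each run
    // can be written as a single <Text Format="..."> node.
    public static List<TextRun> Merge(IEnumerable<TextRun> words)
    {
        var runs = new List<TextRun>();
        foreach (TextRun word in words)
        {
            TextRun last = runs.Count > 0 ? runs[runs.Count - 1] : null;
            if (last != null && last.Format == word.Format)
                last.Text += word.Text; // same format: extend the current run
            else
                runs.Add(new TextRun(word.Text, word.Format)); // copy so inputs are not mutated
        }
        return runs;
    }
}
```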

Converting XML to HTML

I built an HTML converter that converts the XML nodes to HTML; see HtmlConvertor.cs.
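HtmlConvertor.cs is not shown in the article. As a hedged illustration of the idea only, the sketch below turns one Paragraph element of Document.xml into an HTML paragraph, mapping the Format values Bold, Italic and UnderLine to b, i and u tags. ParagraphHtml is a hypothetical name, and nested Paragraph and Image elements are ignored.

```csharp
using System.Net;
using System.Text;
using System.Xml.Linq;

public static class ParagraphHtml
{
    // Converts the Text children of one <Paragraph> element into an HTML <p>,
    // wrapping each run in tags that match its Format attribute.
    public static string ToHtml(XElement paragraph)
    {
        var html = new StringBuilder("<p>");
        foreach (XElement text in paragraph.Elements("Text"))
        {
            string format = (string)text.Attribute("Format") ?? "";
            string body = WebUtility.HtmlEncode(text.Value); // escape &, <, > and quotes
            if (format.Contains("UnderLine")) body = "<u>" + body + "</u>";
            if (format.Contains("Italic")) body = "<i>" + body + "</i>";
            if (format.Contains("Bold")) body = "<b>" + body + "</b>";
            html.Append(body);
        }
        return html.Append("</p>").ToString();
    }
}
```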

Future Plans

I plan to expand the explanations in this article.

Upcoming articles:

  • Dynamic Online Flexible GridView
  • SpyWare

History

  • 26th September, 2007: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)