
Simple Word Document Viewer

10 Nov 2018
Simple Word Document File Viewer

Introduction

This article describes how to build a simple viewer for Microsoft Word documents in the .docx format.

It is useful whenever a project needs to display a Word document inside its own forms.

The viewer is very simple in its current state and needs a lot more development. This article describes only the concept.

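The viewer depends on two open source libraries:

  • DocX, for reading the .docx file into .NET objects
  • RTFlib (the String builder for RTF), for building the RTF that is displayed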

The viewer is written in Visual Basic .NET.

Background

I was working on a project whose main data lived in a Word document. The only practical way to enter that data was to show the document on a form, select parts of it, and copy them for saving in the database.

I searched the internet for a Word document viewer and did not find one. All I found were libraries that read the .docx format and return the data as .NET objects. I chose DocX for this purpose.

Then I thought that if I could read the file, I could render it myself. I searched for an RTF library and chose the String builder for RTF.

By combining these two libraries, I could build this viewer.

Using the Code

The viewer solution consists of two projects:

  • WordDocViewer, a Windows form application
  • WordFile, a class library project

The Windows Forms project is the host. It is responsible for displaying the RTF result on an MDI child form using a RichTextBox control.

The RTF result is built by RTFlib after the document has been read by the DocX library in the class library project.
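
As a rough illustration of the display step, a RichTextBox renders RTF markup that is assigned to its Rtf property. The sketch below is not the viewer's own code: the module name, the form, and the hand-written RTF string are all assumptions standing in for the MDI child form and for the markup that RTFlib would produce.

```vbnet
Imports System
Imports System.Windows.Forms

Module RtfPreviewDemo
    <STAThread>
    Sub Main()
        ' Hand-written RTF (an assumption) standing in for the markup the viewer builds.
        Dim Rtf As String = "{\rtf1\ansi This is \b bold\b0  and \i italic\i0  text.\par}"

        ' A read-only RichTextBox that fills a plain form (not an MDI child here).
        Dim Box As New RichTextBox With {.Dock = DockStyle.Fill, .ReadOnly = True}
        Dim Host As New Form With {.Text = "RTF preview", .Width = 480, .Height = 320}
        Host.Controls.Add(Box)

        ' Assigning to Rtf makes the control parse and render the markup.
        Box.Rtf = Rtf

        Application.Run(Host)
    End Sub
End Module
```

Assigning to Rtf rather than Text is what makes the control interpret the markup; assigning the same string to Text would show the raw RTF codes instead.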

The class library is very simple. It has two classes:

  • Document represents the Word document. It loads the Word document file and parses the pages.
  • Page exists because the DocX library has no page class. I created one to keep the paragraphs of each page together.

I parse the pages by searching for the line feed character in each paragraph. When I find it, I split the paragraph text into two parts and treat the second part as a new paragraph that starts a new page.

This is the Load function:

Public Function Load(File As String) As Boolean
    ' DocX.Load throws if the file cannot be opened or parsed.
    ' The exception is left to propagate to the caller, as before.
    Me.Doc = DocX.Load(File)

    ' Start with one page; a new page is added at each line feed.
    Dim Page As Page = New Page With {._Index = Me.Pages.Count + 1}
    Dim Pos As Integer = 0 ' IndexOf returns an Integer, not a Short
    Dim Text As String = String.Empty

    Me.Pages.Add(Page)
    For Each Paragraph As Novacode.Paragraph In Me.Doc.Paragraphs
        If Paragraph.Text.Contains(vbLf) Then
            Text = Paragraph.Text
            Pos = Text.IndexOf(vbLf)

            ' Keep the text before the line feed on the current page.
            Paragraph.ReplaceText(Text.Substring(Pos + 1), String.Empty)
            Page.Paragraphs.Add(Paragraph)

            ' Move the text after the line feed to a new paragraph on a new page.
            ' Only the first line feed in a paragraph is handled.
            Page = New Page With {._Index = Pages.Count + 1}
            Page.Paragraphs.Add(Paragraph.InsertParagraphAfterSelf(Text.Substring(Pos + 1)))
            Me.Pages.Add(Page)
        Else
            Page.Paragraphs.Add(Paragraph)
        End If
    Next

    Return True
End Function
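
The splitting idea used in Load can be tried in isolation on plain strings, without DocX. The following is a minimal sketch under that assumption: the module and function names are hypothetical, paragraphs are plain strings instead of Novacode.Paragraph objects, and, unlike Load, it starts a new page at every line feed in a paragraph rather than only the first one.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module PageSplitSketch
    ' Groups paragraph texts into pages, starting a new page at every line feed.
    Function SplitIntoPages(Paragraphs As IEnumerable(Of String)) As List(Of List(Of String))
        Dim Pages As New List(Of List(Of String))
        Dim Current As New List(Of String)
        Pages.Add(Current)

        For Each Text As String In Paragraphs
            ' Split keeps empty pieces, so a trailing line feed yields an empty last piece.
            Dim Parts As String() = Text.Split(ControlChars.Lf)

            ' The first piece stays on the current page.
            Current.Add(Parts(0))

            ' Every further piece opens a new page and becomes its first paragraph.
            For I As Integer = 1 To Parts.Length - 1
                Current = New List(Of String)
                Pages.Add(Current)
                Current.Add(Parts(I))
            Next
        Next

        Return Pages
    End Function

    Sub Main()
        Dim Pages = SplitIntoPages({"Title", "End of page one" & vbLf & "Start of page two", "More text"})
        For Index As Integer = 0 To Pages.Count - 1
            Console.WriteLine("Page " & (Index + 1) & ": " & String.Join(" | ", Pages(Index)))
        Next
        ' Prints:
        ' Page 1: Title | End of page one
        ' Page 2: Start of page two | More text
    End Sub
End Module
```

Working on strings keeps the page logic separate from the document model, which makes it easier to check edge cases such as several line feeds in one paragraph before wiring the result back to real paragraphs.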

To show images in the viewer, RTFlib's InsertImage function must be given a Drawing.Image parameter. For that, I created the GetImage function in the Document class.

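Public Function GetImage(Picture As Novacode.Picture) As Drawing.Image
    Dim Image As Drawing.Image = Nothing

    ' Find the embedded image that this picture refers to.
    Dim DocImage As Novacode.Image = Me.Doc.Images.Find(Function(T) T.Id = Picture.Id)
    If DocImage IsNot Nothing Then
        ' Using blocks close the stream and the temporary image even if decoding fails.
        Using Stream As IO.Stream = DocImage.GetStream(IO.FileMode.Open, IO.FileAccess.Read)
            Using Source As Drawing.Image = Drawing.Image.FromStream(Stream)
                ' Image.FromStream needs its stream open for the lifetime of the image,
                ' so copy into a new Bitmap that stays valid after the stream is closed.
                Image = New Drawing.Bitmap(Source)
            End Using
        End Using
    End If

    ' Nothing is returned when no matching image is found.
    Return Image
End Function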

Here is a screenshot of the viewer:

[Image: Word Viewer]

Points of Interest

The viewer is very simple and easy to understand, and it is also easy to convert to C#.

History

  • 8th November, 2018: First release

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
