Introduction
This article describes how to build a simple viewer for Microsoft Word documents in the .docx format.
It is useful when you need to display a Word document inside your own project.
The viewer is very simple in its current state and needs a lot more development. This article describes only the concept.
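The viewer depends on two major open source libraries:
- DocX, which reads the .docx file and returns its content as .NET objects
- RTFlib, a string builder for RTF, which produces the RTF that the viewer displays
The viewer is written in Visual Basic .NET.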
Background
I was working on a project whose main data lives in a Word document. I found that the only practical way to enter the data was to view the document on a form, select parts of it, and copy them for saving in the database.
I searched the internet for a Word document viewer and did not find any. All I found were libraries that read the .docx format and return the data as .NET objects; I chose DocX for this purpose.
Then I thought that if I could read the file, I could display it myself. I searched for an RTF library and chose the String builder for RTF.
By combining these two libraries, I could build this viewer.
Using the Code
The viewer solution consists of two projects:
- WordDocViewer, a Windows Forms application
- WordFile, a class library project
The Windows Forms project is the host and is responsible for displaying the RTF result on an MDI child form using a RichTextBox control.
The RTF result is built by RTFlib after the document has been read by the DocX library in the class library project.
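To make the display step concrete, here is a minimal sketch of handing an RTF string to a RichTextBox. It is not the viewer's code: the RTF is written by hand as a stand-in for the RTFlib output, a plain form is used instead of an MDI child, and the names RtfPreviewSketch and CreatePreviewBox are hypothetical.

```vbnet
Imports System
Imports System.Windows.Forms

Module RtfPreviewSketch
    ' Hypothetical helper: creates a read-only RichTextBox and loads RTF into it.
    Function CreatePreviewBox(rtf As String) As RichTextBox
        Dim box As New RichTextBox()
        box.Dock = DockStyle.Fill
        box.ReadOnly = True
        ' Assigning the Rtf property replaces the content with the formatted text.
        box.Rtf = rtf
        Return box
    End Function

    <STAThread>
    Sub Main()
        ' Hand-written RTF standing in for the output of the RTF builder library.
        Dim rtf As String = "{\rtf1\ansi Plain text, then \b bold\b0  text.\par}"

        Dim form As New Form()
        form.Text = "RTF preview (sketch)"
        form.Width = 480
        form.Height = 240
        form.Controls.Add(CreatePreviewBox(rtf))
        Application.Run(form)
    End Sub
End Module
```

The only step that matters for the viewer is the assignment to Rtf; everything else is scaffolding to get a window on screen.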
The class library is very simple. It has two classes:
- Document represents the Word document; it loads the Word document file and parses the pages.
- Page exists because the DocX library has no page class. I created it to keep the paragraphs of each page together.
I parse the pages by searching for the line feed character in each paragraph. When I find one, I split the paragraph text into two parts and treat the second part as a new paragraph that starts a new page.
This is the Load function:
Public Function Load(File As String) As Boolean
    Try
        Me.Doc = DocX.Load(File)
        ' Start with the first page; the page index is 1-based.
        Dim Page As Page = New Page With {._Index = Me.Pages.Count + 1}
        Dim Pos As Integer = 0
        Dim Text As String = String.Empty
        Me.Pages.Add(Page)
        For Each Paragraph As Novacode.Paragraph In Me.Doc.Paragraphs
            If Paragraph.Text.Contains(vbLf) Then
                ' Split at the first line feed: the text before it stays on the
                ' current page, the text after it opens a new page.
                Text = Paragraph.Text
                Pos = Text.IndexOf(vbLf)
                Paragraph.ReplaceText(Text.Substring(Pos + 1), String.Empty)
                Page.Paragraphs.Add(Paragraph)
                Page = New Page With {._Index = Me.Pages.Count + 1}
                Page.Paragraphs.Add(Paragraph.InsertParagraphAfterSelf(Text.Substring(Pos + 1)))
                Me.Pages.Add(Page)
            Else
                Page.Paragraphs.Add(Paragraph)
            End If
        Next
        Return True
    Catch ex As Exception
        ' Rethrow so the caller sees the original error.
        Throw
    End Try
    Return False
End Function
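The page-splitting idea can be tried on plain strings, without DocX. The following is a minimal sketch, not the viewer's code: PageSplitSketch and SplitIntoPages are hypothetical names, and unlike Load it starts a new page at every line feed rather than only at the first one in a paragraph.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module PageSplitSketch
    ' Groups paragraph texts into pages. Every line feed inside a paragraph
    ' ends the current page; the text after it starts the next page.
    Function SplitIntoPages(paragraphs As IEnumerable(Of String)) As List(Of List(Of String))
        Dim pages As New List(Of List(Of String))()
        Dim current As New List(Of String)()
        pages.Add(current)

        For Each text As String In paragraphs
            Dim parts As String() = text.Split(ControlChars.Lf)
            ' The first part always belongs to the page that is already open.
            current.Add(parts(0))
            ' Each further part was preceded by a line feed, so it opens a new page.
            For i As Integer = 1 To parts.Length - 1
                current = New List(Of String)()
                pages.Add(current)
                current.Add(parts(i))
            Next
        Next

        Return pages
    End Function

    Sub Main()
        Dim sample As String() = {"Title", "End of page one" & vbLf & "Start of page two", "More text"}
        Dim pages As List(Of List(Of String)) = SplitIntoPages(sample)
        For i As Integer = 0 To pages.Count - 1
            Console.WriteLine("Page {0}: {1}", i + 1, String.Join(" | ", pages(i)))
        Next
    End Sub
End Module
```

For the sample input, the second paragraph is split in two, so "Title" and "End of page one" land on page 1 while "Start of page two" and "More text" land on page 2.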
To show images in the viewer, RTFlib needs a Drawing.Image parameter passed to its InsertImage function. For that, I created the GetImage function in the Document class.
Public Function GetImage(Picture As Novacode.Picture) As Drawing.Image
    ' Find the embedded image that backs this picture.
    Dim DocImage As Novacode.Image = Me.Doc.Images.Find(Function(T) T.Id = Picture.Id)
    If DocImage Is Nothing Then Return Nothing

    Using Stream As IO.Stream = DocImage.GetStream(IO.FileMode.Open, IO.FileAccess.Read)
        ' Decode directly from the stream; reading it into a buffer first
        ' would leave the position at the end and the decode would fail.
        Using Source As Drawing.Image = Drawing.Image.FromStream(Stream)
            ' Copy into a new Bitmap so the stream can be closed safely.
            Return New Drawing.Bitmap(Source)
        End Using
    End Using
End Function
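Loading an image from a stream has two pitfalls worth checking in isolation: Image.FromStream decodes from the stream's current position, and the stream is expected to stay open for as long as the resulting Image is in use. Here is a minimal sketch of one way to handle both, assuming a seekable stream; ImageFromStreamSketch and LoadDetachedImage are hypothetical names, and the PNG bytes are generated in memory as a stand-in for an image embedded in a document.

```vbnet
Imports System
Imports System.Drawing
Imports System.Drawing.Imaging
Imports System.IO

Module ImageFromStreamSketch
    ' Decodes an image and returns a copy that no longer depends on the stream.
    Function LoadDetachedImage(source As Stream) As Image
        ' Rewind so decoding starts at the first byte.
        If source.CanSeek Then source.Position = 0
        Using decoded As Image = Image.FromStream(source)
            ' The Bitmap copy owns its pixels, so the caller may close the stream.
            Return New Bitmap(decoded)
        End Using
    End Function

    Sub Main()
        ' Build a small PNG in memory as a stand-in for an embedded picture.
        Dim pngBytes As Byte()
        Using original As New Bitmap(32, 16)
            Using encoded As New MemoryStream()
                original.Save(encoded, ImageFormat.Png)
                pngBytes = encoded.ToArray()
            End Using
        End Using

        Dim loaded As Image
        Using pngStream As New MemoryStream(pngBytes)
            loaded = LoadDetachedImage(pngStream)
        End Using

        ' The stream is closed here, but the copied image is still usable.
        Console.WriteLine("{0} x {1}", loaded.Width, loaded.Height)
        loaded.Dispose()
    End Sub
End Module
```

Copying into a Bitmap costs one extra allocation per picture, which seems an acceptable trade for a viewer that keeps images around after the document stream has been closed.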
Here is a captured image of the viewer:
Points of Interest
The viewer is very simple and easy to understand, and it is also easy to convert to C#.
History
- 8th November, 2018: First release