
Converting a Microsoft Word document to a text file in C#

5 Jan 2014, CPOL
This tip explains how to convert a Microsoft Word document to a text file in C# using the Microsoft Word Object Library.

Introduction

In this tip, I'll explain how to convert a Microsoft Word document to a text file in C#. The code automates Word itself through the Microsoft Word Object Library, so Word must be installed on the machine that runs it.

Adding a reference to the Microsoft Word Object Library

The first step is to add a reference to the Microsoft Word Object Library. In Visual Studio, choose "Add Reference...", go to "COM", and select "Microsoft Word [version number here] Object Library".
Image 1
As you can see in the image, I use the Microsoft Word 15.0 Object Library, which is the library of Word 2013. Your version number may differ from 15.0, depending on which version of Word is installed.

The code

At the top of the code file, we add the following using directives:

C#
using System;       // Console, Type
using System.IO;    // File
using Word = Microsoft.Office.Interop.Word; // alias for the interop namespace

With the alias in place, we can write Word.Document instead of Microsoft.Office.Interop.Word.Document, for example. Next, we ask the user which file to convert, using the following code:

C#
Console.WriteLine("Please enter the full file path of your Word document (without quotes):");
object path = Console.ReadLine();
Console.WriteLine("Please enter the file path of the text document in which you want to store the text of your Word document (without quotes):");
string txtPath = Console.ReadLine();

As the prompt says, the path to the Word document must be a full path. Word resolves a relative path against its own working directory rather than the converter's, so if you just enter test.docx, Word may try to open C:\Windows\system32\test.docx instead of the test.docx file in the folder of the converter. For the text file, a relative path such as test.txt is fine: File.WriteAllText resolves it against the converter's working directory (normally the folder of the converter), so test.txt is created there. The path to the Word file is also declared as an object rather than a string, because the Open method takes its arguments as ref object parameters and we pass the variable with ref. Now, we'll open the Word file and retrieve the text using the following code:

C#
Word.Application app = new Word.Application(); // starts Word through COM automation
Word.Document doc;
object missing = Type.Missing; // stands in for optional arguments we don't set
object readOnly = true;
try
{
    // Argument 1 is the file path, argument 3 opens the file as read-only; the rest are left unset.
    doc = app.Documents.Open(ref path, ref missing, ref readOnly, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing);
    string text = doc.Content.Text; // the main document body as plain text
    File.WriteAllText(txtPath, text);
    Console.WriteLine("Converted!");
}

Here, we create a Word Application object and use it to open the document. The first argument of the Open method is the file path, and the third specifies whether to open the file as read-only (yes, in this case). The remaining arguments are optional, so we pass Type.Missing for each of them. The document's text is available through Content.Text, and the File.WriteAllText method writes it to the text file. Now, we'll create the catch and finally blocks:

C#
catch
{
    Console.WriteLine("An error occurred. Please check the file path to your Word document, and whether the Word document is valid.");
}
finally
{
    object saveChanges = Word.WdSaveOptions.wdDoNotSaveChanges;
    app.Quit(ref saveChanges, ref missing, ref missing);
}

Because we don't want to save any changes (we didn't make any), we pass WdSaveOptions.wdDoNotSaveChanges. The Application.Quit method closes all open documents and quits Word. If we merge all code snippets, we get this:

C#
Console.WriteLine("Please enter the full file path of your Word document (without quotes):");
object path = Console.ReadLine();
Console.WriteLine("Please enter the file path of the text document in which you want to store the text of your Word document (without quotes):");
string txtPath = Console.ReadLine();
Word.Application app = new Word.Application();
Word.Document doc;
object missing = Type.Missing;
object readOnly = true;
try
{
    doc = app.Documents.Open(ref path, ref missing, ref readOnly, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing);
    string text = doc.Content.Text;
    File.WriteAllText(txtPath, text);
    Console.WriteLine("Converted!");
}
catch
{
    Console.WriteLine("An error occurred. Please check the file path to your Word document, and whether the Word document is valid.");
}
finally
{
    object saveChanges = Word.WdSaveOptions.wdDoNotSaveChanges;
    app.Quit(ref saveChanges, ref missing, ref missing);
}
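
Two optional refinements to the listing above can be sketched without touching the Word calls. The first lets the user enter a relative (or quoted) path for the Word document by resolving it to a full path before it is handed to Word. The second tidies the extracted text: as far as I know, Word typically ends paragraphs with a lone carriage return and marks the end of table cells with character 7, so the raw text may show up as a single line in some editors. The class and method names below (ConverterHelpers, ToFullPath, NormalizeWordText) are hypothetical and not part of the original tip; treat this as a minimal sketch under those assumptions.

```csharp
using System;
using System.IO;

// Hypothetical helper class; the names are illustrative, not from the original tip.
public static class ConverterHelpers
{
    // Strips surrounding whitespace and double quotes, then resolves a relative
    // path against this process's current directory, so Word receives a full path.
    public static string ToFullPath(string input)
    {
        string trimmed = input.Trim().Trim('"');
        return Path.GetFullPath(trimmed);
    }

    // Assumption: paragraphs end with a lone '\r' and table cells end with '\a' (char 7).
    // Converts every line break to Environment.NewLine and drops the cell markers.
    public static string NormalizeWordText(string text)
    {
        return text
            .Replace("\r\n", "\n")          // collapse any existing CR+LF pairs first
            .Replace('\r', '\n')            // lone carriage returns become line feeds
            .Replace("\a", string.Empty)    // remove end-of-cell markers
            .Replace("\n", Environment.NewLine);
    }
}
```

To use these in the listing, the path line could become object path = ConverterHelpers.ToFullPath(Console.ReadLine()); and the write could become File.WriteAllText(txtPath, ConverterHelpers.NormalizeWordText(text));. Note that Path.GetFullPath resolves against the current working directory, which is the converter's folder only if the program is started from there.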

History

  • 5 Jan 2014: First version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)