
Using the Microsoft Office Interop API to convert .doc, .docx, .dot, .dotx, .rtf, .xls and .xlsx files to HTML

13 Dec 2012, CPOL
Convert Word documents and Excel sheets to HTML files using the Microsoft Office Interop API and render the result in a client browser.

Table of Contents 

  • Introduction
  • Microsoft Office Interop library
  • Adding the reference of Microsoft Interop libraries
  • Using the code
  • Access the Converter functionality
  • Summary
  • Disclaimer

Introduction

This article shows how to use the Microsoft Office Interop APIs to convert Word documents, document templates and Excel sheets to an HTML file and render the result in a client browser. Developers sometimes find it difficult to convert Excel sheets and Word documents to equivalent HTML; in those cases the Office Interop APIs come in very handy.

Microsoft Office Interop library  

Before using the Microsoft Office Interop APIs, you have to install Microsoft Office on your system. The Interop APIs cannot run without it, so if Office is not installed yet, install it first.

Download Microsoft Office

Adding the reference of Microsoft Interop libraries

Once Microsoft Office is installed, add references to the required Microsoft Office Interop libraries:

  1. Microsoft Office Excel library
  2. Microsoft Office Word library
  3. Microsoft Office object library

This article only shows how to convert Word document files and Excel files to an HTML file, so the three libraries above are the only references we need.

Steps to add library references

  1. Right-click the References folder in your solution.
  2. Click Add Reference.
  3. Click the COM tab.
  4. Select the Microsoft Office 8.0 or 14.0 object library, then hold the Ctrl key and also select the Microsoft Office Excel library and the Microsoft Office Word library.
  5. Click the OK button.

Note: The assembly version may differ, because it depends on the Office version installed on your machine.
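Both the Word and the Excel interop namespaces define a type named Application, so a file that references both needs some way to tell them apart. One common option is a namespace alias, sketched below. The class and method names (InteropAliases, CreateWord, CreateExcel) are made up for illustration and are not part of the article's project; the snippet assumes the references added in the steps above.

```csharp
// Namespace aliases keep Word.Application and Excel.Application apart.
using Excel = Microsoft.Office.Interop.Excel;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper, for illustration only.
public static class InteropAliases
{
    // Starts a Word automation instance (requires Word to be installed).
    public static Word.Application CreateWord()
    {
        return new Word.Application();
    }

    // Starts an Excel automation instance (requires Excel to be installed).
    public static Excel.Application CreateExcel()
    {
        return new Excel.Application();
    }
}
```

The article's own classes use the unqualified name Application, which works as long as each file imports only one of the two namespaces.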

Image 1

Using the code  

Before building the code you must have MS Office installed on your machine, and you also need to configure CKEditor, because I use CKEditor to display the HTML content that is generated from the document or Excel sheet. Add the following entry to the pages section of your web.config file.

XML
<controls>
    <add tagPrefix="CKEditor" assembly="CKEditor.NET" namespace="CKEditor.NET"/>
</controls>

DocToHtml class  

Word document to HTML conversion is implemented in the DocToHtml class. Below is a snippet of the actual code, which converts the doc file to an HTML string.

C#
public StringBuilder Convert()
{
    Application objWord = new Application();

    if (File.Exists(FileToSave))
    {
        File.Delete(FileToSave);
    }
    try
    {
        objWord.Documents.Open(FileName: FullFilePath);
        objWord.Visible = false;
        if (objWord.Documents.Count > 0)
        {
            Microsoft.Office.Interop.Word.Document oDoc = objWord.ActiveDocument;
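            // wdFormatFilteredHTML has the value 10, the number used in the original call.
            oDoc.SaveAs(FileName: FileToSave,
                FileFormat: Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatFilteredHTML);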
            oDoc.Close(SaveChanges: false);
        }
    }
    finally
    {
        objWord.Application.Quit(SaveChanges: false);
    }
    return base.ReadConvertedFile();
}
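Convert() ends by calling base.ReadConvertedFile(), which this article does not list. The following is a minimal sketch of what the IConverter interface and such a base class might look like. The names ConverterBase and CopyConverter and the body of ReadConvertedFile are assumptions for illustration, not the article's actual source; only Convert, FileToSave, FullFilePath and ReadConvertedFile appear in the article.

```csharp
using System.IO;
using System.Text;

// The interface implemented by both converters; it declares Convert.
public interface IConverter
{
    StringBuilder Convert();
}

// Hypothetical base class (name and body are assumptions).
public abstract class ConverterBase : IConverter
{
    public string FullFilePath { get; set; } // file to convert
    public string FileToSave { get; set; }   // HTML file to produce

    public abstract StringBuilder Convert();

    // Reads the generated HTML file into a StringBuilder.
    protected StringBuilder ReadConvertedFile()
    {
        var html = new StringBuilder();
        if (File.Exists(FileToSave))
        {
            // ReadAllText detects a byte order mark and otherwise assumes UTF-8;
            // pass an explicit Encoding if the output uses another code page.
            html.Append(File.ReadAllText(FileToSave));
        }
        return html;
    }
}

// Stand-in converter that exercises the base class without Office:
// it copies the input to the output path instead of calling Interop.
public class CopyConverter : ConverterBase
{
    public override StringBuilder Convert()
    {
        File.Copy(FullFilePath, FileToSave, true);
        return ReadConvertedFile();
    }
}
```

In this sketch DocToHtml and XlsToHtml would derive from ConverterBase in the same way as CopyConverter, replacing the File.Copy call with the Interop code.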

XlsToHtml class 

Excel sheet to HTML conversion is implemented in the XlsToHtml class. Below is a snippet of the actual code, which converts the Excel file to an HTML string.

C#
public StringBuilder Convert()
{
    Application excel = new Application();

    if (File.Exists(FileToSave))
    {
        File.Delete(FileToSave);
    }
    try
    {
        excel.Workbooks.Open(Filename: FullFilePath);
        excel.Visible = false;
        if (excel.Workbooks.Count > 0)
        {
            object format = Microsoft.Office.Interop.Excel.XlFileFormat.xlHtml;
            // Only the first worksheet is saved; the output goes to FileToSave.
            IEnumerator wsEnumerator = excel.ActiveWorkbook.Worksheets.GetEnumerator();
            if (wsEnumerator.MoveNext())
            {
                Microsoft.Office.Interop.Excel.Worksheet wsCurrent = (Microsoft.Office.Interop.Excel.Worksheet)wsEnumerator.Current;
                wsCurrent.SaveAs(Filename: FileToSave, FileFormat: format);
            }
            excel.Workbooks.Close();
        }
    }
    finally
    {
        excel.Application.Quit();
    }
    return base.ReadConvertedFile();
}

ConverterLocator class 

To call the right converter for the extension of the file, we need a converter locator that returns the matching converter service.

For example, if I upload an xls file, the ConverterLocator must return an instance of the XlsToHtml class; if the uploaded file is a Word document, the ConverterLocator returns an instance of the DocToHtml class.

Both the XlsToHtml and DocToHtml classes implement the IConverter interface, which declares the Convert method.

C#
public static IConverter Converter(string fullFilePath, string fileToSave)
{
    // The extension is the text after the last dot, lower-cased.
    string ext = fullFilePath.Split('.').Last().ToLower();
    switch (ext)
    {
        // Word documents, templates and rich text
        case "doc":
        case "docx":
        case "dot":
        case "dotx":
        case "rtf":
            return new DocToHtml { FileToSave = fileToSave, FullFilePath = fullFilePath };
        // Excel workbooks
        case "xls":
        case "xlsx":
            return new XlsToHtml { FileToSave = fileToSave, FullFilePath = fullFilePath };
        // Any other extension is unsupported, so no converter is returned.
        default:
            return null;
    }
}
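The extension lookup can also be written as a table, which makes it easier to see what happens for unsupported files. The sketch below is one possible variant, not the article's code: ConverterKind and ExtensionMap are hypothetical names, and it uses Path.GetExtension instead of Split('.'), so a dot in a folder name is not mistaken for an extension.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical names, for illustration only.
public enum ConverterKind { None, Word, Excel }

public static class ExtensionMap
{
    // Case-insensitive table of the extensions handled in the article.
    private static readonly Dictionary<string, ConverterKind> Kinds =
        new Dictionary<string, ConverterKind>(StringComparer.OrdinalIgnoreCase)
        {
            { ".doc", ConverterKind.Word },
            { ".docx", ConverterKind.Word },
            { ".dot", ConverterKind.Word },
            { ".dotx", ConverterKind.Word },
            { ".rtf", ConverterKind.Word },
            { ".xls", ConverterKind.Excel },
            { ".xlsx", ConverterKind.Excel }
        };

    public static ConverterKind Lookup(string fullFilePath)
    {
        // GetExtension returns the extension with its leading dot,
        // or an empty string when the file name has none.
        string ext = Path.GetExtension(fullFilePath) ?? string.Empty;
        ConverterKind kind;
        return Kinds.TryGetValue(ext, out kind) ? kind : ConverterKind.None;
    }
}
```

In terms of the article's classes, Word would map to DocToHtml, Excel to XlsToHtml, and None to the null that ConverterLocator.Converter returns for an unsupported extension, which is why callers should check for null before calling Convert.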

Access the Converter functionality

Everything is now in place. All that remains is to call the functionality that converts the Word document or Excel file to HTML and render the result in the browser.

Below is the snippet of code that calls the IConverter service.

C#
private void ConvertAndLoadDocumentInEditor()
{
    // Requires: using System.IO;

    // Timestamp prefix, so that every file is saved under a different name.
    string randomName = DateTime.Now.ToFileTime().ToString();

    // Physical path of the temporary folder.
    string tempFolder = Server.MapPath("~/_Temp");

    // Complete path of the uploaded file.
    string filePath = Path.Combine(tempFolder, randomName + flDocument.FileName);

    // The converter needs the name of the HTML file to save to.
    string generatedName = randomName +
      Path.GetFileNameWithoutExtension(flDocument.FileName) + ".html";
    string fileToSave = Path.Combine(tempFolder, generatedName);

    flDocument.SaveAs(filePath);

    // Get the IConverter instance that matches the file extension.
    IConverter doc = ConverterLocator.Converter(filePath, fileToSave);

    // Run the conversion and set the text of the editor to the converted HTML,
    // dropping any Unicode replacement characters (U+FFFD).
    editor.Text = doc.Convert().ToString().Replace("\uFFFD", "");
}

For demo purposes I created a Word document and converted it to an HTML file using Microsoft Word Interop.

Here is the Word document that I created for the demo.

Image 2

And here is the converted HTML, displayed in CKEditor.

Image 3

Image 3

Summary

You have now been walked through how to convert a Microsoft Word document into an HTML document and display the result in the browser. With the Interop API you can also perform other kinds of work, such as generating documents and Excel sheets on the fly from code. This demo is only an introduction to the Microsoft Interop API; you can do much more complex things with it.

Disclaimer 

This project is based solely on my own study, knowledge and research, not on any other project. I used the Microsoft Office Interop API to write this article. I would like to point out that running Microsoft Office on a web server is not the best approach, because Microsoft does not recommend it. Microsoft recommends Open XML instead for Office-related functionality on a web server. With Open XML you can do nearly everything that you can do with MS Word or MS Excel.
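To give a rough idea of the Open XML route, here is a minimal sketch that reads the paragraphs of a .docx file and wraps each one in a p tag. It assumes the Open XML SDK (the DocumentFormat.OpenXml package) is referenced; OpenXmlSketch and ParagraphsToHtml are hypothetical names. It handles plain paragraph text only, ignores formatting, tables and images, and does not work for the older binary .doc format, so it is not a replacement for the converters above.

```csharp
using System.Net;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, for illustration only.
public static class OpenXmlSketch
{
    public static string ParagraphsToHtml(string docxPath)
    {
        var html = new StringBuilder();

        // Open the package read-only; no Office installation is needed.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            // Only top-level paragraphs are visited; tables are skipped.
            foreach (Paragraph paragraph in body.Elements<Paragraph>())
            {
                html.Append("<p>")
                    .Append(WebUtility.HtmlEncode(paragraph.InnerText))
                    .AppendLine("</p>");
            }
        }
        return html.ToString();
    }
}
```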

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)