Upload and convert PowerPoint, Excel, Visio, Word to HTML on ASP.NET

28 Dec 2012
Upload a Microsoft Office PowerPoint, Excel, Visio, or Word file to a web server, where it is converted to HTML and hosted as a document link on the page from which it was uploaded.

Introduction

The OfficeHTMLConverter class library and the OfficeToHTMLWeb ASP.NET web project convert Microsoft Office documents (PowerPoint, Excel, Visio, or Word) into HTML on a web server. Any ASP.NET page bound to the supplied master page renders a file upload section in its footer. Each uploaded Office document is converted to HTML (images, content, and file structure), and a hyperlink to the converted document is created dynamically on the page from which it was uploaded.

Prerequisites

Background

While addressing an intermittent web-based Office automation issue at work recently, I took a closer look at automation with newer versions of Microsoft Office. As Microsoft warns, automating Office products on a server is not recommended: they are optimized to run as client applications on an interactive desktop.

It had been a while since I last worked with Office automation, and while this issue was still fresh in my mind I decided to experiment with it under ASP.NET to learn more about why Microsoft recommends against using it in a shared hosted environment. Even though Microsoft does not recommend this kind of implementation, I wanted to see how Office automation performs under load. I was impressed with the “SaveAsWeb” functionality supported across most Office applications. Within minutes I had an Office helper class ready (OfficeHTMLConverter) and a simple test bed (OfficeToHTMLWeb). I then modified my master page so that Office documents can be uploaded and converted on any ASP.NET form that uses it. Additionally, I added a Page Trending feature that keeps count of which pages are accessed the most.

Explanation

The OfficeHTMLConverter helper assembly contains two classes: Converter, which handles the PowerPoint, Excel, Visio, and Word HTML conversions, and UploadedFileInfo, which tracks uploaded file names together with their relative web paths. The OfficeToHTMLWeb project provides sample ASP.NET web pages that use the OfficeHTMLConverter assembly from the master page. The master page renders a section in the page footer that allows Office documents to be uploaded and lists links to any previously uploaded and converted documents. Hovering over an existing upload link displays a JavaScript preview of the converted document.

The OfficeHTMLConverter.Converter class is the heavy lifter. Its assembly references the Office Interop assemblies.

...
using Word = Microsoft.Office.Interop.Word;
using Visio = Microsoft.Office.Interop.Visio;
using VisioWeb = Microsoft.Office.Interop.Visio.SaveAsWeb;
using PowerPoint = Microsoft.Office.Interop.PowerPoint;
using Excel = Microsoft.Office.Interop.Excel;
using Microsoft.Office.Interop;
using Microsoft.Office.Core;
...

The Converter class provides a number of static functions, each returning a List of UploadedFileInfo objects that hold the file name and relative web path of the uploaded PowerPoint, Excel, Visio, or Word document. The sourceFilePath parameter is the literal path to the original Office document that was uploaded, and the targetDirectory parameter is the literal web path in which the HTML SaveAs conversion of that document should take place.

public static List<UploadedFileInfo> ToHTML(string sourceFilePath, string targetDirectory)

The ToHTML() function determines the document type from the extension of the uploaded file and forwards the request to the appropriate static function.

switch (fileInfo.Extension.ToLower())
{
  case ".doc":
  case ".docx":
     result = WordToHTML(refSourceFilePath, refTargetDirectory);
     break;
  case ".xls":
  case ".xlsx":
     result = ExcelToHTML(refSourceFilePath.ToString(), refTargetDirectory.ToString());
     break;
  case ".ppt":
  case ".pptx":
     result = PowerPointToHTML(refSourceFilePath.ToString(), refTargetDirectory.ToString());
     break;
  case ".vsd":
  case ".vsdx":
     result = VisioToHTML(refSourceFilePath.ToString(), refTargetDirectory.ToString());
     break;
                
  default:
     throw new Exception(fileInfo.Extension.ToLower() + " files are not supported");
}
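The dispatch rule above can be tried without Office installed. The following is only a minimal sketch: the OfficeDocumentKind enum, the ExtensionDispatchSketch class, and the Classify() function are hypothetical names made up for illustration and are not part of the OfficeHTMLConverter assembly. It assumes a non-null path and, unlike the snippet above, throws NotSupportedException instead of a plain Exception.

```csharp
using System;
using System.IO;

// Hypothetical enum naming the four Office families handled by the article.
public enum OfficeDocumentKind { Word, Excel, PowerPoint, Visio }

public static class ExtensionDispatchSketch
{
    // Maps a file path to the Office family that would perform the conversion.
    public static OfficeDocumentKind Classify(string sourceFilePath)
    {
        // Path.GetExtension returns the extension including the leading dot.
        string extension = Path.GetExtension(sourceFilePath).ToLowerInvariant();

        switch (extension)
        {
            case ".doc":
            case ".docx":
                return OfficeDocumentKind.Word;
            case ".xls":
            case ".xlsx":
                return OfficeDocumentKind.Excel;
            case ".ppt":
            case ".pptx":
                return OfficeDocumentKind.PowerPoint;
            case ".vsd":
            case ".vsdx":
                return OfficeDocumentKind.Visio;
            default:
                throw new NotSupportedException(extension + " files are not supported");
        }
    }
}
```

Running a check like this before any Office application is started means an unsupported upload can be rejected without paying the cost of launching an automation instance.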

Each of the four Office applications handles HTML conversion differently, so each document type requires its own converter function. Using the WordToHTML() static function as a reference, we can see that a Word application instance is created along with an Interop Word document collection interface.

...
Word._Application word = new Word.Application();
Word.Documents documents = word.Documents;
...

The Word document is opened via the OpenNoRepairDialog() function and care is taken to mute or suppress any dialogs or user input as this will be executed on a web server.

...
string fileName = Path.GetFileName(refSourceFileName.ToString());
Word.Document document = documents.OpenNoRepairDialog(ref refSourceFileName, ref refTrue,
  ref refFalse, ref refFalse, ref refMissing,
  ref refMissing, ref refMissing, ref refMissing,
  ref refMissing, ref refMissing, ref refMissing,
  ref refTrue, ref refFalse, ref refMissing,
  ref refMissing, ref refMissing);
...

The actual HTML conversion is handled by the Word automation object via its SaveAs() function.

...
object newRefTargetDirectory = Path.Combine(refTargetDirectory.ToString(), fileName + ".htm");
word.ActiveDocument.SaveAs(ref newRefTargetDirectory, ref wdFormatHTML,
  ref refMissing, ref refMissing, ref refMissing,
  ref refMissing, ref refMissing, ref refMissing,
  ref refMissing, ref refMissing, ref refMissing,
  ref refMissing, ref refMissing, ref refMissing,
  ref refMissing, ref refMissing);
...
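One detail of the target path is easy to miss: Path.GetFileName() keeps the original extension, so the converted file name ends in something like ".docx.htm". The sketch below isolates just that path computation so it can be checked without Office. TargetPathSketch and BuildHtmlTargetPath() are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;

public static class TargetPathSketch
{
    // Mirrors the path logic of the snippet above: the original file name,
    // extension included, with ".htm" appended, placed in the target directory.
    public static string BuildHtmlTargetPath(string sourceFilePath, string targetDirectory)
    {
        string fileName = Path.GetFileName(sourceFilePath);
        return Path.Combine(targetDirectory, fileName + ".htm");
    }
}
```

For a source file named Report.docx, this yields Report.docx.htm inside the target directory. Keeping the original extension in the name appears to have a practical benefit: Report.doc and Report.xls uploaded to the same page would not overwrite each other's converted output.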

Microsoft defines the SaveAs() parameters as follows:

void SaveAs
(
        [In, Optional] ref object FileName, 
        [In, Optional] ref object FileFormat, 
        [In, Optional] ref object LockComments, 
        [In, Optional] ref object Password, 
        [In, Optional] ref object AddToRecentFiles, 
        [In, Optional] ref object WritePassword, 
        [In, Optional] ref object ReadOnlyRecommended, 
        [In, Optional] ref object EmbedTrueTypeFonts, 
        [In, Optional] ref object SaveNativePictureFormat, 
        [In, Optional] ref object SaveFormsData, 
        [In, Optional] ref object SaveAsAOCELetter, 
        [In, Optional] ref object Encoding, 
        [In, Optional] ref object InsertLineBreaks, 
        [In, Optional] ref object AllowSubstitutions, 
        [In, Optional] ref object LineEnding, 
        [In, Optional] ref object AddBiDiMarks
);

The Site.Master.cs class handles the actual file uploading via the ImportFile() function. This function is called whenever the Upload button is clicked and the System.Web.UI.WebControls.FileUpload control has data present.

public void ImportFile(string fileName, byte[] fileData)

The ImportFile() function determines which page is responsible for the upload and where to store the file. If the target directory is not already present, ImportFile() creates it. The uploaded bytes are then written to local storage through a FileStream object.

...
//  Save the raw file to local webserver storage
string pageName = Path.GetFileNameWithoutExtension(Page.AppRelativeVirtualPath);
string rawDirectory = Path.Combine(ConfigurationManager.AppSettings["TargetDirectory"], pageName);
string rawFilePath = Path.Combine(rawDirectory, fileName);

//  Ensure the local storage directory path exists
Directory.CreateDirectory(rawDirectory);

//  Write the file byte array to local storage as the original file name
FileStream stream = new FileStream(rawFilePath, FileMode.Create);
...
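The remainder of the write step is elided above. One way it could be completed is sketched below, under stated assumptions: RawUploadSketch and SaveRawFile() are hypothetical names, the using block is my own choice so that the file handle is released before Office tries to open the file, and the Path.GetFileName() call is an added precaution in case the browser sends a full client path. It is not the article's actual code.

```csharp
using System;
using System.IO;

public static class RawUploadSketch
{
    // Writes the uploaded bytes under rawDirectory and returns the full path.
    public static string SaveRawFile(string rawDirectory, string fileName, byte[] fileData)
    {
        // Keep only the file name portion of whatever the client supplied.
        string safeName = Path.GetFileName(fileName);
        string rawFilePath = Path.Combine(rawDirectory, safeName);

        // CreateDirectory does nothing if the directory already exists.
        Directory.CreateDirectory(rawDirectory);

        // The using block closes the stream even if the write throws.
        using (FileStream stream = new FileStream(rawFilePath, FileMode.Create, FileAccess.Write))
        {
            stream.Write(fileData, 0, fileData.Length);
        }

        return rawFilePath;
    }
}
```

A call such as SaveRawFile(rawDirectory, fileName, fileData) would then return the path that is handed to the converter.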

Once the raw file has been saved, ImportFile() ensures that the directory that will hold the HTML conversion of the uploaded file exists. The static OfficeHTMLConverter.Converter.ToHTML() function then handles the conversion process.

...
//  Save the raw file as HTML to the web directory
List<UploadedFileInfo> results = OfficeHTMLConverter.Converter.ToHTML(rawFilePath, webDirectory);
...

At this point, the Office document has been converted and stored as HTML. What remains is storing information about this new web directory: where it is, when it was created, the name of the page on which to display the web path link, and who created the upload. Rather than using a database, I have opted to use a local XML DataSet. It is stored in the /Resources/OfficeUploads.xml file and is managed by a helper assembly named BusinessObjectFramework.DLL, which I wrote a few years back. This DLL is included with the OfficeToHTMLWeb ASP.NET web project under the /Resources/bin project folder.

...
//  Read in the Office uploads data file
DataSet dataSet = DataAccess.ReadXMLDataSet(resources + @"\Resources\OfficeUploads.xml");
...
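The internals of DataAccess.ReadXMLDataSet() in BusinessObjectFramework.DLL are not shown in this article. As a rough stand-in, the sketch below uses the framework's own DataSet.ReadXml() and DataSet.WriteXml() to persist upload records. Everything specific in it is an assumption for illustration: the UploadStoreSketch class, the "Upload" table, and the column names are hypothetical and may not match the real OfficeUploads.xml layout.

```csharp
using System;
using System.Data;
using System.IO;

public static class UploadStoreSketch
{
    // Builds an empty DataSet with a hypothetical schema for upload records.
    public static DataSet CreateEmptyStore()
    {
        DataSet dataSet = new DataSet("OfficeUploads");
        DataTable table = dataSet.Tables.Add("Upload");
        table.Columns.Add("PageName", typeof(string));
        table.Columns.Add("WebPath", typeof(string));
        table.Columns.Add("CreatedBy", typeof(string));
        table.Columns.Add("CreatedOn", typeof(DateTime));
        return dataSet;
    }

    // Loads the XML file if it exists, otherwise starts with an empty store.
    public static DataSet Load(string xmlPath)
    {
        if (!File.Exists(xmlPath))
        {
            return CreateEmptyStore();
        }

        // The file was written with an inline schema, so read it into a
        // fresh DataSet rather than one that already defines the table.
        DataSet dataSet = new DataSet();
        dataSet.ReadXml(xmlPath, XmlReadMode.ReadSchema);
        return dataSet;
    }

    // Appends one upload record and writes the whole DataSet back to disk.
    public static void AddUpload(string xmlPath, string pageName, string webPath, string createdBy)
    {
        DataSet dataSet = Load(xmlPath);
        dataSet.Tables["Upload"].Rows.Add(pageName, webPath, createdBy, DateTime.UtcNow);
        dataSet.WriteXml(xmlPath, XmlWriteMode.WriteSchema);
    }
}
```

Writing the schema inline keeps the column types (notably the DateTime) intact across reads. Note that a file-based store like this has no locking, so concurrent uploads could overwrite each other's changes.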

When a page bound to the master page is rendered, the DisplayUploadDocuments() function is called from the page's OnLoad() function. DisplayUploadDocuments() takes the name of the page being rendered and determines whether any converted uploads exist for that page in the XML serialized DataSet /Resources/OfficeUploads.xml. Previous uploads are rendered as HTML anchors whose onMouseOver and onMouseOut events are connected to the previewURL and closePreview JavaScript functions, respectively. The JavaScript function viewLink() is called whenever a converted document link is clicked.

...
//  Read database and display file links for this page
string pageName = Path.GetFileNameWithoutExtension(Page.AppRelativeVirtualPath);
//  Update OfficeUploads
string resources = Server.MapPath("~");
if (File.Exists(resources + @"\Resources\OfficeUploads.xml"))
{
  //  Read in the Office uploads data file
  DataSet dataSet = DataAccess.ReadXMLDataSet(resources + @"\Resources\OfficeUploads.xml");
...
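The anchor markup itself is not shown in the excerpt above. The sketch below illustrates one way such a link could be built as a string. The JavaScript function names previewURL, closePreview, and viewLink come from the article, but their parameter lists are not shown, so passing this.href to them is purely an assumption, as are the UploadLinkSketch class and RenderLink() function.

```csharp
using System;
using System.Net;

public static class UploadLinkSketch
{
    // Builds an anchor for one converted document. The file name and path are
    // HTML-encoded so characters such as & or " cannot break the markup.
    public static string RenderLink(string fileName, string relativeWebPath)
    {
        string href = WebUtility.HtmlEncode(relativeWebPath);
        string text = WebUtility.HtmlEncode(fileName);

        // The handlers read the URL from the element itself (this.href), which
        // avoids embedding an escaped string literal inside the attributes.
        return "<a href=\"" + href + "\""
             + " onmouseover=\"previewURL(this.href)\""
             + " onmouseout=\"closePreview()\""
             + " onclick=\"viewLink(this.href); return false;\">"
             + text + "</a>";
    }
}
```

Because uploaded file names come from users, encoding them before they reach the page is worth the extra two lines.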

Conclusion

As you can see, this is a relatively simple implementation of Office automation under ASP.NET. I have not spent a lot of time on proper exception handling, as I was more interested in getting an idea of how current Office automation fares against previous versions. Remember, Microsoft does not recommend using Office automation in a shared web server environment, and I encountered almost all of the issues discussed in the following Microsoft support link.

History

· December 28th, 2012: Initial version.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
