Introduction
The OfficeHTMLConverter class library and the OfficeToHTMLWeb ASP.NET web project convert Microsoft Office documents (PowerPoint, Excel, Visio, or Word) into HTML on a web server. Any ASP.NET page bound to the supplied master page renders a File Upload section in its footer that lets users upload Microsoft Office documents to that page. Each uploaded document is converted to HTML (images, content, and file structure), and a hyperlink to the converted document is dynamically added to the page from which it was uploaded.
Prerequisites
Background
While addressing an intermittent web-based Office automation issue at work recently, I took some time to look more closely at automation with newer versions of Microsoft Office. As Microsoft warns, automating Office products on a server is not recommended: the Office applications are designed to run as client applications on an interactive desktop.
It had been a while since I last worked with Office automation, so while this issue was still fresh in my mind, I decided to experiment with Office automation under ASP.NET to learn more about why Microsoft recommends against using it in a shared hosting environment. Even though Microsoft does not recommend this kind of implementation, I wanted to see how Office automation performs under load. I was impressed with the “SaveAsWeb” functionality supported across most Office applications, and within minutes I had an Office helper class (OfficeHTMLConverter) and a simple test bed (OfficeToHTMLWeb) ready. I then decided to modify my master page so that Office documents could be uploaded and converted on any ASP.NET form that uses it. Additionally, I decided to add a Page Trending feature that would keep count of which pages were accessed the most.
Explanation
The OfficeHTMLConverter helper assembly contains two classes: Converter, which handles PowerPoint, Excel, Visio, and Word HTML conversions, and UploadedFileInfo, which tracks uploaded file names and their relative web paths. The OfficeToHTMLWeb project provides sample ASP.NET web pages that use the OfficeHTMLConverter assembly through the master page. The master page renders a section in the page footer that provides a file upload control for Office documents, along with links to any previously uploaded and converted documents. Hovering over an existing upload link displays a JavaScript preview of the converted document.
The OfficeHTMLConverter.Converter class does the heavy lifting, and its assembly references the Office Interop assemblies:
...
using Word = Microsoft.Office.Interop.Word;
using Visio = Microsoft.Office.Interop.Visio;
using VisioWeb = Microsoft.Office.Interop.Visio.SaveAsWeb;
using PowerPoint = Microsoft.Office.Interop.PowerPoint;
using Excel = Microsoft.Office.Interop.Excel;
using Microsoft.Office.Interop;
using Microsoft.Office.Core;
...
The Converter class provides a number of static functions that return a List of UploadedFileInfo objects containing the file name and relative web path of each PowerPoint, Excel, Visio, or Word document being uploaded.
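The article does not show UploadedFileInfo itself, only that it pairs a file name with a relative web path. The following is a minimal sketch of what such a class might look like, assuming the converted files live under the web site's root folder; the property names and the FromPhysicalPath helper are hypothetical, not part of OfficeHTMLConverter.

```csharp
using System;
using System.IO;

// Hypothetical shape of UploadedFileInfo: the article only says it tracks a
// file name and its relative web path, so these member names are assumptions.
public class UploadedFileInfo
{
    public string FileName { get; }
    public string RelativeWebPath { get; }

    public UploadedFileInfo(string fileName, string relativeWebPath)
    {
        FileName = fileName;
        RelativeWebPath = relativeWebPath;
    }

    // Maps a physical file under the site root to an app-relative URL, e.g.
    // <siteRoot>/Uploads/Default/report.docx.htm -> "~/Uploads/Default/report.docx.htm".
    public static UploadedFileInfo FromPhysicalPath(string siteRoot, string physicalPath)
    {
        string root = Path.GetFullPath(siteRoot).TrimEnd(Path.DirectorySeparatorChar)
                      + Path.DirectorySeparatorChar;
        string full = Path.GetFullPath(physicalPath);

        // Refuse to build a web path for files outside the site (case-insensitive,
        // as on the Windows file systems IIS typically serves from).
        if (!full.StartsWith(root, StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("File is not under the site root.", nameof(physicalPath));

        // Convert the remaining directory separators into URL slashes.
        string relative = full.Substring(root.Length)
                              .Replace(Path.DirectorySeparatorChar, '/');
        return new UploadedFileInfo(Path.GetFileName(full), "~/" + relative);
    }
}
```

Under this assumption, a ToHTML()-style function could build its result list by calling FromPhysicalPath() for each .htm file the conversion produces, and the page could turn the "~/" path into a client URL when rendering the link.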
The sourceFilePath parameter is the physical path of the uploaded Office document, and the targetDirectory parameter is the physical path of the web directory into which the HTML SaveAs conversion of the document is written.
public static List<UploadedFileInfo> ToHTML(string sourceFilePath, string targetDirectory)
The ToHTML() function determines the document type being handled and forwards the request to the appropriate static function based on the extension of the uploaded file.
// Dispatch on the uploaded file's extension (lower-cased so ".DOCX" matches ".docx").
switch (fileInfo.Extension.ToLower())
{
case ".doc":
case ".docx":
result = WordToHTML(refSourceFilePath, refTargetDirectory);
break;
case ".xls":
case ".xlsx":
result = ExcelToHTML(refSourceFilePath.ToString(), refTargetDirectory.ToString());
break;
case ".ppt":
case ".pptx":
result = PowerPointToHTML(refSourceFilePath.ToString(), refTargetDirectory.ToString());
break;
case ".vsd":
case ".vsdx":
result = VisioToHTML(refSourceFilePath.ToString(), refTargetDirectory.ToString());
break;
default:
// NotSupportedException states the failure more precisely than a bare Exception.
throw new NotSupportedException(fileInfo.Extension.ToLower() + " files are not supported");
}
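The routing in the switch above can also be kept in a single lookup table, which makes the list of supported extensions reusable, for example to reject unsupported uploads before they are saved. This is a sketch of that alternative rather than the author's code; the OfficeApp enum and ConverterRouting class are hypothetical names, and the extension list is copied from the switch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Which Office application a file should be routed to (hypothetical enum).
public enum OfficeApp { Word, Excel, PowerPoint, Visio }

public static class ConverterRouting
{
    // Same extensions as the switch in Converter.ToHTML(); the case-insensitive
    // comparer makes ".DOCX" and ".docx" equivalent without a ToLower() call.
    private static readonly Dictionary<string, OfficeApp> ByExtension =
        new Dictionary<string, OfficeApp>(StringComparer.OrdinalIgnoreCase)
        {
            [".doc"] = OfficeApp.Word,       [".docx"] = OfficeApp.Word,
            [".xls"] = OfficeApp.Excel,      [".xlsx"] = OfficeApp.Excel,
            [".ppt"] = OfficeApp.PowerPoint, [".pptx"] = OfficeApp.PowerPoint,
            [".vsd"] = OfficeApp.Visio,      [".vsdx"] = OfficeApp.Visio,
        };

    // True when the file's extension maps to one of the four converters.
    public static bool IsSupported(string fileName) =>
        ByExtension.ContainsKey(Path.GetExtension(fileName));

    // Returns the target application or throws for unknown extensions.
    public static OfficeApp Resolve(string fileName)
    {
        string extension = Path.GetExtension(fileName);
        if (ByExtension.TryGetValue(extension, out OfficeApp app))
            return app;
        throw new NotSupportedException(extension + " files are not supported");
    }
}
```

A caller such as an upload handler could check ConverterRouting.IsSupported(fileName) first and only then save the file and start an Office process, which avoids launching automation for files that would be rejected anyway.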
Each of the four Office applications handles HTML conversion differently, so each document type requires its own converter function. Using the WordToHTML() static function as an example, we can see that a Word application instance is created, along with an Interop Word Documents collection interface.
...
Word._Application word = new Word.Application();
Word.Documents documents = word.Documents;
...
The Word document is opened via the OpenNoRepairDialog() method, and care is taken to suppress any dialogs or user input, since this code executes on a web server.
...
string fileName = Path.GetFileName(refSourceFileName.ToString());
Word.Document document = documents.OpenNoRepairDialog(ref refSourceFileName, ref refTrue,
ref refFalse, ref refFalse, ref refMissing,
ref refMissing, ref refMissing, ref refMissing,
ref refMissing, ref refMissing, ref refMissing,
ref refTrue, ref refFalse, ref refMissing,
ref refMissing, ref refMissing);
...
The actual HTML conversion is handled by Word itself: the opened document's SaveAs() method is called with the wdFormatHTML file format.
...
object newRefTargetDirectory = Path.Combine(refTargetDirectory.ToString(), fileName + ".htm");
// Save through the Document returned by OpenNoRepairDialog() instead of
// word.ActiveDocument, which assumes the opened document is the active one.
document.SaveAs(ref newRefTargetDirectory, ref wdFormatHTML,
ref refMissing, ref refMissing, ref refMissing,
ref refMissing, ref refMissing, ref refMissing,
ref refMissing, ref refMissing, ref refMissing,
ref refMissing, ref refMissing, ref refMissing,
ref refMissing, ref refMissing);
...
Microsoft defines the SaveAs()
parameters as follows:
void SaveAs
(
[In, Optional] ref object FileName,
[In, Optional] ref object FileFormat,
[In, Optional] ref object LockComments,
[In, Optional] ref object Password,
[In, Optional] ref object AddToRecentFiles,
[In, Optional] ref object WritePassword,
[In, Optional] ref object ReadOnlyRecommended,
[In, Optional] ref object EmbedTrueTypeFonts,
[In, Optional] ref object SaveNativePictureFormat,
[In, Optional] ref object SaveFormsData,
[In, Optional] ref object SaveAsAOCELetter,
[In, Optional] ref object Encoding,
[In, Optional] ref object InsertLineBreaks,
[In, Optional] ref object AllowSubstitutions,
[In, Optional] ref object LineEnding,
[In, Optional] ref object AddBiDiMarks
);
The code-behind class in Site.Master.cs handles the actual file upload via the ImportFile() function. This function is called whenever the Upload button is clicked and the System.Web.UI.WebControls.FileUpload control contains a file.
public void ImportFile(string fileName, byte[] fileData)
The ImportFile() function determines which page is responsible for the upload and where to store the file, creating the target directory if it does not already exist. The uploaded file is then written to disk through a FileStream object.
...
string pageName = Path.GetFileNameWithoutExtension(Page.AppRelativeVirtualPath);
string rawDirectory = Path.Combine(ConfigurationManager.AppSettings["TargetDirectory"], pageName);
string rawFilePath = Path.Combine(rawDirectory, fileName);
Directory.CreateDirectory(rawDirectory);
FileStream stream = new FileStream(rawFilePath, FileMode.Create);
...
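The excerpt above omits the code that writes and closes the stream. The following is a self-contained sketch of the same save step, assuming the same layout (one sub-directory per page under a configured root); UploadStore and SaveUpload are hypothetical names, and stripping the client-supplied name with Path.GetFileName() is an added precaution, not something the article describes.

```csharp
using System;
using System.IO;

public static class UploadStore
{
    // Hypothetical helper mirroring the first half of ImportFile(): uploads are
    // grouped in a sub-directory named after the page that received them.
    public static string SaveUpload(string targetRoot, string appRelativePagePath,
                                    string clientFileName, byte[] fileData)
    {
        // "~/Default.aspx" -> "Default"
        string pageName = Path.GetFileNameWithoutExtension(appRelativePagePath);

        // Keep only the file name part, so a name such as "../x.docx"
        // cannot escape the upload directory.
        string fileName = Path.GetFileName(clientFileName);
        if (string.IsNullOrEmpty(fileName))
            throw new ArgumentException("A file name is required.", nameof(clientFileName));

        string rawDirectory = Path.Combine(targetRoot, pageName);
        Directory.CreateDirectory(rawDirectory); // no-op when it already exists

        string rawFilePath = Path.Combine(rawDirectory, fileName);

        // The using block closes the file handle before Office opens the file.
        using (var stream = new FileStream(rawFilePath, FileMode.Create, FileAccess.Write))
        {
            stream.Write(fileData, 0, fileData.Length);
        }
        return rawFilePath;
    }
}
```

Disposing the stream matters here because the very next step hands the same path to an Office application, which may refuse to open a file that is still locked by an unclosed FileStream.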
Once the file has been saved, ImportFile() ensures that the directory that will hold the HTML conversion of the uploaded file exists, and then enlists the static OfficeHTMLConverter.Converter.ToHTML() function to handle the conversion.
...
List<UploadedFileInfo> results = OfficeHTMLConverter.Converter.ToHTML(rawFilePath, webDirectory);
...
At this point, the Office document has been converted and stored as HTML. What remains is storing information about the new web directory: where it is, when it was created, the name of the page on which to display the web path link, and who uploaded the document. Rather than using a database, I opted for a local XML DataSet. This XML file is stored at /Resources/OfficeUploads.xml and is managed by a helper assembly, BusinessObjectFramework.DLL, which I wrote a few years ago. This DLL is included with the OfficeToHTMLWeb ASP.NET web project in the /Resources/bin project folder.
...
DataSet dataSet = DataAccess.ReadXMLDataSet(resources + @"\Resources\OfficeUploads.xml");
...
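The API of the BusinessObjectFramework DataAccess helper is not shown beyond ReadXMLDataSet(), so the following sketch uses the standard System.Data DataSet.ReadXml()/WriteXml() methods instead. It records the four pieces of information listed above; the UploadLog class, the "Upload" table, and its column names are hypothetical.

```csharp
using System;
using System.Data;
using System.IO;
using System.Linq;

public static class UploadLog
{
    // Hypothetical schema: where the HTML lives, when it was created,
    // the page that shows the link, and who uploaded the document.
    public static DataSet CreateEmpty()
    {
        var dataSet = new DataSet("OfficeUploads");
        DataTable table = dataSet.Tables.Add("Upload");
        table.Columns.Add("PageName", typeof(string));
        table.Columns.Add("WebPath", typeof(string));
        table.Columns.Add("Created", typeof(DateTime));
        table.Columns.Add("CreatedBy", typeof(string));
        return dataSet;
    }

    public static DataSet Load(string xmlPath)
    {
        if (!File.Exists(xmlPath))
            return CreateEmpty();
        var dataSet = new DataSet();
        // The inline schema written by Record() restores the column types.
        dataSet.ReadXml(xmlPath, XmlReadMode.ReadSchema);
        return dataSet;
    }

    // Appends one upload and rewrites the whole file (no locking; see below).
    public static void Record(string xmlPath, string pageName, string webPath, string user)
    {
        DataSet dataSet = Load(xmlPath);
        dataSet.Tables["Upload"].Rows.Add(pageName, webPath, DateTime.Now, user);
        dataSet.WriteXml(xmlPath, XmlWriteMode.WriteSchema);
    }

    // Web paths of every upload recorded for one page, oldest first.
    public static string[] ForPage(string xmlPath, string pageName) =>
        Load(xmlPath).Tables["Upload"].Rows.Cast<DataRow>()
            .Where(r => string.Equals((string)r["PageName"], pageName, StringComparison.OrdinalIgnoreCase))
            .OrderBy(r => (DateTime)r["Created"])
            .Select(r => (string)r["WebPath"])
            .ToArray();
}
```

Writing the schema inline keeps the file self-describing, so Created comes back as a DateTime rather than a string. Because Record() reads and rewrites the whole file, two simultaneous uploads could overwrite each other's entries; a real deployment would need a lock around it or a database.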
When a page bound to the master page is rendered, the DisplayUploadDocuments() function is called from the page's OnLoad() method. DisplayUploadDocuments() gets the name of the page being rendered and checks whether any converted uploads exist for this page in the XML-serialized dataset /Resources/OfficeUploads.xml. Previous uploads are rendered as HTML anchors whose onMouseOver and onMouseOut events are wired to the previewURL and closePreview JavaScript functions, respectively. The JavaScript function viewLink() is called whenever a converted document link is clicked.
...
string pageName = Path.GetFileNameWithoutExtension(Page.AppRelativeVirtualPath);
string resources = Server.MapPath("~");
if (File.Exists(resources + @"\Resources\OfficeUploads.xml"))
{
DataSet dataSet = DataAccess.ReadXMLDataSet(resources + @"\Resources\OfficeUploads.xml");
...
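The markup for each previous upload might be generated along these lines. This is a sketch only: the JavaScript function names previewURL, closePreview, and viewLink come from the article, but passing this.href to them is an assumption, as are the UploadLinkRenderer name and the expectation that the URL has already been resolved from "~/" form (for example with Control.ResolveUrl()).

```csharp
using System;
using System.Net;

public static class UploadLinkRenderer
{
    // Hypothetical markup for one converted upload. The handler names come from
    // the article; their parameter (this.href) is an assumption.
    public static string RenderLink(string url, string displayText)
    {
        // Encode both values so quotes or angle brackets in file names
        // cannot break out of the attribute or inject markup.
        string href = WebUtility.HtmlEncode(url);
        string text = WebUtility.HtmlEncode(displayText);
        return "<a href=\"" + href + "\""
             + " onmouseover=\"previewURL(this.href)\""
             + " onmouseout=\"closePreview()\""
             + " onclick=\"viewLink(this.href); return false;\">"
             + text + "</a>";
    }
}
```

Letting the handlers read this.href means the URL only has to be HTML-encoded once, in the href attribute, instead of also being escaped as a JavaScript string literal inside each event attribute.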
Conclusion
As you can see, this is a relatively simple implementation of Office automation under ASP.NET. I have not spent much time on proper exception handling, as I was more interested in getting an idea of how current Office automation fares against previous versions. Remember, Microsoft does not recommend using Office automation in a shared web server environment, and I encountered almost all of the issues discussed in Microsoft's support article on server-side automation of Office.
History
December 28th, 2012: Initial version.