Introduction
MariGold.OpenXHTML is an open source library, hosted on GitHub, that converts HTML documents into Word documents. Internally, it uses the Open XML SDK to create the Word documents. CKEditor is a popular free editor for authoring formatted HTML on web sites. By integrating the two, we can build an online HTML to Word converter. We will create an ASP.NET MVC project to demonstrate this.
Using the Code
This tutorial uses Visual Studio 2015 Community edition. The first part explains how to integrate CKEditor into an MVC project, and the second part covers converting the HTML output of CKEditor into a Word document.
Set Up CKEditor
Download your preferred package from the CKEditor web site. This tutorial uses the full package, which contains all the plugins, so there is more to experiment with. Open Visual Studio and create a new MVC project with the default templates. We can reuse the Home controller and the Index.cshtml view for the demo.
Extract the downloaded CKEditor package and copy the entire ckeditor folder into the Scripts folder of the project.
Remove all the HTML contents from Index.cshtml and add the following code:
@using (Html.BeginForm("Index", "Home", FormMethod.Post))
{
    @Html.TextArea("content", new { @id = "editor1" })
    <input type="submit" value="Submit" />
}
Next, add a reference to ckeditor.js and a script element at the bottom of the same page to initialize CKEditor. The argument passed to CKEDITOR.replace matches the id of the text area.
<script type="text/javascript" src="~/Scripts/ckeditor/ckeditor.js"></script>
<script>
    CKEDITOR.replace('editor1');
</script>
CKEditor is now configured; if you run the application, the editor loads on the home page. The next step is to install MariGold.OpenXHTML and implement an Index POST action method on the Home controller to receive the submitted HTML content.
Set Up MariGold.OpenXHTML
This library is available as a NuGet package. To install it, enter the following command in the Package Manager Console.
Install-Package MariGold.OpenXHTML
This will also install the following dependencies:
DocumentFormat.OpenXml - The Open XML SDK library, used to create Open XML Word documents
MariGold.HtmlParser - Parses and extracts the HTML elements from the input text
The final step is to integrate all of these to create Word documents on the fly. Add a new Index method, as shown below, to the Home controller to receive the HTML posted from CKEditor. Don't forget to include the necessary namespaces.
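using System.IO;
using System.Web.Mvc;
using MariGold.OpenXHTML;

// Add this action to the existing HomeController class.
[HttpPost]
[ValidateInput(false)] // Allow raw HTML in the posted field; request validation would otherwise reject it.
public FileResult Index(string content)
{
    using (MemoryStream mem = new MemoryStream())
    {
        // Build the Word document in memory from the submitted HTML.
        WordDocument doc = new WordDocument(mem);
        doc.Process(new HtmlParser(content));
        doc.Save();

        // Use the .docx (Open XML) MIME type; "application/msword" is for the legacy .doc format.
        return File(mem.ToArray(), "application/vnd.openxmlformats-officedocument.wordprocessingml.document", "sample.docx");
    }
}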
Most of the work is done in the WordDocument class. This class exposes a few properties and methods that control how the HTML is converted into an Open XML Word document. Refer to the GitHub project home page for more details.
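As one illustration of such a property, content authored in CKEditor often contains links or images with relative URLs. The following is a minimal sketch that assumes WordDocument exposes a BaseURL property for resolving them, as described in the project's README; check the name against the version you install. The HtmlToWord class and its Convert method are hypothetical helpers introduced only for this example and are not part of the library.

```csharp
using System.IO;
using MariGold.OpenXHTML;

// Hypothetical helper class, introduced for illustration only.
public static class HtmlToWord
{
    // Converts an HTML string into the bytes of a .docx file.
    // baseUrl is the site root used to resolve relative URLs,
    // for example "http://sample.com".
    public static byte[] Convert(string html, string baseUrl)
    {
        using (MemoryStream mem = new MemoryStream())
        {
            WordDocument doc = new WordDocument(mem);

            // Assumption: BaseURL is the property the library uses to
            // resolve relative links and image sources in the HTML.
            doc.BaseURL = baseUrl;

            doc.Process(new HtmlParser(html));
            doc.Save();

            // ToArray copies the buffer, so the result stays valid
            // after the stream is disposed.
            return mem.ToArray();
        }
    }
}
```

Keeping the conversion in a helper like this also keeps the controller action small, since the action only has to pass the posted HTML through and return the bytes as a file.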
The Index action uses a MemoryStream to create the Word document in memory. The Process method is responsible for parsing the HTML and converting it into a Word document. It requires an implementation of the IParser type to parse the HTML text, which makes it possible to replace the default HTML parsing implementation with a custom one. Refer to the GitHub project home page for details on how to implement this.
The Save method is required to flush all the modifications into the MemoryStream. The last line of code copies the content of the MemoryStream into a byte array and returns it as a FileContentResult. Because a download file name is supplied, the browser downloads the output file instead of displaying it.
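If the form is submitted with an empty editor, the content parameter can arrive null or blank. The following is a minimal sketch of one way to handle that, assuming it is acceptable to send the user back to the editor page; the redirect behavior and the DocxContentType constant name are choices made for this example, not something the library requires. The return type is widened to ActionResult so that the action can return either a redirect or a file.

```csharp
using System.IO;
using System.Web.Mvc;
using MariGold.OpenXHTML;

public class HomeController : Controller
{
    // MIME type for .docx (Open XML word processing) files.
    private const string DocxContentType =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

    // GET: renders the page that hosts CKEditor.
    public ActionResult Index()
    {
        return View();
    }

    // POST: converts the submitted HTML and returns it as a download.
    [HttpPost]
    [ValidateInput(false)] // The field carries raw HTML.
    public ActionResult Index(string content)
    {
        // Nothing to convert: go back to the editor page.
        if (string.IsNullOrWhiteSpace(content))
        {
            return RedirectToAction("Index");
        }

        using (MemoryStream mem = new MemoryStream())
        {
            WordDocument doc = new WordDocument(mem);
            doc.Process(new HtmlParser(content));
            doc.Save();

            return File(mem.ToArray(), DocxContentType, "sample.docx");
        }
    }
}
```

Redirecting follows the usual Post/Redirect/Get pattern, so refreshing the page after an empty submission does not resubmit the form.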