Convert HTML to Word Document using CKEditor and MariGold.OpenXHTML

29 Sep 2017, CPOL, 3 min read
Implement an online HTML to Word converter using CKEditor and MariGold.OpenXHTML

Introduction

MariGold.OpenXHTML is an open source library, hosted on GitHub, that converts HTML documents into Word documents. Internally, it uses the Open XML SDK to create the Word documents. CKEditor is a popular free editor for composing and formatting HTML on web sites. By integrating the two, we can develop an online HTML to Word converter. We will create an ASP.NET MVC project to demonstrate this.

Using the Code

This tutorial uses Visual Studio 2015 Community edition. The first part explains how to integrate CKEditor into an MVC project, and the second part covers converting the HTML output of CKEditor into a Word document.

Set Up CKEditor

Download your preferred package from the CKEditor web site. This tutorial uses the full package, which contains all the plugins, so there is plenty to experiment with. Open Visual Studio and create a new MVC project with the default templates. We can reuse the Home controller and its Index.cshtml view for the demo.

Extract the downloaded CKEditor package and copy the entire ckeditor folder into the Scripts folder.

Remove all the HTML contents from Index.cshtml and add the following code:

HTML
@using (Html.BeginForm("Index", "Home", FormMethod.Post))
{
    @Html.TextArea("content", new { @id = "editor1" })
    <input type="submit" value="Submit" />
}

Next, add a reference to ckeditor.js and a script element at the bottom of the same page to initialize CKEditor on the text area.

HTML
<script type="text/javascript" src="~/Scripts/ckeditor/ckeditor.js"></script>
<script>
    CKEDITOR.replace('editor1');
</script>
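
CKEDITOR.replace also accepts an optional configuration object as its second argument. As a minimal sketch, the settings below are illustrative assumptions rather than values this tutorial requires: height sets the height of the editing area, and removePlugins drops the elementspath plugin that ships with the full package. The typeof guard is there only so that the snippet does nothing when ckeditor.js has not been loaded.

```javascript
// Hypothetical settings for illustration; the tutorial itself uses the defaults.
var editorConfig = {
    height: 300,                   // height of the editing area, in pixels
    removePlugins: 'elementspath'  // hide the element path bar below the editor
};

// CKEDITOR exists only in the browser, after ckeditor.js has loaded,
// so guard the call to keep the snippet harmless anywhere else.
if (typeof CKEDITOR !== 'undefined') {
    // Replace the text area with id "editor1" using the settings above.
    CKEDITOR.replace('editor1', editorConfig);
}
```

In Index.cshtml, this would go inside the second script element, in place of the plain CKEDITOR.replace('editor1') call.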

CKEditor is now fully configured; if you run the application, it will load on the home page. The next step is to install MariGold.OpenXHTML and implement an Index POST action method on the Home controller to receive the submitted HTML content.

Set Up MariGold.OpenXHTML

This library is available as a NuGet package. To install it, enter the following command in the Package Manager Console.

Install-Package MariGold.OpenXHTML

This will also install the following dependencies:

  • DocumentFormat.OpenXml - Open XML SDK library to create Open XML Word documents
  • MariGold.HtmlParser - To parse and extract the HTML elements from the input text

The final step is to integrate these pieces to create Word documents on the fly. Add a new Index method to the Home controller, as shown below, to receive the HTML posted from CKEditor. Don’t forget to include the necessary namespaces.

C#
using System.Web.Mvc;
using System.IO;
using MariGold.OpenXHTML;

C#
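[HttpPost]
[ValidateInput(false)] // allow raw HTML markup in the posted form value
public FileResult Index(string content)
{
    using (MemoryStream mem = new MemoryStream())
    {
        // Build the Word document in memory from the submitted HTML
        WordDocument doc = new WordDocument(mem);
        doc.Process(new HtmlParser(content));
        doc.Save(); // flush the document into the stream

        // Use the .docx MIME type; "application/msword" denotes the legacy .doc format
        return File(mem.ToArray(),
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "sample.docx");
    }
}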

Most of the work is done in the WordDocument class. It exposes a few properties and methods that control how HTML is converted into an Open XML Word document. Refer to the GitHub project home page for more details.

Here, we use a MemoryStream to create the Word document in memory. The Process method is responsible for parsing the HTML and converting it into a Word document. It takes an implementation of the IParser type, which parses the HTML text. This makes it possible to replace the default HTML parser entirely with a custom implementation. Refer to the GitHub project home page for how to implement one.

The Save method is required to flush all the modifications into the MemoryStream. The return statement copies the contents of the MemoryStream into a byte array and returns it as a FileContentResult. Because a download file name is supplied, the response is sent as an attachment, which prompts the browser to download the output file.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)