
DHTML Editor with Table Support and Source Editor

14 Feb 2011, CPOL, 15 min read
A complete class encapsulating a versatile HTML editor

DHTMLEditor/dhtmleditor.gif

Introduction

When you add the class CHtmlEditor to your project, the users of your software will be able to enter HTML content into an HTML Editor GUI control. Using this class, you get several weeks of coding and testing for free!

DHTMLEditor/classview.gif

Features

  • WYSIWYG Editor using the DHTML Editor of Internet Explorer
  • Edit mode / Browse mode / Source code editor
  • Support of tables (add and delete rows and columns, split cells, combine cells, edit table properties)
  • Default functionality as in Outlook Express (images, links, HR, undo, redo...) and more
  • Extensive support for text formatting (font size can be set very precisely in pixels) and definition of a default font
  • Enabling / disabling of XP desktop themes
  • Extensive library of classes and functions which wrap the access to Internet Explorer's COM interfaces such as IHTMLElement, IHTMLDocument2, etc
  • Programmatic access to style sheet definitions inside the HTML page
  • Thorough clean-up function which removes invalid HTML tags
  • Automatic update of the state of the GUI elements (combo boxes, buttons) depending on the current cursor position
  • Custom context menus for Internet Explorer in editor and browse mode possible
  • Browsing to embedded HTML resources
  • Dialog-based GUI control
  • MFC project
  • All functionality is encapsulated in one class with multiple embedded classes, so you add only one new item to the class view of your project (see image above)
  • New in version 5.0: Generation of indented, W3C-conformant HTML 5 code

Quality

This project is very versatile and very thoroughly tested:

  • Tested on Visual Studio 6.0 - 2010
  • Tested on Internet Explorer 5.0 - 8.0
  • Tested on Windows 95 - Windows 7
  • Can be integrated into a UNICODE or Multibyte (MBCS) project
  • Even if compiled as a Multibyte version, this project ALWAYS supports Greek, Russian and Japanese characters
  • Cleanly and thoroughly written code
  • Very well-documented code; if you are not one of those too lazy to read, you will understand
  • Tested by thousands of users worldwide, as this class is part of the program ElmüSoft Desktop Organizer

Requirements

To understand this project, you:

  • Need solid HTML and Style Sheet knowledge
  • Must be proficient in C++ programming (deriving classes, etc)
  • Have to study the MSDN if you want to expand the existing functionality

Why Such a Complex Class?

You might ask, "With MFC7, Microsoft introduced the CHtmlEditView class. Isn't that all I need?"

No, definitely not!

CHtmlEditView provides only the bare basics. It does not offer any functionality to work with tables. It does not allow you to modify the background color of the document or to modify its STYLE definitions. It does not offer a source editor and it does not allow you to directly access any HTML element inside the page, etc...

Tables

I don't know of any open source HTML editor that supports tables. So I wrote the stuff on my own, which is more complex than you might think, especially if you want to support adding new columns or splitting one cell into two, etc. To keep the work within reasonable limits, this editor only supports cells that span multiple columns. Cells spanning multiple rows are not needed that often, and implementing this thoroughly would be a lot of additional work!
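To illustrate why splitting a cell is less trivial than it sounds, here is a minimal sketch. It is not the code of cHtmlTableCell::Split(); it uses a hypothetical model in which every row is simply a list of colspan values. The names Row, Table, ColumnCount, CellAtColumn and SplitCell are assumptions made for this illustration. A cell that already spans several columns can be divided in place. A cell with colspan 1 needs a new grid column, so the cell above or below it in every other row has to grow by one.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical model (not the editor's real classes):
// each row is a list of colspan values, one entry per cell.
using Row   = std::vector<int>;
using Table = std::vector<Row>;

// Number of grid columns that a row occupies.
int ColumnCount(const Row& row)
{
    return std::accumulate(row.begin(), row.end(), 0);
}

// Index of the cell in 'row' that covers grid column 'col', or -1 if none.
int CellAtColumn(const Row& row, int col)
{
    int start = 0;
    for (std::size_t i = 0; i < row.size(); i++)
    {
        if (col < start + row[i]) return static_cast<int>(i);
        start += row[i];
    }
    return -1;
}

// Splits one cell into two cells. Returns false for invalid indices.
bool SplitCell(Table& table, std::size_t rowIdx, std::size_t cellIdx)
{
    if (rowIdx >= table.size() || cellIdx >= table[rowIdx].size()) return false;

    Row& row = table[rowIdx];
    const int span = row[cellIdx];
    if (span > 1)
    {
        // The cell already spans several columns: divide its colspan.
        const int right = span / 2;
        row[cellIdx] = span - right;
        row.insert(row.begin() + cellIdx + 1, right);
        return true;
    }

    // colspan == 1: the grid gets one more column, so the cell covering
    // this column in every other row must be widened by one.
    const int col = std::accumulate(row.begin(), row.begin() + cellIdx, 0);
    row.insert(row.begin() + cellIdx + 1, 1);
    for (std::size_t r = 0; r < table.size(); r++)
    {
        if (r == rowIdx) continue;
        const int hit = CellAtColumn(table[r], col);
        if (hit >= 0) table[r][hit] += 1;
    }
    return true;
}
```

After every split, all rows still occupy the same number of grid columns. Supporting rowspan as well would require tracking a full two-dimensional grid.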

Fonts

The Internet Explorer Editor only supports seven font sizes, as you can see in Outlook Express. The command IDM_FONTSIZE sets <FONT size=1...7>. I decided that this is too coarse and found a way to let the user set unlimited and precise font sizes using <FONT style="font-size:17px"> (using px is much more precise than using pt).
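The markup that results from a pixel-precise font size can be sketched with a small string helper. WrapWithPixelFontSize is a hypothetical name used only for this illustration. The editor itself works on the DOM through COM rather than on strings, and the sketch does not HTML-escape the text.

```cpp
#include <string>

// Hypothetical helper, for illustration only: wraps 'text' in a FONT tag
// whose size is given in pixels instead of the coarse size=1...7 steps.
std::string WrapWithPixelFontSize(const std::string& text, int pixels)
{
    if (pixels < 1) pixels = 1;   // clamp nonsensical sizes
    return "<FONT style=\"font-size:" + std::to_string(pixels) + "px\">"
           + text + "</FONT>";
}
```

Calling WrapWithPixelFontSize("Hello", 17) yields <FONT style="font-size:17px">Hello</FONT>.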

Internet Explorer COM Interface Relationship

Internet Explorer offers a really huge bunch of COM interfaces, which allow the control of ANYTHING. Your application can let Internet Explorer browse to any URL, access any HTML element in the document and read or modify its content. Everything you can do with JavaScript you can also do directly from your application via COM. You can do that not only if your application hosts the Internet Explorer control, but also if Internet Explorer runs as an independent application! However, remote automation is not a subject of this project.

There is a very long list of COM interfaces described in the MSDN. I will explain here only the basics. The following image tries to illustrate the relationship of the most important interfaces.

DHTMLEditor/objecthierarchy.gif

The IWebBrowser interface is the browser / editor itself. It contains, for example, the command Navigate(), which browses to a URL. You can ask the IWebBrowser interface for the IHTMLDocument interface that represents all the visible content of the HTML document. If the document contains a frameset, then each frame in turn contains its own IHTMLDocument interface.

You can ask the IHTMLDocument interface for the IHTMLBodyElement which, for example, contains a command to set the background color (<BODY bgcolor=red>). The document can also retrieve ANY other IHTMLElement interface, for example by using IHTMLDocument3.getElementById() or IHTMLDocument2.get_all(), which retrieves a collection of ALL elements in the document that can later be filtered.

You will notice that some interfaces exist with a number. They are implemented by the same object and you can obtain one from another using CComQIPtr<...>, which calls QueryInterface() for you. However, the commands they provide are different. The reason is that these interfaces were introduced at different times:

  • IHTMLDocument requires Internet Explorer 4.0
  • IHTMLDocument2 requires Internet Explorer 4.0
  • IHTMLDocument3 requires Internet Explorer 5.0
  • IHTMLDocument4 requires Internet Explorer 5.5
  • IHTMLDocument5 requires Internet Explorer 6.0

With CHtmlEditor::GetMsieVersion(), you can retrieve the version of Internet Explorer on the current machine. If you should need any of the newer functionality, you must check the MSIE version. Otherwise, the result is a crash of your application on older Internet Explorer versions! Currently the project does not use any commands that require more than Microsoft Internet Explorer 5.0.
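Such a version check can be sketched as follows, using the table above. The encoding of the version as major * 10 + minor is an assumption of this sketch and not necessarily what CHtmlEditor::GetMsieVersion() returns. RequiredMsieVersion and IsInterfaceAvailable are hypothetical helper names.

```cpp
#include <string>

// Minimum Internet Explorer version for each document interface, taken from
// the list above. Versions are encoded as major * 10 + minor (5.5 -> 55);
// this encoding is an assumption made for this sketch.
int RequiredMsieVersion(const std::string& iface)
{
    if (iface == "IHTMLDocument")  return 40;
    if (iface == "IHTMLDocument2") return 40;
    if (iface == "IHTMLDocument3") return 50;
    if (iface == "IHTMLDocument4") return 55;
    if (iface == "IHTMLDocument5") return 60;
    return -1;   // unknown interface
}

// True if the interface may be used with the installed browser version.
bool IsInterfaceAvailable(const std::string& iface, int installedVersion)
{
    const int required = RequiredMsieVersion(iface);
    return required >= 0 && installedVersion >= required;
}
```

Code that wants to call an IHTMLDocument4 command would first test IsInterfaceAvailable("IHTMLDocument4", version) and fall back to older functionality if the test fails.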

CHtmlEditor Class Hierarchy

The following image illustrates the class hierarchy:

DHTMLEditor/classhierarchy.gif

The hierarchy of classes in CHtmlEditor is equivalent to this tree. cHtmlDomNode is the base class of all the others. It contains commands like those in the JavaScript DOM model: you can navigate from one element to its parent or sibling (cHtmlDomNode::NextSibling()); you can remove an element from the document (cHtmlDomNode::Remove()) or you can even create a new element.

Derived from cHtmlDomNode are cHtmlDocument and cHtmlElement. This means that these inherit the functionality of cHtmlDomNode. cHtmlElement allows, for example, retrieval of the inner HTML code of an element or the modification of its attributes. For example, cHtmlElement::SetAttribute("Align", "Right") would result in <SPAN align=right> when executed on a SPAN element.

Derived from cHtmlElement are the other elements like cHtmlTable, cHtmlImg, etc, which in turn inherit the functionality of cHtmlElement and add their own specialized functionality. For example, cHtmlTableCell::Split() would split a table cell into two cells.

Using CHtmlEditor

The nice thing is that, using CHtmlEditor, you don't have to care about all of the COM interfaces. They are all nicely wrapped in C++ classes. First of all, to integrate the editor into your project, simply place a static control into the dialog where you want the editor to be. The following code creates the Internet Explorer editor and the RichEdit source editor:

C++
pi_Editor = new CHtmlEditor();
BOOL b_OK = pi_Editor->CreateEditor(&mCtrl_Static, this, FALSE, TRUE);

Here are some examples that demonstrate the usage:

Example 1

You have a table <TABLE ID="MainTable"> inside the document and you want to set its border color:

C++
cHtmlTable i_Table = pi_Editor->GetDocument()->GetElementByID("MainTable");
if (i_Table.Valid()) i_Table.SetBorderColor("#5544FF");

Example 2

You have a table and want to set the background color of the third cell in the first row:

C++
cHtmlTableCell i_Cell = i_Table.GetCell(0, 2, TRUE);
if (i_Cell.Valid()) i_Cell.SetBgColor("blue");

Example 3

You want to set the color of all <HR> tags that exist in the document:

C++
CComQIPtr<IHTMLElementCollection> i_Collect;
UINT Count = pi_Editor->GetDocument()->GetElementCollection("HR", i_Collect);
for (UINT i=0; i<Count; i++)
{
   cHtmlHR i_HR = cHtmlElement::GetElementFromCollection(i, i_Collect);
   i_HR.SetProperty(E_Color, "#FF8800");
}

Working with Styles

CHtmlEditor allows access (read and write) to any style attribute of any HTML element in the document. It is as easy as writing:

C++
i_TableCell.GetStyle().SetProperty(E_FontSize, "18px");

...which would set the font size of a table cell to <TD style="font-size:18px">...</TD>. You can also modify the general style definitions for the whole document. The following sets <HEAD><STYLE> Body { FONT-SIZE: 18px; } </STYLE></HEAD>.

C++
pi_Editor->GetDocument()->GetStyleSheet().SetProperty("Body", E_FontSize, "18px");

Working with the Selection / Cursor Position

Suppose the user has selected some text and clicks a toolbar button to perform an action on it. There are several default functions provided by Internet Explorer that work with the current selection. For example, if you call:

C++
pi_Editor->ExecSetCommand(IDM_FORECOLOR, "red");

...the foreground color of the selected text will be set to red. You can find all the IDM_XYZ commands in the file MsHtmcid.h of Visual Studio 7, but most of them are not implemented. It seems that Microsoft initially had many more plans for Internet Explorer than they finally realized.

But what if you want to implement your own not-yet-existing functionality? You can call cHtmlDocument::GetSelection(), which will return the HTML element containing the cursor or the selection.

Example 1

You want to retrieve the URL of the image which is currently selected by the user:

C++
cHtmlImg i_Img = pi_Editor->GetDocument()->GetSelection();
if (i_Img.Valid()) s_Url = i_Img.GetURL();

Example 2

You want to add a <SUB> tag around the text that is currently selected (<SUB>Text</SUB>):

C++
BOOL b_OK = pi_Editor->GetDocument()->AddToSelection("<SUB>", "</SUB>");

See source code comments for more info!

Visual Studio 6 versus 7

There are several reasons why I prefer working with Visual Studio 6 instead of upgrading to Visual Studio 7, but I don't want to explain this here. The problem is that MFC 6 does not yet support CHtmlEditView, the MFC wrapper for the HTML Editor. However, MFC 6 already supports CHtmlView, the Internet Explorer Browser.

If you look into the source code of CHtmlEditView (VisualStudio7\Vc7\AtlMfc\Include\AfxHtml.h), you will notice that it would be extremely awkward to convert all that stuff to make it run on Visual Studio 6. There are several classes required, like CHtmlEditCtrlBase, etc and you cannot simply take Microsoft's Visual Studio 7 code and put it into your Visual Studio 6 project. This is because there are several dependencies on classes which do not yet exist in Visual Studio 6 (CStringA, CStringW) or which have less functionality. You would end up rewriting all of it.

I found an easier way of expanding CHtmlView to get all the functionality I need by adding only a very few lines of code. For Visual Studio 6, it is required to #include some Visual Studio 7 header files and a *.Lib file, which you find in the folder Vs7 of the Visual Studio 6 project. This is the reason why the Visual Studio 6 project download is much bigger. The Visual Studio 7 project obviously uses CHtmlEditView and is much smaller, as none of these includes are required.

Why MFC?

If you are not familiar with MFC and if you are wondering what _T("Red") is good for or what the compiler options UNICODE and MBCS mean, I recommend the VERY good book "Professional MFC," which you can download for free from my homepage. There are some people who don't like MFC, mostly beginners who never understood it. However, this project is a very good example that demonstrates how MFC makes your life much easier! Let's say you have the HTML Editor and want to retrieve the title of the document:

<HTML><HEAD><TITLE>Title of document</TITLE></HEAD><BODY></BODY></HTML>

Version 1

Programming the COM interface of Internet Explorer without MFC would result in this code:

C++
// Copies the document title into pu16Title, which must be large enough to hold it.
void GetDocTitle(IWebBrowser2* pWebBrowser, WCHAR *pu16Title)
{
   IDispatch      *pHtmlDocDisp = NULL;
   IHTMLDocument2 *pHtmlDoc     = NULL;
   
   // get the IDispatch interface of the document (may be NULL if nothing is loaded)
   HRESULT hr = pWebBrowser->get_Document (&pHtmlDocDisp);
   if (SUCCEEDED (hr) && pHtmlDocDisp)
   {
      // Query interface for IHTMLDocument2
      hr = pHtmlDocDisp->QueryInterface (IID_IHTMLDocument2, (void**)&pHtmlDoc);
      if (SUCCEEDED (hr))
      {
         BSTR bsTitle = NULL;
         hr = pHtmlDoc->get_title (&bsTitle);
         if (SUCCEEDED (hr) && bsTitle)
         {
            wcscpy(pu16Title, bsTitle);
            SysFreeString (bsTitle);
         }
         pHtmlDoc->Release();
      }
      pHtmlDocDisp->Release();
   }
}

Version 2

With MFC, the same code becomes much shorter and less error-prone:

C++
void GetDocTitle(CHtmlEditView *pEditor, CString *ps_Title)
{
   CComQIPtr<IHTMLDocument2> i_Doc2 = pEditor->GetHtmlDocument();
   if (!i_Doc2) return; // no document loaded

   CComBSTR bs_Title;
   i_Doc2->get_title (&bs_Title);
   *ps_Title = bs_Title;
}

Version 3

Using the class CHtmlEditor of this project, it could hardly be easier:

C++
void GetDocTitle(CHtmlEditor *p_Editor, CString *ps_Title)
{
   *ps_Title = p_Editor->GetDocument()->GetTitle();
}

Internet Explorer / MFC Bugs

After working for nearly 2 years with the Internet Explorer COM interface, I have to say that its quality is very good: it is practically free of bugs! This is very unusual for Microsoft products, but the only Internet Explorer bug I found is in IHTMLDocument2.get_readyState. Do not use this command! This function worked fine until Microsoft broke it with Windows XP SP2. However, this does not matter, as you can easily replace it with CHtmlEditor::GetBusy().

There is also a bug in the MFC function CHtml(Edit)View::GetDocumentHTML() that uses CStreamOnCString, which is buggy. I wrote my own class, cStreamReader, which replaces the buggy function.

Security

Everybody knows that Internet Explorer is full of security holes. However, if you use it in your application just as an editor and do NOT browse with it to any malicious webpage, you don't have to worry about anything!

IMPORTANT: I recommend NOT allowing the user to switch into browse mode. This sample project allows everything.

If you don't want to be that strict, you can use the function CHtmlEditor::OnBeforeNavigate2() to forbid browsing to the internet. There you can put a filter which allows only files on the local hard disk and embedded resources. Each time the user clicks a link to any other address, he gets an error message.
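A filter of this kind could look roughly like the following sketch. It is written as a free function on a plain string so that it does not depend on the actual signature of OnBeforeNavigate2(). IsNavigationAllowed and its helpers are hypothetical names. The sketch assumes that embedded resources are addressed with the res:// protocol and local files with file:// or a drive-letter path. Note that file:// can also point to a network share, so a stricter filter would inspect the host part as well.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lower-case copy of a string (ASCII only).
static std::string ToLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

static bool StartsWith(const std::string& s, const std::string& prefix)
{
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical filter: allows only embedded resources and local files.
bool IsNavigationAllowed(const std::string& url)
{
    const std::string u = ToLower(url);
    if (u == "about:blank")       return true;   // empty document
    if (StartsWith(u, "res://"))  return true;   // embedded resource
    if (StartsWith(u, "file://")) return true;   // file URL
    // Plain drive-letter path such as C:\Docs\Page.html
    if (u.size() >= 3 && std::isalpha(static_cast<unsigned char>(u[0])) &&
        u[1] == ':' && (u[2] == '\\' || u[2] == '/'))
        return true;
    return false;                                // http, https, javascript, ...
}
```

The navigation handler would cancel the navigation and show the error message whenever this function returns false.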

<P> versus <DIV>

If you switch Internet Explorer into Design Mode, you will find that by default hitting the Enter key inserts TWO new lines. This is because Internet Explorer inserts a <P> tag. If you want a <BR> (a single new line), you have to press Shift + Enter.

This is quite stupid because you will have to tell all the users of your software to change their habits and to always hold the Shift key down when they want to go to the next line. There is no way to change this behaviour in Internet Explorer. However, there is a work-around. After loading a clean document, you have to insert an empty <DIV> tag like this: <BODY><DIV></DIV></BODY>. Now each Enter inserts ONE new line, which will look like this: <DIV>Text of line 1</DIV><DIV>Text of line 2</DIV>. Also, each empty table cell has to be filled: <TD><DIV></DIV></TD>. CHtmlEditor takes care to do all this automatically.
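The rule "an empty body or an empty table cell must contain an empty DIV" can be sketched on plain strings. IsBlank and EnsureDivContent are hypothetical helpers for this illustration only. CHtmlEditor performs the equivalent step on the DOM and its actual code may differ.

```cpp
#include <cctype>
#include <string>

// True if 'html' contains nothing but whitespace.
bool IsBlank(const std::string& html)
{
    for (unsigned char c : html)
        if (!std::isspace(c)) return false;
    return true;
}

// Hypothetical helper: returns the inner HTML to store in a BODY or TD
// element so that pressing Enter creates a new DIV instead of a P.
std::string EnsureDivContent(const std::string& innerHtml)
{
    return IsBlank(innerHtml) ? std::string("<DIV></DIV>") : innerHtml;
}
```

Content that already exists is returned unchanged, so the function can be applied to every cell of a table without harm.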

GUI

This sample project uses a very primitive GUI consisting of buttons and button-style check boxes to keep it simple. You will have to create your own nice toolbar with tooltips. The advantage of a toolbar over check boxes and buttons is that a toolbar cannot steal the focus from the HTML editor when the user clicks a toolbar button. The result could look like this in my program ElmüSoft Desktop Organizer:

DHTMLEditor/ptbsync.png

I used the top row of toolbar buttons for table editing, the bottom row for text editing and the middle row for all the rest of the functionality.

Automatic GUI Update

Every time you move the cursor in the HTML document, CHtmlEditor posts a notification to its parent so the GUI can be updated. This means that if the cursor moved from a bold text with font size 11 to an underlined text with font size 15, the combo boxes and toolbars are updated. In MFC7, CHtmlEditor gets this event from CHtmlEditView::OnUpdateUI(), but this is not yet available in MFC6. However, after studying the MFC source code, I found that this event is triggered by a timer. I subclass the "Internet Explorer_Server" window, catch this timer and so get the event also using MFC6.

After CHtmlEditor receives this event, it posts a message to its parent. I use WM_IDLEUPDATECMDUI for that, but you can use any other message that is not currently used for other purposes.

MSIE Context Menu

Internet Explorer displays a context menu when you right-click inside it. The menu is different in Browse mode and in Design mode. Here on The Code Project, you can find an article about how to modify or turn off this context menu, but that approach is extremely complicated.

As I wrote above, I subclass the "Internet Explorer_Server" window and there I catch WM_CONTEXTMENU. There you can easily implement your own context menu via TrackPopupMenuEx() or turn it completely off.

Keyboard Shortcuts

In the same class (CMsieWnd), you can also adapt the way Internet Explorer reacts to keyboard shortcuts. You can either modify the default behaviour (e.g. CTRL-P for printing, CTRL-N to open a new window) or add your own additional shortcuts (e.g. CTRL-R to insert a new table row).

The Cleanup Function

DHTMLEditor/desktopnotes.gif

CHtmlEditor contains a very complex HTML cleanup functionality. Why is this? As I already wrote, I made this editor for my program ElmüSoft Desktop Organizer. There the user can enter any HTML content which will later be displayed on the desktop in a note. These desktop notes are displayed by an Internet Explorer Window that is positioned invisibly over the desktop, as you see above.

As this desktop organizer uses JavaScript to move and open/close these notes, I cannot allow the user to enter his own JavaScript that might disturb the correct functioning. So there is a cleanup function in CHtmlEditor that removes any <SCRIPT>, <IFRAME>, <OBJECT>, etc blocks the user might have entered.

Additionally, there are HTML tags which are not supported by the Internet Explorer editor. For example, <CENTER> centers a whole block of HTML code. However, the editor normally centers only line-by-line by using <DIV align=center>Text</DIV>. If you enter a <CENTER> tag, the editor will display the HTML content correctly, but the GUI button for centered text will not appear pressed and your users will be confused. Such unsupported code can come from ONLY two sources:

  1. The user entered it in the source editor.
  2. The user copied it from a web page and pasted it into the HTML editor.

However, even if you do NOT paste or source edit, the cleanup function will remove useless empty tags like <u></u>. These may appear after typing, copying & pasting and deleting around for a while.

IMPORTANT: you will have to adapt the function cHtmlDocument::RecursiveCleanUpChilds() to your needs, but do NOT modify anything before you COMPLETELY understand how it works!!!

You could also completely prohibit source editing and pasting, but this is not a good idea. The user will become unable to copy HTML content that he has written on his own and paste it to another location in his document or duplicate it.
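The kind of recursive clean-up described in this section can be sketched on a simplified tree. This is not the code of cHtmlDocument::RecursiveCleanUpChilds(). The Node structure, the function names and the two tag lists are assumptions chosen for this illustration. The sketch removes forbidden blocks and then, working bottom-up, removes inline formatting tags that have become empty.

```cpp
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified DOM node. The real editor works on wrappers
// around IHTMLElement instead.
struct Node
{
    std::string tag;    // upper-case tag name, empty for a text node
    std::string text;   // content of a text node
    std::vector<std::unique_ptr<Node>> children;
};

// Tags that are removed together with everything inside them.
bool IsForbidden(const std::string& tag)
{
    static const std::set<std::string> forbidden = { "SCRIPT", "IFRAME", "OBJECT" };
    return forbidden.count(tag) > 0;
}

// Inline formatting tags that are useless when they have no content.
// An empty DIV is NOT in this list because the editor needs it.
bool IsRemovableWhenEmpty(const std::string& tag)
{
    static const std::set<std::string> inlineTags = { "U", "B", "I", "FONT", "SPAN" };
    return inlineTags.count(tag) > 0;
}

void CleanUpChildren(Node& parent)
{
    auto& kids = parent.children;
    for (auto it = kids.begin(); it != kids.end(); )
    {
        Node& child = **it;
        if (IsForbidden(child.tag))
        {
            it = kids.erase(it);       // drop the whole block
            continue;
        }
        CleanUpChildren(child);        // bottom-up, so nested empty tags collapse
        const bool empty = child.children.empty() && child.text.empty();
        if (empty && IsRemovableWhenEmpty(child.tag))
        {
            it = kids.erase(it);       // e.g. <u></u> or <b><u></u></b>
            continue;
        }
        ++it;
    }
}
```

Cleaning the children before testing for emptiness is what lets a construct like <B><U></U></B> disappear completely in a single pass.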

Finally

There are lots of other things I should explain here, but you will find all you need to know by studying the source code and its plentiful comments!

IMPORTANT: before you start, study the class DHtmlEditDemoDlg thoroughly to understand how to integrate CHtmlEditor into your project!

If you should find a bug, write me an email!

History

  • 9 November, 2005 -- Original version posted
  • 20 November, 2006 -- First update
  • 12 February, 2008 -- Second update
    • Version 4.2 released
    • Article content and downloads updated

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)