Fast HTML syntax highlighting with the Rich Edit control

27 Mar 2006
An article on fast HTML syntax highlighting using a rich edit control (CRichEditCtrl).

Introduction

This control is an extension of CRichEditCtrl, based on Derek Lakin's CHtmlRichEditCtrlSSL, written to provide fast syntax highlighting for HTML code. CHtmlRichEditCtrlSSL turned out to be too slow to be used with large files. Since my files are large, I decided to take up the challenge and develop a fast and scalable syntax highlighting control. The control can be used in two different ways:

  1. Loading and displaying a bunch of HTML code (from a file, from the clipboard, or by sending WM_SETTEXT), and
  2. Online parsing as you type.

Rules

The control follows a few simple rules to perform its syntax highlighting, as follows:

  1. Anything starting with '<!--' and ending with '-->' is a comment.
  2. Anything between two double quotes ("") is quoted text.
  3. Anything from '<' to '>' (except comments) is a tag.
  4. Anything else is normal text.
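
To make the rules concrete, here is a minimal sketch of them as a small state machine that assigns a color state to every character. It is not the control's actual code: the names ColorState and ClassifyChars are made up for illustration, and it assumes that a double quote opens quoted text only inside a tag and that the closing quote returns to the tag state (the rules above do not spell this out).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// The four color states named by the rules.
enum class ColorState : unsigned char { Normal, Tag, Quoted, Comment };

// Returns one ColorState per character of `html`.
// Assumption: quotes are only recognized inside a tag.
std::vector<ColorState> ClassifyChars(const std::string& html)
{
    std::vector<ColorState> out(html.size(), ColorState::Normal);
    ColorState state = ColorState::Normal;
    std::size_t i = 0;
    while (i < html.size()) {
        const char ch = html[i];
        switch (state) {
        case ColorState::Normal:
            if (html.compare(i, 4, "<!--") == 0) {        // rule 1: comment start
                for (std::size_t k = 0; k < 4; ++k) out[i + k] = ColorState::Comment;
                i += 4;
                state = ColorState::Comment;
                continue;
            }
            if (ch == '<') state = ColorState::Tag;       // rule 3: tag start
            out[i] = state;                               // rule 4: otherwise normal
            break;
        case ColorState::Tag:
            if (ch == '"') state = ColorState::Quoted;    // rule 2: opening quote
            out[i] = state;
            if (ch == '>') state = ColorState::Normal;    // '>' itself is still tag
            break;
        case ColorState::Quoted:
            out[i] = ColorState::Quoted;
            if (ch == '"') state = ColorState::Tag;       // closing quote
            break;
        case ColorState::Comment:
            if (html.compare(i, 3, "-->") == 0) {         // rule 1: comment end
                for (std::size_t k = 0; k < 3; ++k) out[i + k] = ColorState::Comment;
                i += 3;
                state = ColorState::Normal;
                continue;
            }
            out[i] = ColorState::Comment;
            break;
        }
        ++i;
    }
    return out;
}
```

Checking for '<!--' before the plain '<' is what implements the "(except comments)" part of rule 3.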

Strategy

In order to provide fast syntax highlighting, the control minimizes the coloring requests (calls to the SetSel/SetSelectionCharFormat pair) by imposing a "clipping region", which is the range of visible lines. The control also maintains a byte-per-line vector, holding a thin per-line state (colored/not colored, and the color state of the last char on the line) for quick coloring. When an already-colored line becomes visible, there is nothing to do. When a not-colored line becomes visible, it can easily be colored starting from the color state of its first char. A word about this "color state": we go over the HTML line by line and calculate the color state (comment/quoted/tag/normal) of the last char on each line (which is, of course, also the state in which the next line begins). This calculation is quite fast, as we do it in one pass over the HTML without any coloring.
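
The byte-per-line vector and the clipping idea can be sketched roughly as follows. This is an illustration only: the class name LineStateTable, its members, and the bit layout (bit 0 for the colored flag, bits 1-2 for the end state) are assumptions, and the actual coloring is left to a callback so that the sketch does not depend on MFC.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ColorState : std::uint8_t { Normal = 0, Tag = 1, Quoted = 2, Comment = 3 };

// One byte per line: bit 0 = "already colored", bits 1-2 = color state of the
// last char on the line. The bit layout is an assumption for illustration.
class LineStateTable {
public:
    // endStates[i] is the state after the last char of line i, computed
    // beforehand in a single pass that does no coloring.
    explicit LineStateTable(const std::vector<ColorState>& endStates)
    {
        bytes_.reserve(endStates.size());
        for (ColorState s : endStates)
            bytes_.push_back(static_cast<std::uint8_t>(static_cast<std::uint8_t>(s) << 1));
    }

    bool IsColored(std::size_t line) const { return (bytes_[line] & 1u) != 0; }

    ColorState EndState(std::size_t line) const
    {
        return static_cast<ColorState>((bytes_[line] >> 1) & 3u);
    }

    // A line begins in the state in which the previous line ended.
    ColorState StartState(std::size_t line) const
    {
        return line == 0 ? ColorState::Normal : EndState(line - 1);
    }

    // Colors only the not-yet-colored lines inside [first, last] (the clipping
    // region) and returns how many lines were actually colored.
    template <class ColorLineFn>
    std::size_t ColorVisible(std::size_t first, std::size_t last, ColorLineFn colorLine)
    {
        std::size_t colored = 0;
        for (std::size_t line = first; line <= last && line < bytes_.size(); ++line) {
            if (IsColored(line)) continue;        // nothing to do
            colorLine(line, StartState(line));    // e.g. SetSel + SetSelectionCharFormat
            bytes_[line] |= 1u;                   // remember that the line is colored
            ++colored;
        }
        return colored;
    }

private:
    std::vector<std::uint8_t> bytes_;
};
```

Calling ColorVisible twice for the same range does the coloring work only once, which is the whole point of keeping the "colored" bit.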

Updating colors as you type

It's more complicated when "online" parsing and coloring is involved. Usually, as long as the user types "regular chars" (chars that do not change any color state), no special action is taken, as the CRichEditCtrl itself maintains the current CHARFORMAT.

There are two exceptions to that:

  1. The user places the caret at the beginning of a line. We need to explicitly set the correct color, or otherwise CRichEditCtrl would use the color of the next char. For example, when the user types <a>, then moves the caret back to the start of the line and hits a key, this char would be colored as a tag instead of as normal text.
  2. The user types a key which completes/breaks a comment start/end combination. We need to invalidate up to three preceding chars. For example, when the user hits a key inside <!--, they might break a comment start combination.

When the user types "interesting chars" (chars that do change the color state, such as '<' or '"'), the colors of all chars starting from the current caret position are invalidated and the color states are recalculated (as an optimization, the calculation stops as soon as a line with an unmodified color state is reached, since the following lines stay valid). '!' and '-' are also included in this category, as they affect the comment start/end combinations, thus invalidating up to three preceding chars.

Note that invalidation is always done towards the next line while the preceding lines are never affected!

These are all done in the WM_CHAR handler.
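
The downward invalidation with its early stop can be sketched like this. Again, this is only an illustration under assumptions: LineCache, ScanLineFn and InvalidateFrom are hypothetical names, the line scanner is passed in as a function, and treating '>' as an "interesting char" is my guess for the sketch.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

enum class ColorState : unsigned char { Normal, Tag, Quoted, Comment };

// Chars that may change a color state. '!' and '-' count because they can
// complete or break "<!--" / "-->". Including '>' is an assumption.
inline bool IsInterestingChar(char ch)
{
    return ch == '<' || ch == '>' || ch == '"' || ch == '!' || ch == '-';
}

struct LineCache {
    ColorState endState;   // state after the last char of the line
    bool colored;          // false = must be recolored when it becomes visible
};

// Hypothetical scanner: returns the end state of a line given its start state.
using ScanLineFn = std::function<ColorState(const std::string&, ColorState)>;

// Re-scans from `editedLine` downwards. Every visited line is marked as not
// colored; the scan stops after the first line whose end state did not change,
// because all following lines are then still valid. Preceding lines are never
// touched. Returns the index one past the last invalidated line.
std::size_t InvalidateFrom(std::size_t editedLine,
                           const std::vector<std::string>& lines,
                           std::vector<LineCache>& cache,
                           const ScanLineFn& scanLine)
{
    ColorState state = editedLine == 0 ? ColorState::Normal
                                       : cache[editedLine - 1].endState;
    std::size_t line = editedLine;
    for (; line < lines.size(); ++line) {
        const ColorState newEnd = scanLine(lines[line], state);
        const bool unchanged = (newEnd == cache[line].endState);
        cache[line].endState = newEnd;
        cache[line].colored = false;
        state = newEnd;
        if (unchanged) { ++line; break; }   // the next lines stay valid
    }
    return line;
}
```

A WM_CHAR handler would call something like InvalidateFrom only when IsInterestingChar is true for the typed char, and then recolor whatever part of the invalidated range is currently visible.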

When the user deletes text (VK_DELETE, VK_BACK, or pasting text over a selection), we want to inspect the text before it is deleted. This way, we can look for "interesting chars" inside the deleted text and invalidate the lines accordingly. We also take care of cases where deleting text completes/breaks the comment start/end combinations.

Text deletion and pasting are handled by the WM_KEYDOWN and EN_CHANGE handlers, as we want to access the text before it is deleted.

Note that for EN_CHANGE to be used, we must set the event mask bit ENM_CHANGE.
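
Inspecting the text before a deletion could look roughly like the sketch below. The function name DeletionAffectsColors and the exact checks are assumptions of mine, not the control's code: it first looks for a state-changing char inside the text about to be deleted, and then compares a window of up to three chars on each side of the deletion, before and after, to see whether a comment start/end combination is broken or completed.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// True if deleting text[first, last) may change color states, so that the
// lines from the deletion point onwards should be invalidated.
// Must be called BEFORE the text is actually deleted.
bool DeletionAffectsColors(const std::string& text, std::size_t first, std::size_t last)
{
    last = std::min(last, text.size());
    if (first >= last) return false;

    // 1. The text to be deleted contains a state-changing char.
    if (text.find_first_of("<>\"", first) < last) return true;

    // 2. The deletion breaks or completes "<!--" / "-->" across its edges.
    const std::size_t winStart = first >= 3 ? first - 3 : 0;
    const std::size_t winEnd = std::min(text.size(), last + 3);
    const std::string before = text.substr(winStart, winEnd - winStart);
    const std::string after = text.substr(winStart, first - winStart)
                            + text.substr(last, winEnd - last);

    auto hasDelimiter = [](const std::string& s) {
        return s.find("<!--") != std::string::npos
            || s.find("-->") != std::string::npos;
    };
    return hasDelimiter(before) != hasDelimiter(after);
}
```

For example, deleting the '!' from "<!-- x" breaks a comment start, and deleting the 'x' from "<!x--" completes one; both cases return true, while deleting the '-' from "a-b" returns false.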

Scrolling

When the view area is changed (using the scroll bar, or by pressing the Home, End, Page Up, Page Down, Up arrow, or Down arrow keys), the visible line range changes. The control handles the EN_VSCROLL and WM_VSCROLL events to color the lines within the visible range that are not already colored (the EN_VSCROLL notification handles everything except for clicking the scroll bar itself with the mouse, which is handled by the WM_VSCROLL message). Note that for EN_VSCROLL to be used, we must set the event mask bit ENM_SCROLL.

Calculating the visible line range

The visible line range is the range between and including the first visible line and the last visible line. Easy, right? Well, not exactly. The CRichEditCtrl is kind enough to tell you its first visible line using EM_GETFIRSTVISIBLELINE, but won't tell you its last visible line, as there's no such thing as EM_GETLASTVISIBLELINE. I found a workaround on the net using three other messages: EM_GETRECT, EM_CHARFROMPOS, and EM_EXLINEFROMCHAR. The basic idea is to get the formatting rectangle of the CRichEditCtrl, then retrieve information about the character closest to a specified point in the client area of the control and determine its line:

int CFastHtmlRichEditCtrl::GetLastVisibleLine()
{
     // The EM_GETRECT message retrieves
     // the formatting rectangle of an edit control:
     RECT rfFormattingRect = {0};
     GetRect(&rfFormattingRect);
     rfFormattingRect.left++;
     rfFormattingRect.bottom -= 2;

     // The EM_CHARFROMPOS message retrieves information about the character
     // closest to a specified point in the client area of an edit control:
     int nCharIndex =
        CharFromPos(CPoint(rfFormattingRect.left,
        rfFormattingRect.bottom));

     // The EM_EXLINEFROMCHAR message determines which line contains
     // the specified character in a rich edit control:
     return LineFromChar(nCharIndex);
}

The demo program

The demo program is an MFC dialog box which, based on a radio button's selection, displays either CFastHtmlRichEditCtrl or CHtmlRichEditCtrlSSL. The application is used for comparing the performance of the two, by calling the ParseAllLines API and displaying the duration of the call. As a benchmark, I'm using a random XML file named ADO.XML, of size 164480 and 3838 lines.

Run the program, and click either "Refresh" to parse and color the control's window text, or the ellipsis to load, parse and color a file (you can use my ADO.XML if you like).

The results I'm getting on my machine for ADO.XML are:

  • CFastHtmlRichEditCtrl: 2 seconds
  • CHtmlRichEditCtrlSSL: 169 seconds

Quite a difference, huh?

Scrolling problems

Coloring a range of lines might change the visible line range, as selecting words on the first and last lines scrolls those lines to be fully visible. I'm overcoming this by calling LineScroll after the coloring.
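
A minimal sketch of this scroll-back, assuming it is enough to compare the first visible line before and after the coloring. The helper name ColorWithoutScrolling is made up; it is written as a template so it works with any type that exposes GetFirstVisibleLine() and LineScroll(int), the two CRichEditCtrl members it relies on.

```cpp
// Ctrl is any type with GetFirstVisibleLine() and LineScroll(int), such as
// CRichEditCtrl. colorLines is whatever performs the SetSel/format calls.
template <class Ctrl, class ColorFn>
void ColorWithoutScrolling(Ctrl& ctrl, ColorFn colorLines)
{
    const int firstBefore = ctrl.GetFirstVisibleLine();

    colorLines();   // selecting text may scroll the view as a side effect

    const int firstAfter = ctrl.GetFirstVisibleLine();
    if (firstAfter != firstBefore)
        ctrl.LineScroll(firstBefore - firstAfter);   // scroll back by the drift
}
```

A positive argument to LineScroll scrolls down and a negative one scrolls up, so passing the difference restores the original first visible line.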

When the vertical scrollbar is visible but not in focus and the user drags its thumb, the thumb jumps to the edge and back, and thus flickers. Setting the focus to the CRichEditCtrl just before handling the OnVScroll event seems to solve this problem. However, the horizontal scrollbar might move right and left while scrolling when the caret isn't positioned at the beginning of a line. This is probably because, when the caret isn't positioned at the beginning, its position differs from line to line. However, I can live with such a flaw.

Another problem relates to a feature I tried to implement: background coloring.

Background coloring

To be even faster, I added a feature named "background coloring". The background coloring is implemented using a timer which runs periodically for a short duration, and colors a few lines until all lines have been colored. However, the horizontal scrolling flaw described above while dragging the vertical thumb also happens here without any user activity, which is very annoying! Caching the horizontal thumb position, coloring the lines and restoring it does not help. However, when the caret is positioned at the beginning of a line and no selection has been made, the horizontal scroll doesn't flicker (note that this is true for the class "RichEdit20W"; the "RICHEDIT" class still flickers).

To conclude this issue, my OnTimer handler applies background coloring when:

  • the control has the focus (which prevents the vertical scrollbar from flickering), the caret is positioned at the beginning of a line, and no selection has been made, or
  • the control window is completely obscured (in which case there are no restrictions at all).
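
The two conditions and the "a few lines per tick" behavior can be sketched as follows. MayColorInBackground and BackgroundColorTick are hypothetical names, the per-line "colored" flags are kept in a plain vector, and the actual coloring is a callback; the real OnTimer handler may well differ.

```cpp
#include <cstddef>
#include <vector>

// The two conditions from the list above, as a predicate.
inline bool MayColorInBackground(bool hasFocus, bool caretAtLineStart,
                                 bool hasSelection, bool fullyObscured)
{
    return fullyObscured || (hasFocus && caretAtLineStart && !hasSelection);
}

// One timer tick: colors at most maxLines uncolored lines. Returns true while
// uncolored lines remain, i.e. while the timer should keep running.
template <class ColorLineFn>
bool BackgroundColorTick(std::vector<bool>& colored, int maxLines, ColorLineFn colorLine)
{
    int budget = maxLines;
    std::size_t line = 0;
    for (; line < colored.size() && budget > 0; ++line) {
        if (colored[line]) continue;
        colorLine(line);
        colored[line] = true;
        --budget;
    }
    // Is anything still uncolored after the point where we stopped?
    for (; line < colored.size(); ++line)
        if (!colored[line]) return true;
    return false;
}
```

Scanning from the first line on every tick keeps the sketch simple; it also picks up lines that were invalidated by typing since the previous tick.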

Note about Unicode

Rich Edit 2.0 has both ANSI and Unicode window classes: RichEdit20A and RichEdit20W, respectively. According to MSDN, you can specify the RICHEDIT_CLASS constant, which expands to the correct class depending on the definition of the UNICODE compile flag. However, in the demo, I coded the #ifdef _UNICODE myself.
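
A hand-coded selection of this kind might look like the sketch below. The constant name kRichEditClassName is made up for illustration, and the demo's actual code may differ. Note that the sketch tests _UNICODE, as the demo does, whereas RICHEDIT_CLASS is described above as depending on UNICODE; the two only agree when both flags are defined together.

```cpp
// Picks the Rich Edit 2.0 window class name by hand instead of relying on
// RICHEDIT_CLASS. kRichEditClassName is a hypothetical name.
#ifdef _UNICODE
static const wchar_t kRichEditClassName[] = L"RichEdit20W";   // Unicode class
#else
static const char kRichEditClassName[] = "RichEdit20A";       // ANSI class
#endif
```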

Class interface (based on CHtmlRichEditCtrlSSL)

// Construction/Destruction
public:
    // Default constructor
    CHtmlRichEditCtrlSSL();
    // Default destructor
    virtual ~CHtmlRichEditCtrlSSL();

public:
// Character format functions

    // Sets the character format to be used for Tags
    void SetTagCharFormat(int nFontHeight = 8, 
        COLORREF clrFontColour = RGB(128, 0, 0), 
        CString strFontFace = _T("Courier New"),
        bool bParse = true);
    // Sets the character format to be used for Tags
    void SetTagCharFormat(CHARFORMAT& cfTags, bool bParse = true);

    // Sets the character format to be used for Quoted text
    void SetQuoteCharFormat(int nFontHeight = 8, 
        COLORREF clrFontColour = RGB(0, 128, 128), 
        CString strFontFace = _T("Courier New"),
        bool bParse = true);
    // Sets the character format to be used for Quoted text
    void SetQuoteCharFormat(CHARFORMAT& cfQuoted, bool bParse = true);

    // Sets the character format to be used for Comments
    void SetCommentCharFormat(int nFontHeight = 8, 
        COLORREF clrFontColour = RGB(0, 128, 0), 
        CString strFontFace = _T("Courier New"),
        bool bParse = true);
    // Sets the character format to be used for Comments
    void SetCommentCharFormat(CHARFORMAT& cfComments, bool bParse = true);

    // Sets the character format to be used for Normal Text
    void SetTextCharFormat(int nFontHeight = 8, 
        COLORREF clrFontColour = RGB(0, 0, 0), 
        CString strFontFace = _T("Courier New"),
        bool bParse = true);
    // Sets the character format to be used for Normal Text
    void SetTextCharFormat(CHARFORMAT& cfText, bool bParse = true);

// Parsing functions

    // Parses all lines in the control,
    // colouring each line accordingly.
    void ParseAllLines();

// Miscellaneous functions

    // Loads the contents of the specified file into the control.
    // Replaces the existing contents and parses all lines.
    void LoadFile(CString& strPath);
    // Enables/disables the background coloring timer.
    // If enabled, an event is raised every uiInterval milliseconds
    // and nNumOfLines uncolored lines are colored.
    void SetBckgdColorTimer(UINT uiInterval = 1000,
                            int nNumOfLines = 10);

// Overrides

    // ClassWizard generated virtual function overrides
    //{{AFX_VIRTUAL(CHtmlRichEditCtrlSSL)
    protected:
    virtual void PreSubclassWindow();
    //}}AFX_VIRTUAL

Conclusion

During the development of CFastHtmlRichEditCtrl, several ideas came up on how to make the best and fastest HTML syntax highlighting control. If you have any other ideas, suggestions, or improvements, please let me know so I can update this article and implement them in the next version.

History

  • 22-3-2006:
    • Original article.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
