Introduction
This control is an extension of CRichEditCtrl, based on Derek Lakin's CHtmlRichEditCtrlSSL, written to provide fast syntax highlighting for HTML code. Apparently, CHtmlRichEditCtrlSSL is too slow to be used with large files. Since my files are large, I decided to pick up the gauntlet and develop a fast and scalable syntax highlighting control. The control can be used in two different ways:
- Loading and displaying a block of HTML code (from a file, from the clipboard, or by sending WM_SETTEXT), and
- Online parsing as you type.
Rules
The control follows a few simple rules to perform its syntax highlighting, as follows:
- Anything starting with '<!--' and ending with '-->' is a comment.
- Anything between two double quotes ("") is quoted text.
- Anything from '<' to '>' (except comments) is a tag.
- Anything else is normal text.
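The four rules can be read as a small state machine. The following is only a minimal sketch of that reading, not the control's actual code: the names Color and ClassifyChars are hypothetical, and it assumes that scanning starts in normal text, that a closing quote returns to whatever state preceded the opening quote, and that quotes inside comments are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names: the article does not show the control's internal types.
enum class Color : unsigned char { Normal, Tag, Quoted, Comment };

// Assign a color to every character of `text`, following the four rules.
// Assumptions: scanning starts in normal text, a closing quote returns to the
// state that preceded the opening quote, and quotes in comments are ignored.
std::vector<Color> ClassifyChars(const std::string& text)
{
    std::vector<Color> out(text.size(), Color::Normal);
    Color state = Color::Normal;
    Color beforeQuote = Color::Normal;
    std::size_t i = 0;
    while (i < text.size()) {
        std::size_t step = 1;   // chars consumed in this iteration
        Color paint = state;    // color given to those chars
        switch (state) {
        case Color::Comment:
            if (text.compare(i, 3, "-->") == 0) { step = 3; state = Color::Normal; }
            break;
        case Color::Quoted:
            if (text[i] == '"') state = beforeQuote;   // closing quote
            break;
        case Color::Tag:
            if (text[i] == '"') { beforeQuote = Color::Tag; state = paint = Color::Quoted; }
            else if (text[i] == '>') state = Color::Normal;
            break;
        case Color::Normal:
            if (text.compare(i, 4, "<!--") == 0) { step = 4; state = paint = Color::Comment; }
            else if (text[i] == '<') state = paint = Color::Tag;
            else if (text[i] == '"') { beforeQuote = Color::Normal; state = paint = Color::Quoted; }
            break;
        }
        for (std::size_t k = 0; k < step; ++k) out[i + k] = paint;
        i += step;
    }
    assert(out.size() == text.size());
    return out;
}
```

The delimiters themselves ('<', '>', the quotes, and the comment markers) take the color of the region they open or close.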
Strategy
In order to provide fast syntax highlighting, the control minimizes the coloring requests (calls to the SetSel/SetSelectionCharFormat pair) by imposing a "clipping region", which is the range of visible lines. The control also maintains a byte-per-line vector holding a thin per-line state (colored/not colored, and the color state of the last char on the line) for quick coloring. When an already-colored line becomes visible, there is nothing to do. When a not-colored line becomes visible, it can be colored easily using the color state of its first char. A word about this "color state": we go over the HTML line by line and calculate the color state (comment/quoted/tag/normal) of the last char on each line (which is, of course, the state of the first char on the next line). This calculation is quite fast, as we do it in one pass over the HTML without any coloring.
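One way to picture the byte-per-line vector is sketched below. The bit layout and the helper names (MakeLineInfo, EndState, IsColored, StartState) are assumptions made for illustration only; the article does not describe how the byte is actually packed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum ColorState : std::uint8_t { Normal = 0, Tag = 1, Quoted = 2, Comment = 3 };

// Assumed layout: bits 0-1 hold the color state of the last char on the line,
// bit 7 records whether the line has already been colored.
constexpr std::uint8_t kStateMask  = 0x03;
constexpr std::uint8_t kColoredBit = 0x80;

inline std::uint8_t MakeLineInfo(ColorState endState, bool colored)
{
    return static_cast<std::uint8_t>(endState | (colored ? kColoredBit : 0));
}

inline ColorState EndState(std::uint8_t info)
{
    return static_cast<ColorState>(info & kStateMask);
}

inline bool IsColored(std::uint8_t info)
{
    return (info & kColoredBit) != 0;
}

// The state of the first char on a line is the end state of the previous
// line; the first line of the document starts in normal text.
inline ColorState StartState(const std::vector<std::uint8_t>& lines, std::size_t line)
{
    assert(line < lines.size());
    return line == 0 ? Normal : EndState(lines[line - 1]);
}
```

With such a vector, coloring a newly visible line only needs StartState for that line plus the line's own text, no matter how far down the document it is.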
Updating colors as you type
It's more complicated when "online" parsing and coloring is involved. Usually, as long as the user types "regular chars" (chars that do not change any color state), no special action is taken, as the CRichEditCtrl itself maintains the current CHARFORMAT.
There are two exceptions to that:
- The user places the caret at the beginning of the line. We need to explicitly set the correct color; otherwise, CRichEditCtrl would use the color of the next char. For example, when the user types <a>, then moves the caret back to the start and hits a key, this char would be colored as a tag instead of as normal text.
- The user types a key which completes/breaks a comment start/end combination. We need to invalidate up to three preceding chars. For example, when the user hits a key inside <!--, they might break a comment start combination.
When the user types "interesting chars" (chars that do change the color state, such as '<' or '"'), the colors of all chars starting from the current caret position are invalidated and the color states are recalculated (as an optimization, the calculation stops once a line with an unmodified color state is reached, since the following lines stay valid). '!' and '-' are also included in this category, as they affect the comment start/end combinations and thus invalidate up to three preceding chars.
Note that invalidation is always done towards the next line while the preceding lines are never affected!
All of this is done in the WM_CHAR handler.
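The forward-only recalculation with its early stop might look roughly like the sketch below. It is an illustration under assumptions, not the control's code: State, ScanLine, and RecalcFrom are hypothetical names, and quoted text is split into two states (inside and outside a tag) so that the scanner knows where to return when a quote closes on a later line.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical state set; quotes remember whether they were opened in a tag.
enum class State : unsigned char { Normal, Tag, QuoteInText, QuoteInTag, Comment };

// Return the color state after the last char of `line`, given the state `s`
// in effect at its first char. No coloring is done here.
State ScanLine(const std::string& line, State s)
{
    for (std::size_t i = 0; i < line.size(); ) {
        std::size_t step = 1;
        switch (s) {
        case State::Comment:
            if (line.compare(i, 3, "-->") == 0) { s = State::Normal; step = 3; }
            break;
        case State::QuoteInText:
            if (line[i] == '"') s = State::Normal;
            break;
        case State::QuoteInTag:
            if (line[i] == '"') s = State::Tag;
            break;
        case State::Tag:
            if (line[i] == '"') s = State::QuoteInTag;
            else if (line[i] == '>') s = State::Normal;
            break;
        case State::Normal:
            if (line.compare(i, 4, "<!--") == 0) { s = State::Comment; step = 4; }
            else if (line[i] == '<') s = State::Tag;
            else if (line[i] == '"') s = State::QuoteInText;
            break;
        }
        i += step;
    }
    return s;
}

// Recompute end-of-line states starting at line `from` and moving forward.
// Stops at the first line whose end state is unchanged, because every line
// after it is still valid. Returns that line's index, or lines.size() if the
// change reached the end of the document. Preceding lines are never touched.
std::size_t RecalcFrom(const std::vector<std::string>& lines,
                       std::vector<State>& endStates, std::size_t from)
{
    assert(lines.size() == endStates.size());
    State s = (from == 0) ? State::Normal : endStates[from - 1];
    std::size_t line = from;
    for (; line < lines.size(); ++line) {
        const State e = ScanLine(lines[line], s);
        if (e == endStates[line]) break;   // following lines stay valid
        endStates[line] = e;
        s = e;
    }
    return line;
}
```

Lines from `from` through the returned index are the ones whose colors would need to be invalidated; the line at the returned index keeps its end state but may start in a different state than before.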
When the user deletes text (VK_DELETE, VK_BACK, or pasting text over a selection), we want to inspect the text before it is deleted. This way, we can look for "interesting chars" inside the deleted text and invalidate the lines accordingly. We also take care of cases where deleting text completes/breaks a comment start/end combination.
Text deletion and pasting are handled by the WM_KEYDOWN and EN_CHANGE handlers, as we want to access the text before it is deleted.
Note that for EN_CHANGE to be used, we must set the event mask bit ENM_CHANGE.
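The check made on text that is about to be deleted can be sketched as a pure function over the text and the deleted range. The function names are hypothetical, and the set of "interesting chars" is an assumption: the article names '<', '"', '!' and '-', and '>' is added here because it closes tags and comments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Assumed set of chars that can change the color state of following text.
bool IsInterestingChar(char c)
{
    return c == '<' || c == '>' || c == '"' || c == '!' || c == '-';
}

// True if removing text[start, end) joins the chars around the gap into a
// comment start or end combination that was not there before. Only up to
// three chars on each side of the gap can take part in such a combination.
bool CompletesCommentDelimiter(const std::string& text, std::size_t start, std::size_t end)
{
    assert(start <= end && end <= text.size());
    const std::size_t leftLen = std::min<std::size_t>(3, start);
    const std::string joined = text.substr(start - leftLen, leftLen) + text.substr(end, 3);
    const std::string delims[] = {"<!--", "-->"};
    for (const std::string& d : delims) {
        for (std::size_t p = joined.find(d); p != std::string::npos; p = joined.find(d, p + 1)) {
            // Count only matches that straddle the seam left by the deletion.
            if (p < leftLen && p + d.size() > leftLen) return true;
        }
    }
    return false;
}

// Decide, before the deletion happens, whether following colors are affected.
bool DeletionNeedsInvalidation(const std::string& text, std::size_t start, std::size_t end)
{
    assert(start <= end && end <= text.size());
    if (start == end) return false;   // nothing is being deleted
    for (std::size_t i = start; i < end; ++i) {
        if (IsInterestingChar(text[i])) return true;
    }
    return CompletesCommentDelimiter(text, start, end);
}
```

For example, deleting the 'x' in "<!x-- c" removes no interesting char, yet it completes a comment start, so the second check is what catches it.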
Scrolling
When the view area changes (using the scroll bar, or by pressing the Home, End, Page Up, Page Down, Up arrow, or Down arrow keys), the visible line range changes. The control handles the EN_VSCROLL and WM_VSCROLL events to color the lines within the visible range that are not already colored (the EN_VSCROLL notification handles everything except for the case of clicking the scroll bar itself with the mouse, which is handled by the WM_VSCROLL message). Note that for EN_VSCROLL to be used, we must set the event mask bit ENM_SCROLL.
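Whichever event reports the scroll, the work that follows is the same: walk the new visible range and color only the lines that are still marked as not colored. A minimal sketch of that step follows; ColorVisibleLines and the colorLine callback are hypothetical names, with the callback standing in for the real SetSel/SetSelectionCharFormat work.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Color every not-yet-colored line in [firstVisible, lastVisible] and mark it
// as colored. Returns how many lines were actually colored.
std::size_t ColorVisibleLines(std::vector<bool>& colored,
                              std::size_t firstVisible,
                              std::size_t lastVisible,
                              const std::function<void(std::size_t)>& colorLine)
{
    assert(firstVisible <= lastVisible);
    if (colored.empty()) return 0;
    lastVisible = std::min(lastVisible, colored.size() - 1);
    std::size_t count = 0;
    for (std::size_t line = firstVisible; line <= lastVisible; ++line) {
        if (colored[line]) continue;   // already colored: nothing to do
        colorLine(line);
        colored[line] = true;
        ++count;
    }
    return count;
}
```

Scrolling back over a region that has already been colored therefore costs only a pass over the flags.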
Calculating the visible line range
The visible line range is the range between and including the first visible line and the last visible line. Easy, right? Well, not exactly. The CRichEditCtrl is kind enough to tell you its first visible line using EM_GETFIRSTVISIBLELINE, but won't tell you its last visible line, as there's no such thing as EM_GETLASTVISIBLELINE. I found a workaround on the net using three other messages: EM_GETRECT, EM_CHARFROMPOS, and EM_EXLINEFROMCHAR. The basic idea is to get the formatting rectangle of the CRichEditCtrl, then find the character closest to a point near the bottom-left corner of that rectangle and determine its line:
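int CFastHtmlRichEditCtrl::GetLastVisibleLine()
{
    // Get the formatting rectangle (EM_GETRECT).
    RECT rfFormattingRect = {0};
    GetRect(&rfFormattingRect);
    // Step slightly inside the rectangle's bottom-left corner.
    rfFormattingRect.left++;
    rfFormattingRect.bottom -= 2;
    // Find the character closest to that point (EM_CHARFROMPOS)...
    int nCharIndex =
        CharFromPos(CPoint(rfFormattingRect.left,
                           rfFormattingRect.bottom));
    // ...and the line that contains it (EM_EXLINEFROMCHAR).
    return LineFromChar(nCharIndex);
}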
The demo program
The demo program is an MFC dialog box which, based on a radio button selection, displays either CFastHtmlRichEditCtrl or CHtmlRichEditCtrlSSL. The application is used to compare the performance of the two by calling the ParseAllLines API and displaying the duration of the call. As a benchmark, I'm using a random XML file named ADO.XML, of size 164,480 bytes and 3,838 lines.
Run the program, and click either "Refresh" to parse and color the control's window text, or the ellipsis button to load, parse, and color a file (you can use my ADO.XML if you like).
The results I'm getting on my machine for ADO.XML are:
- CFastHtmlRichEditCtrl: 2 seconds
- CHtmlRichEditCtrlSSL: 169 seconds
Quite a difference, huh?
Scrolling problems
Coloring a range of lines might change the visible line range, as selecting words on the first and last lines scrolls those lines to be fully visible. I overcome this by calling LineScroll after the coloring.
When the vertical scrollbar is visible but not in focus, and the user drags its thumb, the thumb jumps to the edge and back, and thus flickers. Setting the focus to the CRichEditCtrl just before handling the OnVScroll event seems to solve this problem. However, the horizontal scrollbar might move right and left while scrolling when the caret isn't positioned at the beginning of a line. This is probably because, when the caret isn't positioned at the beginning, its position differs from line to line. However, I can live with such a flaw.
Another problem relates to a feature I tried to implement: background coloring.
Background coloring
To be even faster, I added a feature named "background coloring". It is implemented using a timer which fires periodically, runs for a short duration, and colors a few lines each time until all lines have been colored. However, the horizontal scrolling flaw described above (seen while dragging the vertical thumb) also happens here without any user activity, which is very annoying! Caching the horizontal thumb position, coloring the lines, and restoring it doesn't help. However, when the caret is positioned at the beginning of a line and no selection has been made, the horizontal scroll doesn't flicker (note that this is true for the "RichEdit20W" class; the "RICHEDIT" class still flickers).
To conclude this issue, my OnTimer handler applies background coloring when:
- the control has the focus (which prevents vertical scrollbar flicker), the caret is positioned at the beginning of a line, and no selection has been made, or
- the control window is completely obscured (in which case there are no restrictions at all).
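The two conditions, together with the "few lines per tick" batching, can be sketched without any window code. ControlStatus, MayColorInBackground, and ColorBatch are hypothetical names; in the real control, the four flags would be queried from the window on each timer tick.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Snapshot of what the timer handler would need to know (hypothetical type).
struct ControlStatus {
    bool hasFocus;
    bool caretAtLineStart;
    bool hasSelection;
    bool fullyObscured;
};

// Background coloring is allowed when the window is completely obscured, or
// when it has the focus with the caret at a line start and no selection.
bool MayColorInBackground(const ControlStatus& s)
{
    if (s.fullyObscured) return true;
    return s.hasFocus && s.caretAtLineStart && !s.hasSelection;
}

// Color at most `linesPerTick` not-yet-colored lines. Returns true once every
// line is colored, which is when the timer could be stopped.
bool ColorBatch(std::vector<bool>& colored, std::size_t linesPerTick)
{
    assert(linesPerTick > 0);
    std::size_t done = 0;
    for (std::size_t i = 0; i < colored.size(); ++i) {
        if (colored[i]) continue;
        if (done == linesPerTick) return false;   // more work remains
        colored[i] = true;   // the real control would color line i here
        ++done;
    }
    return true;
}
```

A timer handler built on these would call ColorBatch only when MayColorInBackground returns true, and simply skip the tick otherwise.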
Note about Unicode
Rich Edit 2.0 has both ANSI and Unicode window classes: RichEdit20A and RichEdit20W, respectively. According to MSDN, you can specify the RICHEDIT_CLASS constant, which expands to the correct class depending on the definition of the UNICODE compile flag. However, in the demo, I coded the #ifdef _UNICODE myself.
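A hand-coded selection of the class name could look something like the sketch below. The constant name kRichEditClassName is hypothetical, and the demo's actual code may differ; only the two class names come from the text above.

```cpp
#include <cassert>
#include <cstddef>

// Pick the Rich Edit 2.0 window class name by hand, depending on whether the
// build is a Unicode one. kRichEditClassName is a hypothetical name.
#ifdef _UNICODE
static const wchar_t kRichEditClassName[] = L"RichEdit20W";
#else
static const char kRichEditClassName[] = "RichEdit20A";
#endif

// Number of characters in the class name, excluding the terminator.
constexpr std::size_t kRichEditClassNameLen =
    sizeof(kRichEditClassName) / sizeof(kRichEditClassName[0]) - 1;
```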
Class interface (based on CHtmlRichEditCtrlSSL)
public:
    CHtmlRichEditCtrlSSL();
    virtual ~CHtmlRichEditCtrlSSL();
public:
    void SetTagCharFormat(int nFontHeight = 8,
        COLORREF clrFontColour = RGB(128, 0, 0),
        CString strFontFace = _T("Courier New"),
        bool bParse = true);
    void SetTagCharFormat(CHARFORMAT& cfTags, bool bParse = true);
    void SetQuoteCharFormat(int nFontHeight = 8,
        COLORREF clrFontColour = RGB(0, 128, 128),
        CString strFontFace = _T("Courier New"),
        bool bParse = true);
    void SetQuoteCharFormat(CHARFORMAT& cfQuoted, bool bParse = true);
    void SetCommentCharFormat(int nFontHeight = 8,
        COLORREF clrFontColour = RGB(0, 128, 0),
        CString strFontFace = _T("Courier New"),
        bool bParse = true);
    void SetCommentCharFormat(CHARFORMAT& cfComments, bool bParse = true);
    void SetTextCharFormat(int nFontHeight = 8,
        COLORREF clrFontColour = RGB(0, 0, 0),
        CString strFontFace = _T("Courier New"),
        bool bParse = true);
    void SetTextCharFormat(CHARFORMAT& cfText, bool bParse = true);
    void ParseAllLines();
    void LoadFile(CString& strPath);
    void SetBckgdColorTimer(UINT uiInterval = 1000,
        int nNumOfLines = 10);
protected:
    virtual void PreSubclassWindow();
Conclusion
During the development of CFastHtmlRichEditCtrl, several ideas came up on how to make the best and fastest HTML syntax highlighting control. If you have any other ideas, suggestions, or improvements, please let me know so I can update this article and implement them in the next version.
History