Introduction
This article presents a WinForms text edit control that supports a flexible system of highlighting and word-coloring decorations. My goal is to present this control, along with the optimizations that were needed to make it run at a reasonable speed, simply enough that a C# novice can understand it.
Background
This is the third CodeBox that I have put up. The first two were built for WPF. I would like to think that CodeBox and its radically redesigned descendant, CodeBox 2, would still be useful, even though this WinForms implementation is very different. All three share the same basic decoration concept.
Using the Code
The code should not be hard to use, as the control inherits from RichTextBox. Without any Decorations added, it is almost indistinguishable from the RichTextBox. Decorations fall into two major categories corresponding to the DecorationScheme and Decoration classes. A DecorationScheme is basically just a conveniently grouped collection of Decoration items. For example:
codeBox.DecorationScheme = WinFormsCodeBox.Decorations.DecorationSchemes.CSharp3;
will give a code coloration similar to what you would see in Visual Studio for C#, while:
codeBox.DecorationScheme = WinFormsCodeBox.Decorations.DecorationSchemes.Xml;
will give you coloration similar to the appearance of XML in Visual Studio. The DecorationScheme is intended to set the basic look of the text. After that, we can set additional decorations.
LineDecoration ld = new LineDecoration()
{
DecorationType = EDecorationType.Hilight
,Color = Color.Yellow
,Line =2
};
codeBox.Decorations.Add(ld);
The above code will highlight line two of the CodeBox in yellow. Please note that adding a decoration will not automatically update the display. Updates occur when either the Text changes or
codeBox.ApplyDecorations();
is called. At present, there are a number of premade decorations:
- StringDecoration: Decoration based on the index positions of a single string
- MultiStringDecoration: Decoration based on the index positions of a list of strings
- RegexDecoration: Decoration based on a single regular expression string
- MultiRegexDecoration: Decoration based on a list of regular expression strings
- ExplicitDecoration: Decoration explicitly specified as a starting position and a length; simple, but useful when working with the selection
- MultiExplicitDecoration: Decoration explicitly specified as a list of starting positions and lengths
- MultiRegexWordDecoration: Decoration based on a list of strings sandwiched between word boundaries
- DoubleQuotedDecoration: Decoration of text between double quotes
- LineDecoration: Decoration of a specified line of text
- MultiLineDecoration: Decoration of a list of specified lines of text
- DoubleRegexDecoration: Decoration based on a pair of regular expression strings, where the second expression is matched against the results of the first
- RegexMatchDecoration: Decoration based on both the match and the group of a regular expression
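To make the first entry concrete, here is a minimal sketch of how a StringDecoration-style decoration might compute its ranges, assuming it simply collects every ordinal, non-overlapping occurrence of its string. StringRangeSketch and its (Start, Length) tuples are hypothetical stand-ins, not the WinFormsCodeBox source.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not the WinFormsCodeBox source: one way a
// StringDecoration-style class could turn a search string into ranges.
public static class StringRangeSketch
{
    // Returns a (Start, Length) pair for every occurrence of 'target' in 'text'.
    public static List<(int Start, int Length)> Ranges(string text, string target)
    {
        var ranges = new List<(int Start, int Length)>();
        if (string.IsNullOrEmpty(target))
        {
            return ranges; // an empty target would otherwise match at every position
        }
        int index = text.IndexOf(target, StringComparison.Ordinal);
        while (index >= 0)
        {
            ranges.Add((index, target.Length));
            // Resume after this match, so occurrences never overlap.
            index = text.IndexOf(target, index + target.Length, StringComparison.Ordinal);
        }
        return ranges;
    }
}
```

The Multi... variants in the list would presumably repeat the same search for each item and concatenate the results.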
Let's look at a few examples:
ExplicitDecoration ed = new ExplicitDecoration()
{
Start = this.CodeBox.SelectionStart,
Length = this.CodeBox.SelectionLength,
DecorationType = EDecorationType.TextColor ,
Color = Color.Green
};
this.CodeBox.Decorations.Add(ed);
Assuming that we had a WinFormsCodeBox named CodeBox, this would make the text color of the selection green.
RegexDecoration singleLineComment = new RegexDecoration()
{
DecorationType = EDecorationType.TextColor,
Color = Color.Green,
RegexString = "//.*"
};
This decoration will color C#-style single-line comments green.
private static List<string> CSharpVariableReservations()
{
return new List<string>() { "string", "int", "double",
"long", "void" , "true",
"false", "null"};
}
MultiRegexWordDecoration BlueClasses = new MultiRegexWordDecoration()
{
Color = Color.Blue,
Words = CSharpVariableReservations(),
IsCaseSensitive = true
};
Together, these will make the words defined in CSharpVariableReservations blue. Note that string would be blue, but happystring would not.
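The word-boundary behavior can be sketched with System.Text.RegularExpressions. The sketch below assumes that a MultiRegexWordDecoration-style class joins its escaped words into a single alternation wrapped in \b anchors; WordRegexSketch is a hypothetical name, not the control's actual implementation.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch, not the WinFormsCodeBox source: builds one regex
// that matches any of the given words only when it stands alone as a word.
public static class WordRegexSketch
{
    public static Regex Build(IEnumerable<string> words, bool isCaseSensitive)
    {
        // Escape each word so characters such as '.' or '+' match literally,
        // then require a word boundary (\b) on both sides of the alternation.
        string pattern = @"\b(" + string.Join("|", words.Select(w => Regex.Escape(w))) + @")\b";
        RegexOptions options = isCaseSensitive ? RegexOptions.None : RegexOptions.IgnoreCase;
        return new Regex(pattern, options);
    }

    // Converts every match into a (Start, Length) pair.
    public static List<(int Start, int Length)> Ranges(string text, Regex regex) =>
        regex.Matches(text).Cast<Match>()
             .Select(m => (Start: m.Index, Length: m.Length))
             .ToList();
}
```

Because \b requires a transition between a word character and a non-word character, the y in front of string inside happystring prevents a match there.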
RegexMatchDecoration xmlAttributeValue = new RegexMatchDecoration()
{
Color = Color.Blue,
RegexString = @"\s(\w+|\w+:\w+|(\w|\.)+)\s*=\s*""(?<selected>.*?)"""
};
This will color the values of XML attributes (the text captured by the selected group, between the quotes) blue.
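The named group is what distinguishes this decoration from a plain RegexDecoration. Assuming that RegexMatchDecoration colors only the span of the group named selected rather than the whole match, the ranges could be gathered as in this hypothetical sketch; GroupRangeSketch and XmlAttributeValuePattern are illustrative names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch: collects the span of a named group for every match.
// That RegexMatchDecoration works this way is an assumption, not confirmed.
public static class GroupRangeSketch
{
    // Attribute-value pattern from the xmlAttributeValue example.
    public const string XmlAttributeValuePattern =
        @"\s(\w+|\w+:\w+|(\w|\.)+)\s*=\s*""(?<selected>.*?)""";

    public static List<(int Start, int Length)> Ranges(string text, string pattern, string groupName = "selected")
    {
        var ranges = new List<(int Start, int Length)>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            Group g = m.Groups[groupName];
            if (g.Success)
            {
                // Only the group is decorated, not the attribute name or the quotes.
                ranges.Add((g.Index, g.Length));
            }
        }
        return ranges;
    }
}
```

For the tag `<node id="a1" x:type="Foo">`, this yields the spans of a1 and Foo only.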
There are premade decoration schemes for C#, SQL Server, XAML, DBML, and XML. Admittedly, they could probably use a bit of refinement, but they all work pretty well. This is pretty much all one needs to know in order to put WinFormsCodeBox to use.
How It Works
The Basic Idea
WinFormsCodeBox inherits from RichTextBox. The decorations we want are computed and then applied by moving the selection around and setting the SelectionColor and SelectionBackColor properties.
Decorations are defined in terms of the TextIndex class. (Please note that in the previous CodeBox articles, TextIndex was called a Pair.)
namespace TextUtils
{
    public class TextIndex : IComparable<TextIndex>
    {
        public int Start { get; set; }
        public int Length { get; set; }
        // ... other stuff
    }
}
These are grouped together in TextIndexLists:
public class TextIndexList : List<TextIndex>
{
    // ... lots of methods
}
These TextIndexLists are created by the various decorations. The decoration classes are all descended from the abstract class Decoration.
public abstract class Decoration
{
    public EDecorationType DecorationType { get; set; }
    public Color Color { get; set; }
    public abstract TextIndexList Ranges(string text);
    // ... other stuff
}
These decorations are then applied to the WinFormsCodeBox through the ApplyDecoration method.
private void ApplyDecoration(Decoration d, TextIndexList tl)
{
switch (d.DecorationType)
{
case EDecorationType.TextColor:
foreach (TextIndex t in tl)
{
this.Select(t.Start, t.Length);
this.SelectionColor = d.Color;
}
break;
case EDecorationType.Hilight :
foreach (TextIndex t in tl)
{
this.Select(t.Start, t.Length);
this.SelectionBackColor = d.Color;
}
break;
}
}
Problems and First Optimization
There are just two little problems with the code as implemented so far: it is too slow to be usable, and the scrollbar jitters up and down as one types. When this happens, we need to decide what to do. Surrender is a reasonable option, but before giving up, one should see whether there is at least a little hope. One of the problems is that the RichTextBox calls OnTextChanged (and so raises TextChanged) not just when the characters change, but for every formatting change as well. This is easy to fix.
protected override void OnTextChanged(EventArgs e)
{
base.OnTextChanged(e);
if (!mDecorationInProgress)
{
ApplyDecorations();
}
}
The mDecorationInProgress flag is set to true at the beginning of the ApplyDecorations method and back to false at the end. This has a significant impact on speed, but not nearly enough to make the control usable. The problem with the display is that whenever the selection changes, the textbox scrolls to make the selection visible. The jumps occur when it has to scroll down to make something visible and then scroll back up to where we started; it can end up off by a few lines. If the RichTextBox had a vertical scroll position property, this would be easy to deal with, but it does not. Fortunately, I had been plagued by this in the past, so I looked up what I did back then. Platform invoke (P/Invoke) saves the day.
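// Requires: using System; using System.Drawing; using System.Runtime.InteropServices;
[DllImport("user32.dll")]
private static extern IntPtr SendMessage(IntPtr hWnd, int wMsg,
    IntPtr wParam, ref Point lParam);

// Reads or restores the scroll position of the RichTextBox.
// EM_GETSCROLLPOS and EM_SETSCROLLPOS are WM_USER (0x0400) + 221 and + 222.
private Point ScrollPosition
{
    get
    {
        const int EM_GETSCROLLPOS = 0x0400 + 221;
        Point pt = new Point();
        SendMessage(this.Handle, EM_GETSCROLLPOS, IntPtr.Zero, ref pt);
        return pt;
    }
    set
    {
        const int EM_SETSCROLLPOS = 0x0400 + 222;
        SendMessage(this.Handle, EM_SETSCROLLPOS, IntPtr.Zero, ref value);
    }
}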
These just happen to be exactly the right messages; I'm sure I have some unnamed guru on the internet to thank for that one. With a property like this, we just call:
Point origScroll = ScrollPosition;
before working with the selection, and:
ScrollPosition = origScroll;
after we are finished. This completely takes care of the jumping problem. By a stroke of good fortune, there was another piece of useful code in my old project, the import for locking out screen updates:
// IntPtr rather than int keeps the window handle intact in 64-bit processes.
[DllImport("user32.dll", SetLastError = true, ExactSpelling = true)]
private static extern bool LockWindowUpdate(IntPtr hWndLock);
Before working with the selections, we call:
LockWindowUpdate(this.Handle);
and afterwards, we call:
LockWindowUpdate(IntPtr.Zero);
Saving old code is good. By this point, the control can be used for relatively small amounts of text, such as the definition of a typical stored procedure. The question is whether it can be optimized further. Timing data showed that applying the decorations takes between 100 and 1,000 times as long as determining which decorations need to be applied. Given that, I formulated three possible strategies for further optimization:
- Poke around in the class with Reflector and look for helpful internal methods to set the properties
- Go low level and start working directly with RTF
- Try to apply the decorations more efficiently
My first instinct was to use Reflector; in WPF, this usually works wonders. Here is what the set accessor of the SelectionColor property looks like:
set
{
this.ForceHandleCreate();
NativeMethods.CHARFORMATA charFormat = this.GetCharFormat(true);
charFormat.dwMask = 0x40000000;
charFormat.dwEffects = 0;
charFormat.crTextColor = ColorTranslator.ToWin32(value);
UnsafeNativeMethods.SendMessage(new HandleRef(this, base.Handle),
0x444, 1, charFormat);
}
That was enough to convince me that Reflector was not going to give me anything easily. I have worked with RTF in the past, and it is not something I would do as anything other than a last resort. That left me with the last possibility. The formatting stored in the RichTextBox is persistent, so we do not have to reapply everything on each update; we can update only the areas that have changed. Doing this requires a closer look at TextIndex and TextIndexList.
Lists of TextIndexes and Second Optimization
To modify only the changed parts of the text, we need to be able to compute the difference between two TextIndexLists. There are three different ways to look at a TextIndexList:
- A TextIndexList is a List.
- A TextIndexList can be thought of as a set of line segments on a line.
- A TextIndexList can be thought of as a BitArray.
For example, consider the following TextIndexList:
TextIndexList tl = new TextIndexList();
tl.Add(new TextIndex() { Start = 1, Length = 2 });
tl.Add(new TextIndex() { Start = 4, Length = 2 });
which can be created more concisely by:
TextIndexList tl = TextIndexList.Parse("1,2:4,2");
This contains the same information as a pair of line segments, one covering positions 1 and 2 and the other covering positions 4 and 5, which is the same as the bit array [false, true, true, false, true, true]. You might be a bit skeptical and wonder whether I just happened to pick a good example. What would it mean if the TextIndexes overlapped? That is the important point. The decorations are designed so that applying one twice is the same as applying it once. If we have a yellow background and it overlaps with another yellow background, the result is the same as a single, bigger yellow background. Order also has no meaning, so the following two TextIndexList objects are effectively equivalent:
TextIndexList tl1 = TextIndexList.Parse("1,2:4,2");
TextIndexList tl2 = TextIndexList.Parse("4,2:1,2");
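The claim that overlap and order do not matter can be checked mechanically. The hypothetical IndexListSketch below mimics TextIndexList.Parse, assuming the "start,length:start,length" format shown above, and compares the resulting bit arrays; it uses plain (Start, Length) tuples rather than the real TextIndex class.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for TextIndex/TextIndexList, written only to show
// that overlap and order disappear once a list is viewed as a BitArray.
public static class IndexListSketch
{
    // Parses "start,length:start,length:..." into (Start, Length) pairs.
    public static List<(int Start, int Length)> Parse(string s) =>
        s.Split(':')
         .Select(part => part.Split(','))
         .Select(p => (Start: int.Parse(p[0]), Length: int.Parse(p[1])))
         .ToList();

    public static BitArray ToBitArray(List<(int Start, int Length)> list, int size)
    {
        var bits = new BitArray(size);
        foreach (var (start, length) in list)
        {
            for (int i = start; i < Math.Min(size, start + length); i++)
            {
                bits[i] = true; // setting a bit twice is the same as setting it once
            }
        }
        return bits;
    }

    // True when both lists cover exactly the same positions below 'size'.
    public static bool SameCoverage(string a, string b, int size)
    {
        BitArray x = ToBitArray(Parse(a), size);
        BitArray y = ToBitArray(Parse(b), size);
        for (int i = 0; i < size; i++)
        {
            if (x[i] != y[i]) return false;
        }
        return true;
    }
}
```

With this sketch, "1,2:4,2", "4,2:1,2", and the overlapping "1,2:2,1:4,2" all produce the same coverage.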
Geometric Interpretation
The easiest way to understand how to determine the minimum set of TextIndexes that need to be changed is to look at the situation geometrically, in terms of line segments. So, let's consider the situation where there is only one decoration. There are two TextIndexLists representing where in the text the decoration would be applied: one before the change and one after it.
Two things stand out: the range where the two TextIndexLists differ is clearly defined, and their differences can themselves be thought of as a TextIndexList. The bounds of this new TextIndexList can be thought of as a single TextIndex. When updating the display, we only need to concern ourselves with the text formatting inside these "where different" bounds. Usually, we have more than one decoration; the decoration scheme for SQL Server contains 11.
When we have more decorations, they can be visualized as stacked on top of each other. The difference bounds of the individual decorations together form a TextIndexList. From that, I can get an overall difference range, which is the area that needs to be updated. Finally, we get the actual decorations to be applied by projecting the original decorations onto this combined range.
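The stacked-decoration procedure can be sketched with BitArrays. In this hypothetical UpdateRangeSketch, each decoration is given as its old and new coverage, both arrays of the same length (an assumption of the sketch). DifferenceBounds XORs each pair and keeps the overall minimum and maximum, and Project extracts the runs of a decoration's new coverage that fall inside that range.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch of the "where different" computation for several
// decorations; it is not the WinFormsCodeBox source.
public static class UpdateRangeSketch
{
    // Smallest half-open range [Start, End) containing every bit that differs
    // in any decoration, or null when nothing changed.
    public static (int Start, int End)? DifferenceBounds(IEnumerable<(BitArray Old, BitArray New)> decorations)
    {
        int start = int.MaxValue;
        int end = -1;
        foreach (var (oldBits, newBits) in decorations)
        {
            // Xor mutates its target, so work on a copy of the old bits.
            BitArray diff = new BitArray(oldBits).Xor(newBits);
            for (int i = 0; i < diff.Length; i++)
            {
                if (diff[i])
                {
                    start = Math.Min(start, i);
                    end = Math.Max(end, i + 1);
                }
            }
        }
        if (end < 0)
        {
            return null;
        }
        return (start, end);
    }

    // Projects a coverage onto [start, end): returns its runs of set bits
    // inside that range as (Start, Length) pairs.
    public static List<(int Start, int Length)> Project(BitArray bits, int start, int end)
    {
        var runs = new List<(int Start, int Length)>();
        int runStart = -1;
        int last = Math.Min(end, bits.Length);
        for (int i = start; i < last; i++)
        {
            if (bits[i])
            {
                if (runStart < 0) runStart = i; // a run begins
            }
            else if (runStart >= 0)
            {
                runs.Add((runStart, i - runStart)); // a run ends
                runStart = -1;
            }
        }
        if (runStart >= 0)
        {
            runs.Add((runStart, last - runStart)); // run reaches the range end
        }
        return runs;
    }
}
```

Only the runs returned by Project would then need to be re-formatted; everything outside the bounds is left alone.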
BitArray Interpretation
To calculate the TextIndexLists, we can turn to the BitArray interpretation of the TextIndexList. The routine to produce a BitArray from a TextIndexList is straightforward.
public BitArray ToBitArray(int size)
{
BitArray bits = new BitArray(size);
foreach (TextIndex t in this)
{
int maxVal = Math.Min(size, t.Start + t.Length);
for (int i = t.Start; i < maxVal; i++)
{
bits[i] = true;
}
}
return bits;
}
The size parameter is just there for a bit of extra flexibility. As long as it is greater than or equal to the upper bound of the TextIndexList, the conversion will be complete. The conversion back is a little harder to follow.
public static TextIndexList FromBitArray(BitArray bits)
{
    return FromBitArray(bits, new TextIndex(0, bits.Length));
}

// Converts the set bits inside 'index' into a sorted list of
// non-overlapping TextIndexes, one per run of consecutive true bits.
public static TextIndexList FromBitArray(BitArray bits, TextIndex index)
{
    TextIndexList tl = new TextIndexList();
    int currentStart = -1; // start of the current run, or -1 when not in a run
    int lastBit = Math.Min(index.Start + index.Length, bits.Length);
    for (int i = index.Start; i < lastBit; i++)
    {
        if (bits[i])
        {
            if (currentStart == -1)
            {
                currentStart = i; // a run begins
            }
        }
        else if (currentStart != -1)
        {
            tl.Add(TextIndex.FromStartEnd(currentStart, i)); // a run ends
            currentStart = -1;
        }
    }
    if (currentStart != -1)
    {
        // Close a run that reaches the end of the scanned range.
        tl.Add(TextIndex.FromStartEnd(currentStart, lastBit));
    }
    return tl;
}
Code like this is why we have unit tests. The interesting thing to note is this code:
tl = TextIndexList.FromBitArray(tl.ToBitArray(tl.Bounds.End));
It both merges overlapping TextIndexes and sorts the TextIndexList tl.
To find the differences between two TextIndexLists, we can take the symmetric difference (XOR) of their BitArrays.
public TextIndexList SymetricDifference(TextIndexList tl)
{
int arraySize = Math.Max(this.Bounds.End, tl.Bounds.End);
BitArray bArray = this.ToBitArray(arraySize);
BitArray btlArray = tl.ToBitArray(arraySize);
BitArray bResult = bArray.Xor(btlArray);
return TextIndexList.FromBitArray(bResult);
}
If this is not obvious, picture the two lists as line segments for a moment and it should be. I'm sure this could be done without BitArrays, but the XOR makes it come out much cleaner. Furthermore, we can create the projections with the FromBitArray method: its two-argument overload alters the starting and ending points of the loop, restricting the resulting TextIndexList to a specified TextIndex.
Third Optimization - Shifting
The second optimization seemed like it should significantly improve things, but the improvement turned out to be rather modest. The culprit was the nature of changes to the text. The most common way a textbox's text is changed is through typing. Each time a key is pressed under normal circumstances (no delete, backspace, or prior selection), the position of every character in the document after the insertion point increases by one. This means that the area the updates are restricted to runs from around the current character to the end of the last decoration in the document. That helps a lot when editing at the very end of the document, but not at all at the beginning. Fortunately, this is easy to fix. The TextDelta class takes two strings in its constructor and finds the first difference and the offset associated with the text change. Please note that I am taking advantage of the fact that it is not possible to make a noncontiguous single edit with the keyboard and mouse. If we apply the very simple shift function to the previous TextIndexes:
public void Shift(int startingIndex, int amount)
{
foreach (TextIndex t in this)
{
if (t.Contains(startingIndex ) )
{
t.Length +=amount;
}
else if (t.Start > startingIndex)
{
t.Start += amount;
}
}
}
the hoped-for performance boost arrives, and we have a control that is optimized enough for practical use. (Please note that this is the updated version of the "very simple shift function." The original had an error in it.)
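The TextDelta class itself is not listed. A minimal sketch of the same idea, assuming a single contiguous edit, finds the first index at which the old and new strings differ and takes the change in length as the shift amount. FindDelta, Span, and this Shift variant are hypothetical; the variant resizes a span only when the edit falls strictly inside it and moves any span starting at or after the edit, which differs slightly from the listing above. Edits that straddle a span boundary are not handled specially.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not the WinFormsCodeBox source: a TextDelta-like
// computation plus a Shift over mutable spans.
public static class ShiftSketch
{
    public sealed class Span
    {
        public int Start { get; set; }
        public int Length { get; set; }
        public int End => Start + Length;
    }

    // Assumes one contiguous insertion or deletion: returns the first index at
    // which the texts differ and the change in length (positive = insertion).
    public static (int Start, int Amount) FindDelta(string oldText, string newText)
    {
        int limit = Math.Min(oldText.Length, newText.Length);
        int start = 0;
        while (start < limit && oldText[start] == newText[start])
        {
            start++;
        }
        return (start, newText.Length - oldText.Length);
    }

    // Resizes spans that strictly contain the edit point; moves spans that
    // start at or after it.
    public static void Shift(List<Span> spans, int startingIndex, int amount)
    {
        foreach (Span t in spans)
        {
            if (t.Start < startingIndex && startingIndex < t.End)
            {
                t.Length += amount; // edit happened inside the span
            }
            else if (t.Start >= startingIndex)
            {
                t.Start += amount;  // span starts at or after the edit point
            }
        }
    }
}
```

With repeated characters the reported position is somewhat arbitrary: typing l into hello at index 2, 3, or 4 all give helllo, and FindDelta reports 4. Any of those positions yields a valid shift, since the characters around the edit are identical.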
Conclusion
My intention in this article was both to present a useful control and to make it easy to understand. I'm pretty sure that the former was successful, but I have some doubt about the latter. Optimized code is often mysterious. We find routines that exist only to increase speed in various special situations. Without knowing the history, we often end up wondering whether a strange construction exists because the previous programmer didn't know what he was doing or was in love with cut and paste. I did not want to present this code as if it were the obvious way of handling the situation. Perhaps it is, but it certainly was not obvious to me. Hopefully, this short saga of transforming a textbox that needed about two minutes per keystroke (for a large file) into one that needs around 0.03 seconds (4,000 times faster) can be of some service. I would also like to give special thanks to Arthur Jordison, whose encouragement made this project a high enough priority to get done.
Updates
11/1/2000 - Bug Fix
The Shift method in TextIndexList did not account for the possibility that the start of the shift might fall within a TextIndex. It does now.
11/11/2000 - Bug Fix
Problems with pasting, undo, and changes made within variable-length decorations have been fixed. This exposed a serious deficiency in the undo functionality, which is in the process of being corrected. I will update this explanation in detail once that is complete.