An extended RichTextBox to save and load "HTML lite" files

5 Dec 2005
This control provides a method to save and load HTML files directly, avoiding the use of RTF codes.

Introduction

When I was working on a chat application, I found that the .NET RichTextBox control can only save and load its content as RTF or as plain text.

I also wanted a way to insert images and ActiveX controls into the RichTextBox control. For that, please see my article: Inserting images into a RichTextBox control (the OLE way).

So I decided to implement a way to save and load "HTML lite" text in the RichTextBox control. I call it "HTML lite" because I don't handle all the HTML tags, only a small subset of them, with some constraints. The control can be extended to suit your needs by adding handlers for other features and HTML tags.

Background

I use Win32 APIs to read and write the character and paragraph formatting structures. I believe every call to a RichTextBox formatting method results in a SendMessage call to the underlying control, so this should be more efficient: with the PARAFORMAT and CHARFORMAT structures, a single call returns all the formatting information for the current selection. Many Internet sites and blogs use this approach.

APIs

Please see the source code for more details.

[StructLayout( LayoutKind.Sequential )]
public struct PARAFORMAT
{
    public int cbSize;
    public uint dwMask;
    ...
}

[StructLayout( LayoutKind.Sequential )]
public struct CHARFORMAT
{
    public int      cbSize;
    public UInt32   dwMask;
    public UInt32   dwEffects;
    ...
}
//Constants

...

Adding HTML

To insert the HTML content into the control, I use the AddHTML method. In this function, I look for the starting HTML tag mark '<' and process it according to one of these:

b    = bold
i    = italic
u    = underline
s    = strikeout
sup  = superscript
sub  = subscript
p    = paragraph (attributes: align="alignment")
font = font (attributes: face="facename" 
                      color="#rrggbb" size="NN")
li   = list item

Here is the source code for the method. Note how the formatting is applied through the APIs and how unhandled tags are ignored. The HTML font size should be a number between 1 and 7, so the code converts it to an approximate character height:

// looking for start tags
int nStart = strHTML.IndexOf('<');
if (nStart >= 0)
{
  if (nStart > 0)
  {
    // tag is not the first character, so
    // we need to add text to control and continue
    // looking for tags at the beginning of the text
    strData = strHTML.Substring(0, nStart);
    strHTML = strHTML.Substring(nStart);
  }
  else
  {
    // ok, get tag value

    int nEnd = strHTML.IndexOf('>', nStart);
    if (nEnd > nStart)
    {
      if ((nEnd - nStart) > 0)
      {
        string strTag = strHTML.Substring(nStart, 
                                   nEnd - nStart + 1);
        strTag = strTag.ToLower();

        if (strTag == "<b>")
        {
          cf.dwMask |= CFM_WEIGHT | CFM_BOLD;
          cf.dwEffects |= CFE_BOLD;
          cf.wWeight = FW_BOLD;
        }
        else if (strTag == "<i>")
        {
          cf.dwMask |= CFM_ITALIC;
          cf.dwEffects |= CFE_ITALIC;
        }
        else if (strTag == "<u>")
        {
          cf.dwMask |= CFM_UNDERLINE | CFM_UNDERLINETYPE;
          cf.dwEffects |= CFE_UNDERLINE;
          cf.bUnderlineType = CFU_UNDERLINE;
        }
        else if (strTag == "<s>")
        {
          cf.dwMask |= CFM_STRIKEOUT;
          cf.dwEffects |= CFE_STRIKEOUT;
        }
        else if (strTag == "<sup>")
        {
          cf.dwMask |= CFM_SUPERSCRIPT;
          cf.dwEffects |= CFE_SUPERSCRIPT;
        }
        else if (strTag == "<sub>")
        {
          cf.dwMask |= CFM_SUBSCRIPT;
          cf.dwEffects |= CFE_SUBSCRIPT;
        }
        else if ((strTag.Length > 2) && 
                (strTag.Substring(0, 2) == "<p"))
        {
          if (strTag.IndexOf("align=\"left\"") > 0)
          {
            pf.dwMask |= PFM_ALIGNMENT;
            pf.wAlignment = (short)PFA_LEFT;
          }
          else if (strTag.IndexOf("align=\"right\"") > 0)
          {
            pf.dwMask |= PFM_ALIGNMENT;
            pf.wAlignment = (short)PFA_RIGHT;
          }
          else if (strTag.IndexOf("align=\"center\"") > 0)
          {
            pf.dwMask |= PFM_ALIGNMENT;
            pf.wAlignment = (short)PFA_CENTER;
          }
        }
        else if ((strTag.Length > 5) && 
                (strTag.Substring(0, 5) == "<font"))
        {
          string strFont = new string(cf.szFaceName);
          strFont = strFont.Trim(chtrim);
          int crFont = cf.crTextColor;
          int yHeight = cf.yHeight;

          int nFace = strTag.IndexOf("face=");
          if (nFace > 0)
          {
            int nFaceEnd = strTag.IndexOf("\"", nFace + 6);
            if (nFaceEnd > nFace)
                strFont = 
                    strTag.Substring(nFace + 6, nFaceEnd - nFace - 6);
          }

          int nSize = strTag.IndexOf("size=");
          if (nSize > 0)
          {
            int nSizeEnd = strTag.IndexOf("\"", nSize + 6);
            if (nSizeEnd > nSize)
            {
              yHeight = int.Parse(strTag.Substring(nSize + 6, 
                                             nSizeEnd - nSize - 6));
              yHeight *= (20 * 5);
            }
          }

          int nColor = strTag.IndexOf("color=");
          if (nColor > 0)
          {
            int nColorEnd = strTag.IndexOf("\"", nColor + 7);
            if (nColorEnd > nColor)
            {
              if (strTag.Substring(nColor + 7, 1) == "#")
              {
                string strCr = strTag.Substring(nColor + 8, 
                                      nColorEnd - nColor - 8);
                int nCr = Convert.ToInt32(strCr, 16);

                Color color = Color.FromArgb(nCr);

                crFont = GetCOLORREF(color);
              }
              else
              {
                crFont = int.Parse(strTag.Substring(nColor + 7, 
                                         nColorEnd - nColor - 7));
              }
            }
          }

          cf.szFaceName = new char[LF_FACESIZE];
          strFont.CopyTo(0, cf.szFaceName, 0, 
                     Math.Min(LF_FACESIZE - 1, strFont.Length));
          //cf.szFaceName = strFont.ToCharArray(0, 
          //               Math.Min(strFont.Length, LF_FACESIZE));
          cf.crTextColor = crFont;
          cf.yHeight = yHeight;

          cf.dwMask |= CFM_COLOR | CFM_SIZE | CFM_FACE;
          cf.dwEffects &= ~CFE_AUTOCOLOR;
        }
        else if (strTag == "<li>")
        {
          if (pf.wNumbering != PFN_BULLET)
          {
            pf.dwMask |= PFM_NUMBERING;
            pf.wNumbering = (short)PFN_BULLET;
          }
        }
        else if (strTag == "</b>")
        {
          cf.dwEffects &= ~CFE_BOLD;
          cf.wWeight = FW_NORMAL;
        }
        else if (strTag == "</i>")
        {
          cf.dwEffects &= ~CFE_ITALIC;
        }
        else if (strTag == "</u>")
        {
          cf.dwEffects &= ~CFE_UNDERLINE;
        }
        else if (strTag == "</s>")
        {
          cf.dwEffects &= ~CFE_STRIKEOUT;
        }
        else if (strTag == "</sup>")
        {
          cf.dwEffects &= ~CFE_SUPERSCRIPT;
        }
        else if (strTag == "</sub>")
        {
          cf.dwEffects &= ~CFE_SUBSCRIPT;
        }
        else if (strTag == "</font>")
        {
        }
        else if (strTag == "</p>")
        {
        }
        else if (strTag == "")
        {
        }

        //-------------------------------
        // now, remove tag from HTML
        int nStart2 = strHTML.IndexOf("<", nEnd + 1);
        if (nStart2 > 0)
        {
          // extract partial data
          strData = strHTML.Substring(nEnd + 1, nStart2 - nEnd - 1);
          strHTML = strHTML.Substring(nStart2);
        }
        else
        {
          // get remain text and finish
          if ((nEnd + 1) < strHTML.Length)
            strData = strHTML.Substring(nEnd + 1);
          else
            strData = "";

          strHTML = "";
        }
        //-------------------------------

        //-------------------------------
        // have we any consecutive tag ?
        if (strData.Length > 0)
        {
          // yes, ok, goto to reinit
          if (strData[0] == '<')
            goto reinit;
        }
        //-------------------------------
      }
      else
      {
        // we have not found any valid tag
        strHTML = "";
      }
    }
    else
    {
      // we have not found any valid tag
      strHTML = "";
    }
  }
}
else
{
  // we have not found any tag
  strHTML = "";
}
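
The font size handling in the listing above multiplies the HTML size by 20 * 5 to get a character height in twips, and passes the parsed value straight through. As a minimal sketch, assuming you want to clamp the value to the 1 to 7 range in both directions, the conversion could be isolated like this. The class and method names here (HtmlFontSize, ToTwips, FromTwips) are hypothetical and are not part of the control:

```csharp
using System;

// Hypothetical helper, not part of the original control.
static class HtmlFontSize
{
    // Same factor the article uses: 20 twips per point, 5 points per HTML step.
    const int TwipsPerHtmlStep = 20 * 5;

    // HTML size (expected 1..7) -> CHARFORMAT.yHeight in twips.
    public static int ToTwips(int htmlSize)
    {
        int clamped = Math.Max(1, Math.Min(7, htmlSize));
        return clamped * TwipsPerHtmlStep;
    }

    // yHeight in twips -> nearest HTML size, clamped to 1..7.
    public static int FromTwips(int yHeight)
    {
        int nearest = (yHeight + TwipsPerHtmlStep / 2) / TwipsPerHtmlStep;
        return Math.Max(1, Math.Min(7, nearest));
    }
}
```

With this sketch, a malformed tag such as size="12" would produce the same height as size="7" instead of an oversized font. Using int.TryParse instead of int.Parse when reading the attribute would also avoid an exception on non-numeric values.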

To apply formatting through PARAFORMAT and CHARFORMAT, I use properties (a good tip taken from the Internet). Please see the source code for more details:

public PARAFORMAT ParaFormat
{
  get
  {
    PARAFORMAT pf = new PARAFORMAT();
    pf.cbSize = Marshal.SizeOf( pf );
      
    // Get the alignment.

    SendMessage( new HandleRef( this, Handle ),
      EM_GETPARAFORMAT,
      SCF_SELECTION, ref pf );
      
    return pf;
  }
  
  set
  {
    PARAFORMAT pf = value;
    pf.cbSize = Marshal.SizeOf( pf );
      
    // Set the alignment.

    SendMessage( new HandleRef( this, Handle ),
      EM_SETPARAFORMAT,
      SCF_SELECTION, ref pf );
  }
}

public PARAFORMAT DefaultParaFormat
{  
   ...
}

public CHARFORMAT CharFormat
{
   ...
}

public CHARFORMAT DefaultCharFormat
{
   ...
}

And here is how I write the text formatting information to the control using its new properties. The variable strData contains the plain text before applying the format:

if (strData.Length > 0)
{
  //-------------------------------
  // replace entities ("&amp;" goes last, so that
  // "&amp;lt;" decodes to "&lt;" and not to "<")
  strData = strData.Replace("&lt;", "<");
  strData = strData.Replace("&gt;", ">");
  strData = strData.Replace("&apos;", "'");
  strData = strData.Replace("&quot;", "\"");
  strData = strData.Replace("&amp;", "&");
  //-------------------------------

  string strAux = strData; // use another copy

  while (strAux.Length > 0)
  {
    //-----------------------
    int nLen = strAux.Length;
    //-----------------------

    //-------------------------------
    // now, add text to control
    int nStartCache = this.SelectionStart;
    string strt = strAux.Substring(0, nLen);

    this.SelectedText = strt;
    strAux = strAux.Remove(0, nLen);

    this.SelectionStart = nStartCache;
    this.SelectionLength = strt.Length;
    //-------------------------------

    //-------------------------------
    // apply format
    this.ParaFormat = pf;
    this.CharFormat = cf;
    //-------------------------------


    // reposition to final
    this.SelectionStart = this.TextLength+1;
    this.SelectionLength = 0;
  }

  // reposition to final
  this.SelectionStart = this.TextLength+1;
  this.SelectionLength = 0;

  //-------------------------------
  // new paragraph requires to reset alignment
  if ((strData.IndexOf("\r\n", 0) >= 0) || 
                   (strData.IndexOf("\n", 0) >= 0))
  {
    pf.dwMask = PFM_ALIGNMENT|PFM_NUMBERING;
    pf.wAlignment = (short)PFA_LEFT;
    pf.wNumbering = 0;
  }
  //-------------------------------
}
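
One detail of the entity handling is worth isolating: the order of the replacements matters. The following is a minimal sketch, not the control's own code, of a matching encode/decode pair for the five entities used in this article. The class name HtmlLiteEntities is hypothetical. The ampersand is encoded first and decoded last, so that text which literally contains "&lt;" survives a round trip:

```csharp
// Hypothetical helper, not part of the original control.
static class HtmlLiteEntities
{
    // Plain text -> HTML lite text. "&" must be encoded first,
    // otherwise the "&" of the other entities would be encoded again.
    public static string Encode(string text)
    {
        return text
            .Replace("&", "&amp;")
            .Replace("<", "&lt;")
            .Replace(">", "&gt;")
            .Replace("'", "&apos;")
            .Replace("\"", "&quot;");
    }

    // HTML lite text -> plain text. "&amp;" must be decoded last,
    // so that "&amp;lt;" becomes "&lt;" and not "<".
    public static string Decode(string text)
    {
        return text
            .Replace("&lt;", "<")
            .Replace("&gt;", ">")
            .Replace("&apos;", "'")
            .Replace("&quot;", "\"")
            .Replace("&amp;", "&");
    }
}
```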

Getting HTML content from the control

To get the HTML content from the control, I go through the text character by character. (If someone knows an alternative method, please let me know.)

I analyze the formatting of each character in the control and extract the information about its style. Whenever the character format or the paragraph format changes, I add an HTML tag to the raw text.

This is done by using an internal structure, cMyREFormat, that stores the related information, such as the position and the tag that should be placed there:

private enum uMyREType
{
  U_MYRE_TYPE_TAG,
  U_MYRE_TYPE_EMO,
  U_MYRE_TYPE_ENTITY,
}

private struct cMyREFormat
{
  public uMyREType nType;
  public int nLen;
  public int nPos;
  public string strValue;
}

Step 1

Find the entities ( &, <, >, ", ' ) and store their positions:

char[] ch = {'&', '<', '>', '"', '\''};
string[] strreplace = {"&amp;", "&lt;", "&gt;", 
                              "&quot;", "&apos;"};

for (i = 0; i < ch.Length; i++)
{
    char[] ch2 = {ch[i]};

    int n = this.Find(ch2, 0);
    while (n != -1)
    {
        mfr = new cMyREFormat();

        mfr.nPos = n;
        mfr.nLen = 1;
        mfr.nType = uMyREType.U_MYRE_TYPE_ENTITY;
        mfr.strValue = strreplace[i];

        colFormat.Add(mfr);

        n = this.Find(ch2, n+1);
    }
}

Step 2

Look for font changes:

//-------------------------
// get format for this character
cf = this.CharFormat;
pf = this.ParaFormat;

string strfname = new string(cf.szFaceName);
strfname = strfname.Trim(chtrim);
//-------------------------

//-------------------------
// new font format ?
if ((strFont != strfname) || (crFont != cf.crTextColor) || 
                                    (yHeight != cf.yHeight))
{
    if (strFont != "")
    {
        // close previous <font> tag


        mfr = new cMyREFormat();

        mfr.nPos = i;
        mfr.nLen = 0;
        mfr.nType = uMyREType.U_MYRE_TYPE_TAG;
        mfr.strValue = "</font>";

        colFormat.Add(mfr);
    }

    //-------------------------

    // save this for cache

    strFont = strfname;
    crFont = cf.crTextColor;
    yHeight = cf.yHeight;
    //-------------------------


    //-------------------------

    // font size should be translated to
    // HTML size (approximately)
    int fsize = yHeight / (20 * 5);
    //-------------------------


    //-------------------------

    // color object from COLORREF

    color = GetColor(crFont);
    //-------------------------


    //-------------------------

    // add <font> tag

    mfr = new cMyREFormat();

    string strcolor = string.Concat("#", 
                 (color.ToArgb() & 0x00FFFFFF).ToString("X6"));

    mfr.nPos = i;
    mfr.nLen = 0;
    mfr.nType = uMyREType.U_MYRE_TYPE_TAG;
    mfr.strValue = "<font face=\"" + strFont + "\" color=\"" + 
                        strcolor + "\" size=\"" + fsize + "\">";

    colFormat.Add(mfr);
    //-------------------------
}
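
The GetColor and GetCOLORREF helpers used by the control are not shown in this article. As a rough sketch of what such a conversion involves, assuming the standard Win32 COLORREF layout (0x00bbggrr, with red in the low byte) and the "#rrggbb" form used in the font tag, the two directions could look like this. The class and method names are hypothetical:

```csharp
using System;
using System.Globalization;

// Hypothetical helper; the control's own GetColor/GetCOLORREF may differ.
static class ColorRefSketch
{
    // "#rrggbb" (HTML) -> COLORREF (0x00bbggrr).
    public static int HtmlToColorRef(string html)
    {
        int rgb = int.Parse(html.TrimStart('#'),
                            NumberStyles.HexNumber,
                            CultureInfo.InvariantCulture);
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return r | (g << 8) | (b << 16);
    }

    // COLORREF (0x00bbggrr) -> "#RRGGBB" (HTML).
    public static string ColorRefToHtml(int colorRef)
    {
        int r = colorRef & 0xFF;
        int g = (colorRef >> 8) & 0xFF;
        int b = (colorRef >> 16) & 0xFF;
        return "#" + ((r << 16) | (g << 8) | b).ToString("X6");
    }
}
```

The byte swap is the part that is easy to get wrong: HTML puts red in the most significant byte, while COLORREF puts it in the least significant one.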

Step 3

Look for paragraph format changes and close the previous tags if we are in a new paragraph. This is done by using states:

  • none: no format applied,
  • new: apply new format style (<b>,<i>,<p>... etc.),
  • continue: format is same as that of the previous (no changes),
  • reset: close and start again (</b>,</i>,</p>... etc.).

//-------------------------

// are we in another line ?

if ((strChar == "\r") || (strChar == "\n"))
{
    // yes?

    // then, we need to reset paragraph format

    // and character format

    if (bParaFormat)
    {
        bnumbering = ctformatStates.nctNone;
        baleft = ctformatStates.nctNone;
        baright = ctformatStates.nctNone;
        bacenter = ctformatStates.nctNone;
    }

    // close previous tags


    // is italic? => close it

    if (bitalic != ctformatStates.nctNone)
    {
        mfr = new cMyREFormat();

        mfr.nPos = i;
        mfr.nLen = 0;
        mfr.nType = uMyREType.U_MYRE_TYPE_TAG;
        mfr.strValue = "</i>";

        colFormat.Add(mfr);

        bitalic = ctformatStates.nctNone;
    }

    // is bold? => close it

    if (bold != ctformatStates.nctNone)
    {
        ...
    }

    ...
}

// now, process the paragraph format,

// managing states: none, new, 

// continue {with previous}, reset

if (bParaFormat)
{
    // align to center?

    if (pf.wAlignment == PFA_CENTER)
    {
        if (bacenter == ctformatStates.nctNone)
            bacenter = ctformatStates.nctNew;
        else
            bacenter = ctformatStates.nctContinue;
    }
    else
    {
        if (bacenter != ctformatStates.nctNone)
            bacenter = ctformatStates.nctReset;
    }

    if (bacenter == ctformatStates.nctNew)
    {
        mfr = new cMyREFormat();

        mfr.nPos = i;
        mfr.nLen = 0;
        mfr.nType = uMyREType.U_MYRE_TYPE_TAG;
        mfr.strValue = "<p align=\"center\">";

        colFormat.Add(mfr);
    }
    else if (bacenter == ctformatStates.nctReset)
        bacenter = ctformatStates.nctNone;
    //---------------------


    //---------------------

    // align to left ?

    if (pf.wAlignment == PFA_LEFT)
    {
        ...
    }
    //---------------------


    //---------------------

    // align to right ?

    if (pf.wAlignment == PFA_RIGHT)
    {
        ...
    }
    //---------------------


    //---------------------

    // bullet ?

    if (pf.wNumbering == PFN_BULLET)
    {
        ...
    }
    //---------------------

}

Step 4

Look for changes in style: bold, italic, underline, strikeout (with the same method, using states):

//---------------------

// bold ?

if ((cf.dwEffects & CFE_BOLD) == CFE_BOLD)
{
    if (bold == ctformatStates.nctNone)
        bold = ctformatStates.nctNew;
    else
        bold = ctformatStates.nctContinue;
}
else
{
    if (bold != ctformatStates.nctNone)
        bold = ctformatStates.nctReset;
}

if (bold == ctformatStates.nctNew)
{
    mfr = new cMyREFormat();

    mfr.nPos = i;
    mfr.nLen = 0;
    mfr.nType = uMyREType.U_MYRE_TYPE_TAG;
    mfr.strValue = "<b>";

    colFormat.Add(mfr);
}
else if (bold == ctformatStates.nctReset)
{
    mfr = new cMyREFormat();

    mfr.nPos = i;
    mfr.nLen = 0;
    mfr.nType = uMyREType.U_MYRE_TYPE_TAG;
    mfr.strValue = "</b>";

    colFormat.Add(mfr);

    bold = ctformatStates.nctNone;
}
//---------------------


//---------------------

// Italic

if ((cf.dwEffects & CFE_ITALIC) == CFE_ITALIC)
{
    ...
}
//---------------------


...
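
The same four-state pattern repeats for every style, so it may help to see it on its own. The following is a simplified sketch of the transition logic, not the control's actual code: the enum FormatState and the method Step are hypothetical names standing in for ctformatStates and the repeated if/else blocks. It takes the current state and whether the style is active on the current character, and returns the tag to emit (or null):

```csharp
// Hypothetical names; the control uses ctformatStates with nct* members.
enum FormatState { None, New, Continue, Reset }

static class StyleTracker
{
    // Advances the state for one character and returns the tag
    // to insert at this position, or null if nothing changes.
    public static string Step(ref FormatState state, bool active, string tagName)
    {
        if (active)
            state = (state == FormatState.None) ? FormatState.New
                                                : FormatState.Continue;
        else if (state != FormatState.None)
            state = FormatState.Reset;

        if (state == FormatState.New)
            return "<" + tagName + ">";

        if (state == FormatState.Reset)
        {
            state = FormatState.None;   // closed, ready to open again
            return "</" + tagName + ">";
        }

        return null;                    // None or Continue: no tag
    }
}
```

Calling Step once per character with, for example, (cf.dwEffects & CFE_BOLD) == CFE_BOLD as the active flag would yield an opening tag on the first bold character and a closing tag on the first character after the bold run.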

Step 5

Sort the formatting array and apply styles by adding the characters and tags one by one until the HTML text is completed:

//--------------------------

// now, reorder the formatting array

k = colFormat.Count;
for (i = 0; i < k - 1; i++)
{
    for (int j = i + 1; j < k; j++)
    {
        mfr = (cMyREFormat)colFormat[i];
        cMyREFormat mfr2 = (cMyREFormat)colFormat[j];

        if (mfr2.nPos < mfr.nPos)
        {
            colFormat.RemoveAt(j);
            colFormat.Insert(i, mfr2);
            j--;
        }
        else if ((mfr2.nPos == mfr.nPos) && 
                              (mfr2.nLen < mfr.nLen))
        {
            colFormat.RemoveAt(j);
            colFormat.Insert(i, mfr2);
            j--;
        }
    }
}
//--------------------------



//--------------------------

// apply format by replacing and inserting HTML tags

// stored in the Format Array

int nAcum = 0;
for (i = 0; i < k; i++)
{
    mfr = (cMyREFormat)colFormat[i];

    strHTML += 
       strT.Substring(nAcum, mfr.nPos - nAcum) + mfr.strValue;
    nAcum = mfr.nPos + mfr.nLen;
}

if (nAcum < strT.Length)
    strHTML += strT.Substring(nAcum);
//--------------------------
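
On newer framework versions, the sort and merge steps above can be written more compactly. This is only a sketch under stated assumptions: it requires LINQ (.NET 3.5 or later, which did not exist when the control was written), and the FormatMark struct and HtmlAssembler class are hypothetical stand-ins for cMyREFormat and the loop above. It keeps the same ordering rule, position first and then length, so zero-length tags are placed before an entity replacement at the same position:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical stand-in for cMyREFormat.
struct FormatMark
{
    public int Pos;       // position in the plain text
    public int Len;       // characters replaced (0 for tags, 1 for entities)
    public string Value;  // tag or entity to write
}

static class HtmlAssembler
{
    public static string Build(string text, IEnumerable<FormatMark> marks)
    {
        var sb = new StringBuilder();
        int consumed = 0;

        // OrderBy/ThenBy is a stable sort: marks with equal Pos and Len
        // keep the order in which they were added.
        foreach (var m in marks.OrderBy(x => x.Pos).ThenBy(x => x.Len))
        {
            sb.Append(text, consumed, m.Pos - consumed); // text before the mark
            sb.Append(m.Value);                          // tag or entity
            consumed = m.Pos + m.Len;                    // skip replaced chars
        }

        sb.Append(text, consumed, text.Length - consumed); // remaining text
        return sb.ToString();
    }
}
```

Using a StringBuilder instead of repeated string concatenation avoids re-copying the growing HTML string for every mark, which may matter for long documents.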

Points of interest

To avoid constant screen updates when character and paragraph formats are applied, I use the Faster Updating approach given in the article Extending RichTextBox by Pete Vidler.

This is done by sending two messages to the control: EM_SETEVENTMASK to prevent the control from raising any events and WM_SETREDRAW to prevent the control from redrawing itself:

public void BeginUpdate()
{
    // Deal with nested calls.

    ++updating;
    
    if ( updating > 1 )
        return;
    
    // Prevent the control from raising any events.

    oldEventMask = SendMessage( new HandleRef( this, Handle ),
        EM_SETEVENTMASK, 0, 0 );
    
    // Prevent the control from redrawing itself.

    SendMessage( new HandleRef( this, Handle ),
        WM_SETREDRAW, 0, 0 );
}

/// <summary>
/// Resumes drawing and event handling.
/// </summary>
/// <remarks>
/// This method should be called once for every call made
/// to BeginUpdate. It resets the event mask to its
/// original value and enables redrawing of the control.
/// </remarks>
public void EndUpdate()
{
    // Deal with nested calls.

    --updating;
    
    if ( updating > 0 )
        return;
    
    // Allow the control to redraw itself.

    SendMessage( new HandleRef( this, Handle ),
        WM_SETREDRAW, 1, 0 );

    // Allow the control to raise event messages.

    SendMessage( new HandleRef( this, Handle ),
        EM_SETEVENTMASK, 0, oldEventMask );
}

/// <summary>
/// Returns true when the control is performing some
/// internal updates, especially when it is reading or
/// writing HTML text
/// </summary>
public bool InternalUpdating
{
    get 
    {
        return (updating != 0);
    }
}
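
Because every BeginUpdate must be matched by an EndUpdate, even when an exception is thrown in between, it can be convenient to wrap the pair in a disposable scope. The following is a minimal sketch of that idea and not part of the control: UpdateScope is a hypothetical class that takes the two methods as delegates, so it does not depend on the control type:

```csharp
using System;

// Hypothetical helper, not part of the original control.
sealed class UpdateScope : IDisposable
{
    private readonly Action end;
    private bool disposed;

    // Calls begin immediately and remembers end for Dispose.
    public UpdateScope(Action begin, Action end)
    {
        this.end = end;
        begin();
    }

    // Calls end exactly once, even if Dispose is called repeatedly.
    public void Dispose()
    {
        if (disposed)
            return;

        disposed = true;
        end();
    }
}
```

Assuming a control instance named richTextBox1 that exposes the methods above, usage might look like using (new UpdateScope(richTextBox1.BeginUpdate, richTextBox1.EndUpdate)) { ... }, with the HTML conversion calls placed inside the block.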

Using the code

To use the code, simply add a reference to the HmlRichTextBox and call the methods AddHTML and GetHTML.

I use a toolbar with formatting buttons. To update the button states, I handle the SelectionChanged event. Keep in mind that you should check the InternalUpdating property in this handler; skipping the toolbar update while the control is converting from or to HTML text improves performance:

private void richTextBox1_SelectionChanged(object sender, 
                                         System.EventArgs e)
{
    if (!richTextBox1.InternalUpdating)
        UpdateToolbar(); //Update the toolbar buttons

}

/// <summary>
///     Update the toolbar button statuses
/// </summary>
public void UpdateToolbar()
{
    // This is done in case two different fonts are selected
    // at the same time. If that is the case there is no
    // selection font, so I use the default font instead.
    Font fnt;
    
    if (richTextBox1.SelectionFont != null)
        fnt = richTextBox1.SelectionFont;
    else
        fnt = richTextBox1.Font;

    //Do all the toolbar button checks

    tbbBold.Pushed      = fnt.Bold; //bold button

    tbbItalic.Pushed    = fnt.Italic; //italic button

    tbbUnderline.Pushed = fnt.Underline; //underline button

    tbbStrikeout.Pushed = fnt.Strikeout; //strikeout button

    tbbLeft.Pushed      = (richTextBox1.SelectionAlignment == 
                            HorizontalAlignment.Left); //justify left

    tbbCenter.Pushed    = (richTextBox1.SelectionAlignment == 
                            HorizontalAlignment.Center); //justify center

    tbbRight.Pushed     = (richTextBox1.SelectionAlignment == 
                            HorizontalAlignment.Right); //justify right

}

History

  • 5th Nov, 2005: Version 1.0
  • 5th Dec, 2005: Version 1.1
    • Superscript and subscript styles added.

Note

Please post your comments, corrections, or requests for credit. Your feedback is most welcome.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
