
Clean RTF Merge Fields

10 Feb 2008
Class to clean up / remove the RTF from custom merge fields in (RTF) documents.

Introduction

RTF has always been a pain for me. It is tempting because it is all plain text and thus 'human' readable, but hold your horses: try to change the font of one piece of text on line x by hand and, well, let's be honest... don't even try.

Well, RTF is still tempting for me, especially when it comes to users, documents, and yes: mail/document merging!

Alright, let's get to the point. This piece of code helps me when users use an RTF document, post it to a website, and expect their custom fields to be updated with the data from any source (addresses, personal names, or ... strange flower names).

This class takes the RTF as a string, plus a start and an end character. It searches the input for pairs of these characters, removes all the RTF coding between each pair, and returns a cleaned-up version of the full RTF string.

An array of the changes (each original field next to its cleaned version) is also available.

Background

The problem mostly resides in how Word ver.X makes, modifies, and rebuilds RTF documents.

(This is no hail to WordPad or any other RTF editor, but they definitely do a better job at recreating the RTF into a simpler and straighter code than MS Word.)

Say you want users to put merge fields like [this_is_a_merge_field] in their documents, and you replace each one with your own database field. Word causes a problem as soon as a user, even accidentally, changes the font somewhere inside the merge field text and then changes it back.

You would expect the field to show up in the RTF code as plain and simple: [this_is_a_merge_field].

But instead, it becomes something like this: [this_is_a_}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid15991329\charrsid3492762 merge}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid15991329 _field].

Well, there is no way our simple String.Replace is going to find our 'this_is_a_merge_field' database field within that string. So... we made something that deals with this problem!
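To make the failure concrete, here is a minimal sketch that runs a plain String.Replace against the mangled fragment quoted above. The module name ReplaceDemo, the helper NaiveMerge, and the value "Tulip" are made up for this illustration; they are not part of the article's class.

```vbnet
Imports System

Module ReplaceDemo
    ' The mangled fragment quoted above, copied from this article.
    Public Const Mangled As String = "[this_is_a_}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid15991329\charrsid3492762 merge}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid15991329 _field]"

    ' Hypothetical helper: a naive merge that relies on String.Replace only.
    Public Function NaiveMerge(ByVal rtf As String, ByVal field As String, _
                               ByVal value As String) As String
        Return rtf.Replace("[" & field & "]", value)
    End Function

    Sub Main()
        Dim merged As String = NaiveMerge(Mangled, "this_is_a_merge_field", "Tulip")
        ' The field name is split up by RTF control words, so nothing matches
        ' and the text comes back unchanged.
        Console.WriteLine(merged = Mangled)   ' prints True
    End Sub
End Module
```

The same call on the clean text [this_is_a_merge_field] would return "Tulip", which is why the fields are cleaned first.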

Using the code

The code takes the RTF as a string. If you want to, pass your own start and end chars such as [ ], < >, or even # % ([ and ] are the defaults). Don't use { }, \, or a pipe |: the braces and the backslash are RTF syntax, and the pipe is used internally as a placeholder, so all of them are stripped from the field.

Here is the main function. Pass the RTF in as a string and get the cleaned-up string back:

Public Function CleanDocument(ByVal rtfSring As String, _
       Optional ByVal detectStartChar As Char = "["c, _
       Optional ByVal detectEndChar As Char = "]"c) As String

    ' Remember the start time. TimeOfDay.Milliseconds is only the 0-999
    ' component of the time, so the elapsed time is taken from a TimeSpan.
    Dim started As Date = Date.Now

    ' bcounter is the class-level scan position used by ReturnNextRtfString;
    ' reset it so the function can be called more than once.
    bcounter = 0

    Dim sb As New StringBuilder(rtfSring)
    Dim sbclean As New StringBuilder(rtfSring)
    Dim tempstr(1) As String
    Dim stepper As Integer = 0
    Do
        ' tempstr(0) = the field as found, tempstr(1) = the cleaned field
        tempstr = ReturnNextRtfString(sb, detectStartChar, detectEndChar, True)

        If tempstr(0) Is Nothing Then Exit Do
        sbclean.Replace(tempstr(0), tempstr(1))

        ' Record the change: row 0 = original, row 1 = cleaned
        ReDim Preserve _ArrayOfFields(1, stepper)
        _ArrayOfFields(0, stepper) = tempstr(0)
        _ArrayOfFields(1, stepper) = tempstr(1)
        stepper += 1
    Loop

    processedinmillisecconds = CInt((Date.Now - started).TotalMilliseconds)

    Return sbclean.ToString

End Function

Now, the helper function that finds each substring between the start and end char:

Private Function ReturnNextRtfString(ByRef sb As StringBuilder, _
        ByVal startchar As Char, ByVal endchar As Char, _
        Optional ByVal autoclean As Boolean = False) As String()

    ' -1 means "not seen yet", so a start char at index 0 is also detected
    Dim startcounter As Integer = -1
    Dim endcounter As Integer = -1
    Dim returnstring(1) As String

    For acounter As Integer = bcounter To sb.Length - 1

        If sb.Chars(acounter) = startchar Then
            startcounter = acounter
        End If

        If sb.Chars(acounter) = endchar Then
            endcounter = acounter + 1
            ' set the new start for the next call of the function
            bcounter = acounter + 1
        End If

        If startcounter >= 0 AndAlso endcounter > startcounter Then
            ' returnstring(0) = the field as found (start char through end char)
            returnstring(0) = sb.ToString(startcounter, endcounter - startcounter)

            ' returnstring(1) = the cleaned field, only when requested
            If autoclean Then
                returnstring(1) = CleanRtfString(returnstring(0))
            End If

            Return returnstring
        End If
    Next

    ' No further field found: both elements are Nothing
    Return returnstring
End Function

And finally, the clean up function:

Private Function CleanRtfString(ByRef rtfstring As String) As String
    Dim sb As New StringBuilder(rtfstring)
    Dim cleansb As New StringBuilder

    For ccounter As Integer = 0 To sb.Length - 1

        ' Keep only visible characters that are neither RTF syntax
        ' nor the "|" placeholder
        If Asc(sb.Chars(ccounter)) > 32 AndAlso sb.Chars(ccounter) <> "|"c _
           AndAlso sb.Chars(ccounter) <> "\"c AndAlso sb.Chars(ccounter) <> "{"c _
           AndAlso sb.Chars(ccounter) <> "}"c Then
            cleansb.Append(sb.Chars(ccounter))
        End If

        If ccounter + 1 >= sb.Length Then Exit For

        ' If RTF code starts at the next position, overwrite it with "|"
        ' up to the next space, so the check above skips it later
        If sb.Chars(ccounter + 1) = "\"c OrElse sb.Chars(ccounter + 1) = "{"c _
           OrElse sb.Chars(ccounter + 1) = "}"c Then
            For dcounter As Integer = ccounter + 1 To sb.Length - 1
                If sb.Chars(dcounter) = " "c Then Exit For
                sb.Chars(dcounter) = "|"c
            Next
        End If
    Next

    ' Defensive only: pipes are never appended above
    cleansb.Replace("|", "")
    Return cleansb.ToString
End Function
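Once the fields are clean, the merge step mentioned in the introduction can go back to being a plain replace. The following is only a sketch of that step: the module name MergeSketch, the helper MergeFields, and the sample field names are assumptions for illustration and are not part of the article's class.

```vbnet
Imports System
Imports System.Collections.Generic

Module MergeSketch
    ' Hypothetical helper: replaces each [name] in already-cleaned text
    ' with the matching value from the dictionary.
    Public Function MergeFields(ByVal cleanedRtf As String, _
                                ByVal values As Dictionary(Of String, String)) As String
        Dim result As String = cleanedRtf
        For Each pair As KeyValuePair(Of String, String) In values
            result = result.Replace("[" & pair.Key & "]", pair.Value)
        Next
        Return result
    End Function

    Sub Main()
        Dim values As New Dictionary(Of String, String)
        values.Add("first_name", "Rose")
        values.Add("flower", "Tulip")
        ' prints: Dear Rose, your Tulip is ready.
        Console.WriteLine(MergeFields("Dear [first_name], your [flower] is ready.", values))
    End Sub
End Module
```

Values that contain a backslash or braces would need to be escaped as \\, \{ and \} before they go into real RTF; the sketch leaves that out.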

Problems to work on

  • When users insert an image in between the tags of the merge field, the cleaner will not be able to clean it up correctly.
  • When a Word hyperlink is in between the tags, this will also mangle the output.
  • Because the pipe char is used as a replacement, it cannot be used in the field.
  • The cleaner treats a space as the only thing that ends a piece of RTF code, and as a simple solution it removes all spaces from the field, so field names cannot contain spaces; this could be done more neatly.
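One possible way around the pipe limitation is to track "inside RTF code" with a flag instead of overwriting characters with a placeholder. The sketch below is an assumption about how that could look, not the article's code: the module name FieldCleanerSketch and the function CleanField are hypothetical. It keeps the same rule (RTF code runs from \, { or } up to the next space, and spaces are dropped), so the image, hyperlink, and space limitations still apply.

```vbnet
Imports System
Imports System.Text

Module FieldCleanerSketch
    ' Hypothetical alternative to CleanRtfString that needs no "|" placeholder.
    Public Function CleanField(ByVal raw As String) As String
        Dim result As New StringBuilder()
        Dim skipping As Boolean = False

        For Each c As Char In raw
            If c = "\"c OrElse c = "{"c OrElse c = "}"c Then
                skipping = True          ' RTF code starts here
            ElseIf c = " "c Then
                skipping = False         ' a space ends the RTF code (and is dropped)
            ElseIf Not skipping AndAlso AscW(c) > 32 Then
                result.Append(c)         ' ordinary visible character: keep it
            End If
        Next

        Return result.ToString()
    End Function

    Sub Main()
        ' prints: [this_is_a_merge_field]
        Console.WriteLine(CleanField("[this_is_a_}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid15991329 merge_field]"))
    End Sub
End Module
```

With this variant a field such as [a|b] comes through unchanged, because the pipe no longer has a special meaning.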

History

This is my first article on The Code Project!

  • 10-02-2008: The first version.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
