Introduction
RTF has always been a pain to me. It is so tempting because it is all plain text and thus 'human' readable, but hold your horses... try to change the font of one piece of text on line x... well, let's be honest, don't even try.
Well, RTF is still tempting for me, especially when it comes to users, documents, and yes: mail/document merging!
Alright, let's get to the point. This piece of code helps me when users use an RTF document, post it to a website, and expect their custom fields to be updated with the data from any source (addresses, personal names, or ... strange flower names).
This particular class expects the RTF as a string (it works on a StringBuilder internally), plus a start and an end char. It searches the input for pairs of these chars, removes all the RTF coding between them, and returns a cleaned-up version of the full RTF string.
The changes it made are also available as an array that holds, for each field, the original text and the cleaned text.
Background
The problem mostly resides in how Word ver.X makes, modifies, and rebuilds RTF documents.
(This is no ode to WordPad or any other RTF editor, but they definitely do a better job than MS Word at writing simple, straightforward RTF.)
Say you would like users to put merge fields like [this_is_a_merge_field] in their documents, so you can replace them with your own database fields. Word causes a problem as soon as a user changes the font somewhere inside the merge field text, even accidentally and even if the change is removed again.
You would expect it to show in the RTF code as plain simple: [this_is_a_merge_field].
But instead, it becomes something like this: [this_is_a_}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid15991329\charrsid3492762 merge}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid15991329 _field].
Well, there is no way our simple String.Replace is going to find our '[this_is_a_merge_field]' database field within that string. So... we made something that deals with this problem!
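To make the failure concrete, here is a minimal sketch that runs a plain replace over the fragment shown above. The module name NaiveReplaceDemo, the helper NaiveMerge, and the replacement value are made up for this illustration and are not part of the class in this article.

```vbnet
Imports System

Module NaiveReplaceDemo
    ' Hypothetical helper: the plain String.Replace merge one would try first.
    Function NaiveMerge(ByVal rtf As String, ByVal field As String, _
                        ByVal value As String) As String
        Return rtf.Replace("[" & field & "]", value)
    End Function

    Sub Main()
        ' The fragment from above, as Word wrote it.
        Dim rtf As String = "[this_is_a_}{\rtlch\fcs1 \af0 \ltrch\fcs0 " & _
            "\insrsid15991329\charrsid3492762 merge}{\rtlch\fcs1 \af0 " & _
            "\ltrch\fcs0 \insrsid15991329 _field]"
        Dim merged As String = NaiveMerge(rtf, "this_is_a_merge_field", "Tulipa")
        ' Nothing matched, so the RTF comes back unchanged.
        Console.WriteLine(merged = rtf)   ' prints True
    End Sub
End Module
```

The field text is all there, but the formatting codes in the middle mean the literal search string never occurs.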
Using the code
The main function takes the RTF as a string. If you want to, give it your own start and end chars like [ ] or < >, or even # % ([ and ] are the defaults). Don't use { } or \ (these are RTF syntax) or a pipe | (the cleaner uses it internally as a placeholder and strips it out).
Here is the main function. Put an RTF string in and expect a cleaned-up RTF string out:
Public Function CleanDocument(ByVal rtfString As String, _
        Optional ByVal detectStartChar As Char = CChar("["), _
        Optional ByVal detectEndChar As Char = CChar("]")) As String
    ' Keep a full timestamp: the Milliseconds part alone wraps around every second
    Dim startTime As Date = Date.Now
    Dim sb As New StringBuilder(rtfString)
    Dim sbclean As New StringBuilder(rtfString)
    Dim tempstr(1) As String
    Dim stepper As Integer = 0
    ' bcounter is the class-level scan position used by ReturnNextRtfString;
    ' restart it so the same instance can clean more than one document
    bcounter = 0
    Do
        ' tempstr(0) = the field as found in the RTF, tempstr(1) = the cleaned field
        tempstr = ReturnNextRtfString(sb, detectStartChar, detectEndChar, True)
        If tempstr(0) Is Nothing Then Exit Do
        sbclean.Replace(tempstr(0), tempstr(1))
        ReDim Preserve _ArrayOfFields(1, stepper)
        _ArrayOfFields(0, stepper) = tempstr(0)
        _ArrayOfFields(1, stepper) = tempstr(1)
        stepper += 1
    Loop
    processedinmillisecconds = CInt((Date.Now - startTime).TotalMilliseconds)
    Return sbclean.ToString
End Function
Now, the helper function finds the next substring that sits between the start and end char:
Private Function ReturnNextRtfString(ByRef sb As StringBuilder, _
        ByVal startchar As Char, ByVal endchar As Char, _
        Optional ByVal autoclean As Boolean = False) As String()
    ' -1 means "no start char seen yet", so a start char at index 0 also counts
    Dim startcounter As Integer = -1
    Dim endcounter As Integer
    Dim acounter As Integer
    Dim returnstring(1) As String
    ' bcounter is a class-level field: the position just after the last end char found
    For acounter = bcounter To sb.Length - 1
        If sb.Chars(acounter) = startchar Then
            startcounter = acounter
        End If
        If sb.Chars(acounter) = endchar Then
            endcounter = acounter + 1
            bcounter = acounter + 1
        End If
        If startcounter >= 0 AndAlso endcounter > startcounter Then
            ' Element 0 holds the field as found, element 1 the cleaned field
            returnstring(0) = sb.ToString(startcounter, endcounter - startcounter)
            If autoclean Then
                returnstring(1) = CleanRtfString(returnstring(0))
            End If
            Return returnstring
        End If
    Next
    ' Nothing found: both elements are Nothing
    Return returnstring
End Function
And finally, the clean up function:
Private Function CleanRtfString(ByVal rtfstring As String) As String
    Dim sb As New StringBuilder(rtfstring)
    Dim cleansb As New StringBuilder
    Dim ccounter As Integer
    For ccounter = 0 To sb.Length - 1
        ' Keep only visible chars that are not RTF syntax or the "|" placeholder
        If Asc(sb.Chars(ccounter)) > 32 AndAlso sb.Chars(ccounter) <> "|"c _
                AndAlso sb.Chars(ccounter) <> "\"c AndAlso sb.Chars(ccounter) <> "{"c _
                AndAlso sb.Chars(ccounter) <> "}"c Then
            cleansb.Append(sb.Chars(ccounter))
        End If
        If ccounter + 1 >= sb.Length Then Exit For
        ' If RTF code starts at the next position, overwrite it with "|" up to the next space
        If sb.Chars(ccounter + 1) = "\"c OrElse sb.Chars(ccounter + 1) = "{"c _
                OrElse sb.Chars(ccounter + 1) = "}"c Then
            For dcounter As Integer = ccounter + 1 To sb.Length - 1
                If sb.Chars(dcounter) = " "c Then Exit For
                sb.Chars(dcounter) = "|"c
            Next
        End If
    Next
    cleansb.Replace("|", "")
    Return cleansb.ToString
End Function
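Once CleanDocument has put the fields back together, the merge itself is plain string replacement again. The sketch below shows that last step under a few assumptions: the module MergeDemo, the helpers MergeFields and EscapeRtf, and the sample values are all hypothetical and not part of the class above, and the fields use the default [ ] chars.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module MergeDemo
    ' Hypothetical helper: escape the three chars that have a meaning in RTF
    ' so a value such as "a{b}" cannot break the document structure.
    Function EscapeRtf(ByVal value As String) As String
        Return value.Replace("\", "\\").Replace("{", "\{").Replace("}", "\}")
    End Function

    ' Hypothetical helper: replace every [key] in an already cleaned RTF
    ' string with the escaped value for that key.
    Function MergeFields(ByVal cleanedRtf As String, _
                         ByVal values As Dictionary(Of String, String)) As String
        Dim sb As New StringBuilder(cleanedRtf)
        For Each pair As KeyValuePair(Of String, String) In values
            sb.Replace("[" & pair.Key & "]", EscapeRtf(pair.Value))
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim values As New Dictionary(Of String, String)
        values.Add("first_name", "Ada")
        values.Add("flower", "Tulipa gesneriana")
        ' Stand-in for the string that CleanDocument returns.
        Dim cleaned As String = "{\rtf1 Dear [first_name], your [flower] has shipped.}"
        Console.WriteLine(MergeFields(cleaned, values))
        ' prints {\rtf1 Dear Ada, your Tulipa gesneriana has shipped.}
    End Sub
End Module
```

Escaping matters because the values come from outside the document. Characters outside plain ASCII would need extra RTF encoding, which this sketch leaves out.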
Problems to work on
- When users insert an image in between the tags of the merge field, the cleaner will not be able to clean it up correctly.
- When a Word hyperlink is in between the tags, this will also mangle the output.
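- Because the pipe char is used as a replacement, it cannot be used in the field.
- A space is the only thing the cleaner uses to detect where an RTF code ends, so as a simple solution, I remove all spaces from the field. This means a field name cannot contain spaces; this could be done more neatly.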
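One possible way to deal with the last point is to read the RTF codes for what they are instead of cutting at spaces: a control word is a backslash, letters, an optional number, and at most one space that belongs to it. The sketch below is a hypothetical alternative, not the method used in CleanRtfString; the module ControlWordSkipper and the function StripRtfCodes are made-up names. It simply drops control symbols such as \{ and \'hh, and it does not solve the image or hyperlink problems.

```vbnet
Imports System
Imports System.Text

Module ControlWordSkipper
    Private Function IsAsciiLetter(ByVal c As Char) As Boolean
        Return (c >= "a"c AndAlso c <= "z"c) OrElse (c >= "A"c AndAlso c <= "Z"c)
    End Function

    Private Function IsAsciiDigit(ByVal c As Char) As Boolean
        Return c >= "0"c AndAlso c <= "9"c
    End Function

    ' Hypothetical alternative cleaner: skips braces, control words and
    ' control symbols, and keeps all other text, including spaces.
    Function StripRtfCodes(ByVal fragment As String) As String
        Dim result As New StringBuilder()
        Dim i As Integer = 0
        While i < fragment.Length
            Dim c As Char = fragment.Chars(i)
            If c = "{"c OrElse c = "}"c Then
                i += 1                      ' group braces carry no text
            ElseIf c = "\"c Then
                i += 1                      ' step over the backslash
                If i < fragment.Length AndAlso IsAsciiLetter(fragment.Chars(i)) Then
                    ' Control word: letters, optional "-", digits, one optional space
                    While i < fragment.Length AndAlso IsAsciiLetter(fragment.Chars(i))
                        i += 1
                    End While
                    If i < fragment.Length AndAlso fragment.Chars(i) = "-"c Then i += 1
                    While i < fragment.Length AndAlso IsAsciiDigit(fragment.Chars(i))
                        i += 1
                    End While
                    If i < fragment.Length AndAlso fragment.Chars(i) = " "c Then i += 1
                ElseIf i < fragment.Length AndAlso fragment.Chars(i) = "'"c Then
                    i += 3                  ' \'hh : quote plus two hex digits
                Else
                    i += 1                  ' any other control symbol, e.g. \~
                End If
            Else
                result.Append(c)
                i += 1
            End If
        End While
        Return result.ToString()
    End Function

    Sub Main()
        Console.WriteLine(StripRtfCodes("[flower }{\b name]"))
        ' prints [flower name]
        Console.WriteLine(StripRtfCodes("[this_is_a_}{\rtlch\fcs1 \af0 merge}{\ltrch\fcs0 _field]"))
        ' prints [this_is_a_merge_field]
    End Sub
End Module
```

Because only the one space that ends a control word is consumed, a field like [flower name] keeps its space, and the pipe placeholder is no longer needed.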
History
This is my first article on The Code Project!
- 10-02-2008: The first version.