
Decode quoted-printable data by Regex

14 Feb 2012, CPOL
A one-liner to decode quoted-printable data.

Overview


Some formats, such as MHTML, contain parts that are encoded as a quoted-printable data stream. The format is quite simple:



  • All printable ASCII characters may be represented by themselves, except the equal sign, which is written as =3D
  • Space and tab may remain as plain text unless they appear at the end of a line
  • All other bytes are represented by an equal sign followed by two hex digits representing the byte value
  • No encoded line may be longer than 76 characters: a longer line is split by a soft line break, i.e. a trailing equal sign that the decoder removes together with the line break that follows it
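
To make the four rules concrete, here is a minimal encoder sketch. It is only an illustration under simplifying assumptions: the class and method names are hypothetical (nothing like this is claimed to exist in the .NET Framework), the input is treated as raw bytes, and line breaks inside the data are escaped as =0D=0A rather than kept as literal line breaks.

```csharp
using System;
using System.Text;

static class QuotedPrintableEncoderSketch
{
    // Hypothetical helper that applies the rules listed above to a byte array.
    public static string Encode(byte[] data)
    {
        var sb = new StringBuilder();
        int lineLength = 0;
        for (int i = 0; i < data.Length; i++)
        {
            byte b = data[i];
            bool isLast = i == data.Length - 1;
            // Rule 1: printable ASCII except '=' stays as it is.
            bool printable = b >= 33 && b <= 126 && b != (byte)'=';
            // Rule 2: space and tab stay literal unless they would end the output.
            bool blank = (b == (byte)' ' || b == (byte)'\t') && !isLast;
            // Rule 3: everything else becomes '=' plus two hex digits.
            string token = (printable || blank)
                ? ((char)b).ToString()
                : "=" + b.ToString("X2");
            // Rule 4: at most 76 characters per line, including the trailing '='.
            if (lineLength + token.Length > 75)
            {
                sb.Append("=\r\n"); // soft line break
                lineLength = 0;
            }
            sb.Append(token);
            lineLength += token.Length;
        }
        return sb.ToString();
    }
}
```

A call such as QuotedPrintableEncoderSketch.Encode(Encoding.ASCII.GetBytes(text)) would then produce input for the decoder discussed in this article.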

Example


The following quoted-printable encoded text...


This is a long text with some line break and some encoding of the equal sig=
n (=3D). Any line longer than 76 characters are broken up into lines of 76 =
characters with a trailing equal sign.

...results in the following after decoding...


This is a long text with some line break and some encoding of the equal sign (=). Any line longer than 76 characters are broken up into lines of 76 characters with a trailing equal sign.

The Trick


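I came up with the following Regex since I could not find a suitable class in the .NET Framework to decode quoted-printable data. Each =XX sequence is replaced by the character with that code, and each soft line break is removed.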


C#
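// Requires: using System; using System.Text.RegularExpressions;
string raw = ...;
// =XX becomes the character with code XX; "=" at the end of a line (CRLF or bare LF) is dropped.
string txt = Regex.Replace(raw, @"=([0-9a-fA-F]{2})|=\r?\n",
              m => m.Groups[1].Success
                   ? Convert.ToChar(Convert.ToInt32(m.Groups[1].Value, 16)).ToString()
                   : "");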
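
One caveat: the one-liner turns every =XX into a single character, which is only right when each byte is one character (ASCII or ISO-8859-1). For multi-byte charsets such as UTF-8, a sequence like =C3=A9 has to be decoded as two bytes of one character. The following is a sketch of one way to do that with a regex along the same lines; the class name, method name and the charset parameter are assumptions made for illustration, and the input is assumed to contain only ASCII characters, as valid quoted-printable data does.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class QuotedPrintableDecoderSketch
{
    // Hypothetical helper: decodes quoted-printable text whose bytes are in 'charset'.
    public static string Decode(string raw, Encoding charset)
    {
        // Step 1: same idea as the one-liner. Each =XX becomes one char in the
        // range 0..255 and soft line breaks (CRLF or bare LF) are removed.
        string oneCharPerByte = Regex.Replace(raw, @"=([0-9a-fA-F]{2})|=\r?\n",
            m => m.Groups[1].Success
                 ? ((char)Convert.ToInt32(m.Groups[1].Value, 16)).ToString()
                 : "");

        // Step 2: ISO-8859-1 maps chars 0..255 to the same byte values,
        // so this recovers the original byte sequence.
        byte[] bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(oneCharPerByte);

        // Step 3: interpret those bytes in the charset declared for the part.
        return charset.GetString(bytes);
    }
}
```

For an MHTML part, the charset would normally come from that part's Content-Type header, e.g. QuotedPrintableDecoderSketch.Decode(raw, Encoding.UTF8).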

Where to go from here


Once you have the decoded text, you can, for example, strip off all HTML tags and decode the HTML entities:


C#
// Requires: using System.Web; (HttpUtility lives in System.Web.dll)
string textonly = HttpUtility.HtmlDecode(Regex.Replace(txt, @"<[\S\s]*?>", ""));
Console.WriteLine("{0}", textonly);

Input:


<a href="#print_link">Expression&lt;Action&lt;T&gt;&gt; expr = s =&gt; Console.WriteLine(&quot;{0}&quot;, s);

Output:


Expression<Action<T>> expr = s => Console.WriteLine("{0}", s);
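
If the project does not reference System.Web (a console application, for instance), System.Net.WebUtility offers an HtmlDecode method as well. Here is a small self-contained sketch of the same two steps; the class and method names are made up for this example.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class HtmlTextSketch
{
    // Hypothetical helper combining tag stripping and entity decoding.
    public static string StripTags(string html)
    {
        // Remove everything from '<' up to the next '>' (lazy match);
        // [\S\s] also matches line breaks, so tags spanning lines are removed too.
        string withoutTags = Regex.Replace(html, @"<[\S\s]*?>", "");

        // Turn entities such as &lt; and &quot; back into characters.
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

Note that this is a plain textual replacement, not an HTML parser: a literal '>' inside an attribute value or the contents of script and style elements are not treated specially.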

Finally, the plain text can be searched for a pattern, e.g.:


C#
var q = from m in Regex.Matches(textonly,
               @"Expression\s*<\s*Action\s*<\s*\w+\s*>\s*>\s*(\w+)\s*=")
               .Cast<Match>()
        select m.Groups[1].Value;
// Print each captured variable name with a running number.
int n = 0;
foreach (var v in q)
    Console.WriteLine("{0}: Expression<Action<T>> {1}", ++n, v);

Possible output:


1: Expression<Action<T>> calculate
2: Expression<Action<T>> print
3: Expression<Action<T>> store

Summary


Performance may not be optimal, but it keeps me going with my other tasks... ;-)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)