Binary to Text Encode/Decode Class

Frederico Daupias de Alcochete

9 Jun 2004

A .NET class library to handle encoding and decoding with the Quoted Printable, UUEncode, Base64 and yEnc algorithms.

Introduction

This article presents a .NET class library for encoding and decoding files and text with several algorithms. Some of the features of this library:

Encoding/decoding text in Quoted Printable
Encoding/decoding files and text in Base64
Encoding/decoding files and text in UUEncode
Encoding/decoding files in yEnc

Using The Code

Remember to add a reference in your project to TextCoDec.dll. Once you add the reference, Visual Studio .NET should take care of the copying for you.

Dim Yenc As New TextCodec.Yenc
Dim Parts() As String

'encode the windows calculator in 3 parts with a line length of 80
Parts = Yenc.Encode("C:\WINDOWS\system32\calc.exe", 80, 2, _
    TextCodec.Yenc.yencVersion.Version1_1)

'decode to c:\
Yenc.Decode(Parts, "c:\", 0)

'test it
Shell("c:\calc.exe")

The Algorithms

Base64

The base64 encoding/decoding was the easiest one to implement: it is part of the framework (System.Convert.ToBase64String() and System.Convert.FromBase64String()). Take a look at the documentation.
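As a rough illustration of what those two framework calls do, here is a minimal Python sketch (Python, not the library's VB code) built on the standard `base64` module. Base64 works on bytes, so text has to be converted with a character encoding first; the helper names and the UTF-8 default are assumptions made for this example.

```python
import base64


def text_to_base64(text: str, encoding: str = "utf-8") -> str:
    """Convert text to bytes with the given encoding, then Base64-encode the bytes."""
    return base64.b64encode(text.encode(encoding)).decode("ascii")


def base64_to_text(encoded: str, encoding: str = "utf-8") -> str:
    """Decode Base64 back to bytes, then turn the bytes back into text."""
    return base64.b64decode(encoded).decode(encoding)


encoded = text_to_base64("Man")          # 3 bytes in, 4 characters out
decoded = base64_to_text(encoded)        # back to the original text
```

Every 3 input bytes become 4 output characters, the same 3-to-4 grouping that UUEncode uses.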

Quoted Printable

This algorithm is mainly used to encode non-English text. Character codes outside the range 32 to 126 are replaced by an equal sign followed by their two-digit hexadecimal value. Character code 61 (the equal sign itself) lies inside that range but must also be encoded, because it serves as the escape character.

For i = 0 To Chars.Length - 1
    Ascii = Asc(Chars(i))
    If Ascii < 32 Or Ascii = 61 Or Ascii > 126 Then
        EncodedChar = Hex(Ascii).ToUpper
        If EncodedChar.Length = 1 Then EncodedChar = "0" & EncodedChar
        ReturnString.Append("=" & EncodedChar)
    Else
        ReturnString.Append(Chars(i))
    End If
Next
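The loop above only covers encoding. As a minimal sketch of both directions, here is the same rule in Python (not the library's VB code). The function names are hypothetical, and the text is assumed to fit in Latin-1 so that each character maps to exactly one byte. Soft line breaks and the other rules of full MIME Quoted Printable are left out.

```python
def qp_encode(text: str) -> str:
    """Escape every code outside 32..126, plus '=' (61), as '=' and two hex digits."""
    out = []
    for byte in text.encode("latin-1"):  # assumption: one byte per character
        if byte < 32 or byte == 61 or byte > 126:
            out.append("=%02X" % byte)
        else:
            out.append(chr(byte))
    return "".join(out)


def qp_decode(encoded: str) -> str:
    """Reverse qp_encode: turn each =XX triplet back into one character."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == "=":
            out.append(int(encoded[i + 1:i + 3], 16))  # two hex digits after '='
            i += 3
        else:
            out.append(ord(encoded[i]))
            i += 1
    return out.decode("latin-1")
```

For example, `qp_encode("a=b")` gives `a=3Db`, and `qp_decode` turns it back into `a=b`.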

UUEncode

The best definition of the algorithm I found is the following:

The uuencode algorithm hinges around a 3-byte-to-4-byte (8-bit to 6-bit data) encoding to convert all data to printable characters. To perform this encoding read in 3 bytes from the file to be encoded whose binary representation is
a7a6a5a4a3a2a1a0 b7b6b5b4b3b2b1b0 c7c6c5c4c3c2c1c0
and convert them into 4 bytes with values in the range 0-63 as follows:
0 0 a7a6a5a4a3a2 0 0 a1a0b7b6b5b4 0 0 b3b2b1b0c7c6 0 0 c5c4c3c2c1c0
then convert these bytes to printable characters by adding 0x20 (32).
exception: if you end up with a zero byte it should be converted to 0x60 (back-quote '`') rather than 0x20 (space ' ').

In addition, the start of the encoding is marked by the line "begin <mode> <filename>", where <mode> consists of 3 octal digits which are the UNIX mode of the file, and <filename> is the original filename of the file encoded. The end of the encoding is marked by the line "end". The first character of each line contains the line length in bytes *in the original file*, encoded in the same way as an ordinary byte, i.e. line length 0->0x60, all other lengths add 0x20 to convert to printable characters. Line lengths vary from 0 to 45 (which encodes to 'M'; this is why lines in a uuencoded file all start with an M), which is a line length of 61 characters (including the length character) in the encoded file. This is a nice safe length to transmit via email.

Lines in the encoded file are always a multiple of 4 + 1 characters long; this sometimes means that 1 or 2 bytes are thrown away at the end of the decoding.

The main encoding is achieved in VB by the following code:

'Assumes Chars has been padded to a multiple of 3 characters,
'since Chars(i + 1) and Chars(i + 2) are read unconditionally
For i = 0 To Chars.Length - 1 Step 3
    DecodedBytes(0) = Asc(Chars(i))
    DecodedBytes(1) = Asc(Chars(i + 1))
    DecodedBytes(2) = Asc(Chars(i + 2))

    'Split the 24 bits into four 6-bit values and add 32 to each
    EncodedBytes(0) = (DecodedBytes(0) \ 4 + 32)
    EncodedBytes(1) = ((DecodedBytes(0) Mod 4) * 16) + _
      (DecodedBytes(1) \ 16 + 32)
    EncodedBytes(2) = ((DecodedBytes(1) Mod 16) * 4) + _
      (DecodedBytes(2) \ 64 + 32)
    EncodedBytes(3) = (DecodedBytes(2) Mod 64) + 32

    'A zero value (32, a space) is written as a back-quote (96) instead
    If (EncodedBytes(0) = 32) Then EncodedBytes(0) = 96
    If (EncodedBytes(1) = 32) Then EncodedBytes(1) = 96
    If (EncodedBytes(2) = 32) Then EncodedBytes(2) = 96
    If (EncodedBytes(3) = 32) Then EncodedBytes(3) = 96

    ReturnString.Append(Chr(EncodedBytes(0)))
    ReturnString.Append(Chr(EncodedBytes(1)))
    ReturnString.Append(Chr(EncodedBytes(2)))
    ReturnString.Append(Chr(EncodedBytes(3)))
Next
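To show how the 3-to-4 grouping fits together with the length character and the begin/end lines from the definition above, here is a minimal Python sketch (not the library's VB code). The function names and the default mode of 644 are assumptions made for this example. The last group of a line is padded with zero bytes, and a zero-length line is written before "end".

```python
def uu_char(value: int) -> str:
    """Map a 6-bit value to a printable character; zero becomes a back-quote."""
    return "`" if value == 0 else chr(value + 32)


def uu_encode_line(chunk: bytes) -> str:
    """Encode up to 45 bytes: one length character, then 4 characters per 3 bytes."""
    padded = chunk + b"\x00" * (-len(chunk) % 3)  # pad the last group with zeros
    out = [uu_char(len(chunk))]                   # length of the ORIGINAL data
    for i in range(0, len(padded), 3):
        a, b, c = padded[i], padded[i + 1], padded[i + 2]
        out.append(uu_char(a >> 2))
        out.append(uu_char(((a & 0x03) << 4) | (b >> 4)))
        out.append(uu_char(((b & 0x0F) << 2) | (c >> 6)))
        out.append(uu_char(c & 0x3F))
    return "".join(out)


def uu_encode(data: bytes, filename: str, mode: str = "644") -> str:
    """Wrap the encoded lines in the begin and end markers."""
    lines = ["begin %s %s" % (mode, filename)]
    for i in range(0, len(data), 45):
        lines.append(uu_encode_line(data[i:i + 45]))
    lines.append("`")  # zero-length line before the end marker
    lines.append("end")
    return "\n".join(lines) + "\n"
```

A full 45-byte chunk produces a 61-character line that starts with `M`, matching the description above. Because the length character records how many original bytes the line holds, a decoder can discard the padding bytes at the end.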

Yenc

In essence, the yenc algorithm can be implemented by the following expressions:

EncodedCharacter = (Character + 42) Mod 256

EncodedSpecialCharacter = (EncodedCharacter + 64) Mod 256

There are, as always, some characters which make up the exceptions. Those are null (0), line feed (LF), carriage return (CR) and the equal sign (=). The tab character was also an exception but was removed in version 1.2. If the encoded character is one of the aforementioned, re-encode it with the EncodedSpecialCharacter expression and precede it with the equal sign, which acts as the escape character.

The yenc algorithm is flexible, however. If, for some reason, a character isn't suitable in the encoded stream, escape it as you would a special character. This is especially useful for NNTP transmission. In that protocol, a line consisting of a single dot signifies the end of the stream, and a dot at the start of any other line has to be doubled (..) during transmission. The dot character isn't a special yenc character by default, so you could end up with an encoded line that starts with a dot, and a line starting with a double dot confuses some newsreaders. A good principle is to always escape a dot if it is located at the beginning of a line.

There is another exception dealing with the line length. The choice of line length is flexible, but the actual length also varies because you can't end a line with the escape character. If the last character of a line turns out to be a special character, you escape it normally and end up with two characters (the escape character and the encoded one), and thus with a line of length + 1 characters.

For more information on yenc, go to www.yenc.org.

The main encoding is achieved in VB by the following code:

For i = 0 To n - 1
    CharCode = (Bytes(i) + 42) Mod 256
    Select Case CharCode
        Case 0, 13, 10, 61
            OutputLine &= "=" & Chr((CharCode + 64) Mod 256)
        Case Else
            If Version = yencVersion.Version1_1 And CharCode = 9 Then
                OutputLine &= "=" & Chr((CharCode + 64) Mod 256)
            Else
                OutputLine &= Chr(CharCode)
            End If
    End Select
    If OutputLine.Length >= LineLength Then
        Output.Append(OutputLine & vbCrLf)
        OutputLine = ""
    End If
Next
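For comparison, here is a minimal Python sketch of the same encoding loop together with a matching decoder (not the library's VB code). The function names are hypothetical. The sketch also escapes a dot at the start of a line, leaves out the version 1.1 tab exception, and does not write the =ybegin/=yend header lines or a checksum.

```python
CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}  # NUL, LF, CR and '='


def yenc_encode(data: bytes, line_length: int = 128) -> bytes:
    """Add 42 to every byte, escape critical results, break lines with CRLF."""
    out = bytearray()
    column = 0
    for byte in data:
        code = (byte + 42) % 256
        # A dot (0x2E) is escaped only at the start of a line.
        if code in CRITICAL or (code == 0x2E and column == 0):
            out += bytes([0x3D, (code + 64) % 256])
            column += 2  # an escaped pair may push the line to line_length + 1
        else:
            out.append(code)
            column += 1
        if column >= line_length:
            out += b"\r\n"
            column = 0
    if column:
        out += b"\r\n"  # flush the last, shorter line
    return bytes(out)


def yenc_decode(encoded: bytes) -> bytes:
    """Reverse yenc_encode: skip line breaks, undo escapes, subtract 42."""
    out = bytearray()
    escaped = False
    for code in encoded:
        if code in (0x0D, 0x0A):
            continue  # raw CR/LF never carry data
        if not escaped and code == 0x3D:
            escaped = True
            continue
        if escaped:
            code = (code - 64) % 256
            escaped = False
        out.append((code - 42) % 256)
    return bytes(out)
```

Because an escaped pair is never split across two lines, a line can be one character longer than `line_length`, which is the line-length exception described above.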

Streams

As I was rewriting the code from scratch, I was amazed at how streams made my life easier. Not only that, but the code also got a speed boost that is almost unbelievable (about 11400% actually).

So why easier, you may ask. Well, almost anything can be turned into a stream. Take a look at the following examples:

'Requires Imports System.IO
Dim MyPath As String
Dim MyByteArray() As Byte
Dim MyString As String

'Any one of these gives you a stream to work with:
Dim FromFile As New FileStream(MyPath, FileMode.Open)
Dim FromBytes As New MemoryStream(MyByteArray)
Dim FromText As New MemoryStream(System.Text.Encoding.Default.GetBytes(MyString))

As you can see, streams are very versatile. That took care of almost all overloads!

A stream can also grow as needed and, when it is seekable like a FileStream, can be written at any position. That sidestepped the problem of decoding multipart yenc files. Because data can be written anywhere on such a stream, I didn't have to sort the parts to write them sequentially. I opened a stream, positioned it at the offset of the part (parsed from the part header) and just dumped the decoded data into it.
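The idea of writing parts at their own offsets can be sketched in a few lines of Python (not the library's VB code). The part data and offsets below are made up for illustration, and an in-memory `io.BytesIO` stands in for the output file stream.

```python
import io


def write_parts(stream, parts):
    """Write each (offset, data) pair at its own offset, in arrival order."""
    for offset, data in parts:
        stream.seek(offset)   # jump to where this part belongs
        stream.write(data)    # the stream grows as needed


# Hypothetical decoded parts, deliberately out of order.
parts = [(6, b"world"), (0, b"hello "), (11, b"!")]
target = io.BytesIO()
write_parts(target, parts)
result = target.getvalue()
```

No sorting step is needed: once every part has been written, `result` holds the bytes in their original order.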

Other Optimizations

One other object of the .NET Framework allowed the amazing speed increase: the StringBuilder object. If you have to concatenate large strings, I strongly recommend using this object.

In some measurements I made, string concatenation is 250 times faster with this object.

It is ideal for this project, as an enormous part of the encoding/decoding process involves string concatenation.

A Few Words Of Advice

If speed is more important than presentation, don't declare the encoder/decoder with events. If you handle the progress event, the decoding will be noticeably slower.

When encoding large files (larger than 10 MB), don't encode them to memory. Use the overloads that encode to a file. Also, don't rely too much on the garbage collector. Always destroy variables you no longer need; it's a good principle.

If you're trying to time the encoding/decoding process, be sure to disable any anti-virus software. The written file streams won't close until the anti-virus has finished checking the file for viruses. For small files this may not affect the results noticeably, but for large files it can really make a mess of things...

Finally, if you're encoding really large files and you're fortunate enough to have two hard disks, make certain that you read from the slower one and write to the faster one. HDD writing is always slower than reading, and an HDD can't read and write at the same time (so if you read from and write to the same HDD, it'll position its head, read a chunk of data, reposition the head, write another chunk, and so on).

Credits

Documentation was compiled to XML by VB.Doc, a free documentation system for the VB.NET programming language. The help file was generated by NDoc.

History

16/03/2004 - Initial release.
04/06/2004 - Total rewrite of the code. In some cases there is an increase in speed by 11400%!

Feedback and Improvements

Feel free to post questions, enhancements or problems to the forum below. I'll keep an eye on them and help where possible. If you have an enhancement or optimization, post it so that everyone may benefit from it. I'll review submissions and add them to the project with due credit.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
