Introduction
This article presents a .NET class library for encoding and decoding files
and text with several algorithms. The library supports:
- Encoding/decoding text in Quoted Printable
- Encoding/decoding files and text in Base64
- Encoding/decoding files and text in UUEncode
- Encoding/decoding files in yEnc
Using The Code
Remember to add a reference in your project to TextCoDec.dll. Once you
add the reference, Visual Studio .NET should take care of the copying for
you.
Dim Yenc As New TextCodec.Yenc
Dim Parts() As String
Parts = Yenc.Encode("C:\WINDOWS\system32\calc.exe", 80, 2, _
TextCodec.Yenc.yencVersion.Version1_1)
Yenc.Decode(Parts, "c:\", 0)
'test it
Shell("c:\calc.exe")
The Algorithms
Base64
The Base64 encoding/decoding was the easiest one to implement: it is part of
the framework (System.Convert.ToBase64String() and
System.Convert.FromBase64String()). Take a look at the documentation.
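As a minimal sketch of those two framework calls (the module and function names are made up for illustration and are not part of TextCodec; UTF-8 is an assumed text encoding), a text round trip could look like this:

```vbnet
Imports System
Imports System.Text

Module Base64Sketch
    ' Encodes text as Base64, using the UTF-8 bytes of the string (assumption).
    Function EncodeText(ByVal plainText As String) As String
        Return Convert.ToBase64String(Encoding.UTF8.GetBytes(plainText))
    End Function

    ' Decodes Base64 back to text, again assuming UTF-8.
    Function DecodeText(ByVal encodedText As String) As String
        Return Encoding.UTF8.GetString(Convert.FromBase64String(encodedText))
    End Function

    Sub Main()
        Dim encoded As String = EncodeText("Hello")
        Console.WriteLine(encoded)               ' SGVsbG8=
        Console.WriteLine(DecodeText(encoded))   ' Hello
    End Sub
End Module
```

Files work the same way once their contents are available as a byte array.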
Quoted Printable
This algorithm is mainly used to encode non-English text. Character codes
outside the range 32 to 126 are replaced by an equal sign followed by their
two-digit hexadecimal value. The equal sign itself (character code 61) must
also be encoded, because it serves as the escape character.
For i = 0 To Chars.Length - 1
    Ascii = Asc(Chars(i))
    ' Escape control characters, "=" (61) and anything above 126
    If Ascii < 32 OrElse Ascii = 61 OrElse Ascii > 126 Then
        ' Two-digit uppercase hex value, padded with a leading zero
        EncodedChar = Hex(Ascii).PadLeft(2, "0"c)
        ReturnString.Append("=" & EncodedChar)
    Else
        ReturnString.Append(Chars(i))
    End If
Next
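Decoding reverses the escaping. The following is only a sketch, not the library's decoder: the names are hypothetical, the input is assumed to be well formed and free of soft line breaks, and each escaped value is mapped straight to the character with that code (an assumption that holds for Latin-1 text).

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module QuotedPrintableSketch
    ' Turns every "=XX" sequence back into the character with hex code XX.
    Function DecodeQuotedPrintable(ByVal encoded As String) As String
        Dim result As New StringBuilder()
        Dim i As Integer = 0
        While i < encoded.Length
            If encoded.Chars(i) = "="c AndAlso i + 2 < encoded.Length Then
                ' The two characters after "=" hold the hex value
                Dim code As Integer = Convert.ToInt32(encoded.Substring(i + 1, 2), 16)
                result.Append(ChrW(code))
                i += 3
            Else
                ' Everything else was left untouched by the encoder
                result.Append(encoded.Chars(i))
                i += 1
            End If
        End While
        Return result.ToString()
    End Function

    Sub Main()
        Console.WriteLine(DecodeQuotedPrintable("1+1=3D2"))   ' 1+1=2
    End Sub
End Module
```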
UUEncode
The best algorithm definition I found is the following, taken from here.
The uuencode algorithm hinges around a 3-byte-to-4-byte (8-bit to
6-bit data) encoding to convert all data to printable characters. To perform
this encoding read in 3 bytes from the file to be encoded whose binary
representation is
a7a6a5a4a3a2a1a0 b7b6b5b4b3b2b1b0 c7c6c5c4c3c2c1c0
and convert them into 4 bytes with values in the range 0-63 as follows:
0 0 a7a6a5a4a3a2   0 0 a1a0b7b6b5b4   0 0 b3b2b1b0c7c6   0 0 c5c4c3c2c1c0
then convert these bytes to printable characters by adding 0x20 (32).
Exception: if you end up with a zero byte it should be converted to 0x60
(back-quote '`') rather than 0x20 (space ' ').
In addition, the start of the encoding is marked by the line
"begin <mode> <filename>", where <mode> consists of 3 octal digits which are
the UNIX mode of the file, and <filename> is the original filename of the file
encoded. The end of the encoding is marked by the line "end". The first
character of each line contains the line length in bytes *in the original file*,
encoded in the same way as an ordinary byte i.e. line length 0->0x60, all
other lengths add 0x20 to convert to printable characters. Line lengths vary
from 0 to 45 (which encodes to 'M'; this is why lines in a uuencoded file all
start with an M), which is a line length of 61 characters (including the length
character) in the encoded file. This is a nice safe length to transmit via
email.
Lines in the encoded file are always a multiple of 4 + 1
characters long; this sometimes means that 1 or 2 bytes are thrown away at the
end of the decoding.
The main encoding is achieved in VB by the following code:
For i = 0 To Chars.Length - 1 Step 3
    ' Read up to 3 bytes; a short final group is padded with zeros
    DecodedBytes(0) = Asc(Chars(i))
    DecodedBytes(1) = 0
    DecodedBytes(2) = 0
    If i + 1 < Chars.Length Then DecodedBytes(1) = Asc(Chars(i + 1))
    If i + 2 < Chars.Length Then DecodedBytes(2) = Asc(Chars(i + 2))
    ' Split the 24 bits into four 6-bit values and add 32 to each
    EncodedBytes(0) = (DecodedBytes(0) \ 4) + 32
    EncodedBytes(1) = ((DecodedBytes(0) Mod 4) * 16) + (DecodedBytes(1) \ 16) + 32
    EncodedBytes(2) = ((DecodedBytes(1) Mod 16) * 4) + (DecodedBytes(2) \ 64) + 32
    EncodedBytes(3) = (DecodedBytes(2) Mod 64) + 32
    For j As Integer = 0 To 3
        ' A zero value (32 after the shift) is written as "`" (96), not a space
        If EncodedBytes(j) = 32 Then EncodedBytes(j) = 96
        ReturnString.Append(Chr(EncodedBytes(j)))
    Next
Next
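The loop above produces only the encoded characters. To show how the length character described in the quoted definition fits in, here is a small self-contained sketch that builds one complete line from up to 45 bytes. The module and function names are hypothetical and this is not the library's code.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module UUEncodeSketch
    ' Maps a value in the range 0-63 to its printable character; 0 becomes "`".
    Function ToPrintable(ByVal value As Integer) As Char
        If value = 0 Then Return "`"c
        Return ChrW(value + 32)
    End Function

    ' Encodes up to 45 bytes as one uuencoded line, length character included.
    Function EncodeLine(ByVal data() As Byte) As String
        If data.Length > 45 Then Throw New ArgumentException("At most 45 bytes per line.")
        Dim line As New StringBuilder()
        ' The first character carries the number of original bytes on the line
        line.Append(ToPrintable(data.Length))
        For i As Integer = 0 To data.Length - 1 Step 3
            ' Pad a short final group with zero bytes
            Dim b0 As Integer = data(i)
            Dim b1 As Integer = 0
            Dim b2 As Integer = 0
            If i + 1 < data.Length Then b1 = data(i + 1)
            If i + 2 < data.Length Then b2 = data(i + 2)
            line.Append(ToPrintable(b0 \ 4))
            line.Append(ToPrintable(((b0 Mod 4) * 16) + (b1 \ 16)))
            line.Append(ToPrintable(((b1 Mod 16) * 4) + (b2 \ 64)))
            line.Append(ToPrintable(b2 Mod 64))
        Next
        Return line.ToString()
    End Function

    Sub Main()
        Console.WriteLine(EncodeLine(Encoding.ASCII.GetBytes("Cat")))   ' #0V%T
    End Sub
End Module
```

An empty byte array yields a line holding only the back-quote, which is the zero-length line that precedes "end" in a uuencoded file.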
Yenc
In essence, the yenc algorithm can be implemented by the following
expressions:
EncodedCharacter = (Character + 42) Mod 256
EncodedSpecialCharacter = (EncodedCharacter + 64) Mod 256
There are, as always, some characters which make up the exceptions. Those are
null (0), line feed (LF), carriage return (CR) and the equal sign (=). The tab
character was also an exception, but it was removed in version 1.2. If the
encoded character is one of the aforementioned, re-encode it with the
EncodedSpecialCharacter expression and escape it with the equal sign.
The yenc algorithm is flexible, however. If, for some reason, a character
isn't suitable in the encoded stream, escape it as you would a special
character. This is especially useful for NNTP transmission. In that protocol,
a line consisting of a single dot marks the end of the article, and a dot at
the start of any other line has to be doubled in transit. The dot isn't a
special yenc character by default, so an encoded line can start with a dot,
which confuses some newsreaders. A good principle is to always escape a dot
if it is located at the beginning of a line.
There is another exception dealing with line length. The line length is
configurable, but the actual length of a line can vary, because you can't end
a line with the escape character. If the last character of a line turns out to
be a special character, you escape it normally and end up with two characters
(the escape character and the encoded one), giving a line length of length+1.
For more information on yenc go to www.yenc.org.
The main encoding is achieved in VB by the following code:
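For i = 0 To n - 1
    CharCode = (Bytes(i) + 42) Mod 256
    Select Case CharCode
        Case 0, 10, 13, 61
            ' NUL, LF, CR and "=" are always escaped
            OutputLine &= "=" & Chr((CharCode + 64) Mod 256)
        Case 9
            ' TAB is escaped only in version 1.1
            If Version = yencVersion.Version1_1 Then
                OutputLine &= "=" & Chr((CharCode + 64) Mod 256)
            Else
                OutputLine &= Chr(CharCode)
            End If
        Case Else
            OutputLine &= Chr(CharCode)
    End Select
    ' A line may exceed LineLength by one when it ends with an escaped pair
    If OutputLine.Length >= LineLength Then
        Output.Append(OutputLine & vbCrLf)
        OutputLine = ""
    End If
Next
' Write out the last, partial line
If OutputLine.Length > 0 Then Output.Append(OutputLine & vbCrLf)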
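Decoding inverts the two expressions given earlier. The sketch below is an illustration rather than the library's decoder: the names are hypothetical, and it works on raw bytes instead of strings (a design choice made here to avoid code page conversions). It expects a single line with the line terminator already removed.

```vbnet
Imports System
Imports System.Collections.Generic

Module YencSketch
    ' Decodes one line of yEnc data given as raw bytes.
    Function DecodeLine(ByVal encoded() As Byte) As Byte()
        Dim decoded As New List(Of Byte)()
        Dim i As Integer = 0
        While i < encoded.Length
            Dim value As Integer = encoded(i)
            If value = 61 AndAlso i + 1 < encoded.Length Then
                ' "=" escapes the next byte: undo the extra +64 first
                i += 1
                value = (CInt(encoded(i)) - 64 + 256) Mod 256
            End If
            ' Undo the +42 shift; adding 256 keeps the operand non-negative
            decoded.Add(CByte((value - 42 + 256) Mod 256))
            i += 1
        End While
        Return decoded.ToArray()
    End Function

    Sub Main()
        ' "k" is the byte 65 shifted by 42; "=@" is the escaped form of byte 214
        Dim result() As Byte = DecodeLine(New Byte() {107, 61, 64})
        Console.WriteLine(BitConverter.ToString(result))   ' 41-D6
    End Sub
End Module
```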
Streams
As I was rewriting the code from scratch, I was amazed at how streams made my
life easier. Not only that, but the code also got a speed boost that is almost
unbelievable (about 11400% actually).
So why easier, you may ask. Well, almost anything can be turned into a
stream. Take a look at the following examples:
Dim MyPath As String
Dim MyByteArray() As Byte
Dim MyString As String
' From a file
Dim FileSource As New FileStream(MyPath, FileMode.Open)
' From a byte array
Dim ArraySource As New MemoryStream(MyByteArray)
' From a string
Dim StringSource As New MemoryStream(System.Text.Encoding.Default.GetBytes(MyString))
As you can see, streams are very versatile. That took care of almost all
overloads!
A stream can also grow as needed, which sidestepped the problem of decoding
multipart yenc files. Because data can be written at any position in a seekable
stream, I didn't have to sort the parts to write them sequentially. I opened a
stream, positioned it at the offset of the part (parsed from the part header)
and just dumped the decoded data into it.
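The idea can be sketched with a MemoryStream standing in for the output file. WritePart is a hypothetical helper, not a TextCodec method, and the offsets used here are zero-based.

```vbnet
Imports System
Imports System.IO

Module StreamOffsetSketch
    ' Writes one decoded part at its byte offset within the target stream.
    ' The target must support seeking (FileStream and MemoryStream both do).
    Sub WritePart(ByVal target As Stream, ByVal offset As Long, ByVal data() As Byte)
        target.Seek(offset, SeekOrigin.Begin)
        target.Write(data, 0, data.Length)
    End Sub

    Sub Main()
        Using target As New MemoryStream()
            ' The parts arrive out of order; the offsets put them in place
            WritePart(target, 5, New Byte() {6, 7, 8, 9, 10})
            WritePart(target, 0, New Byte() {1, 2, 3, 4, 5})
            Console.WriteLine(BitConverter.ToString(target.ToArray()))
            ' 01-02-03-04-05-06-07-08-09-0A
        End Using
    End Sub
End Module
```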
Other Optimizations
One other object of the .NET Framework allowed the amazing speed increase:
the StringBuilder object. If you have to concatenate large strings, I strongly
recommend using this object.
In some measurements I made, string concatenation is 250 times faster with
this object.
It is ideal for this project, as an enormous part of the encoding/decoding
process involves string concatenation.
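A minimal sketch of the pattern follows (JoinLines is a made-up helper for illustration). Each Append adds to one growing buffer, whereas the & operator allocates a new string on every concatenation.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module StringBuilderSketch
    ' Joins many small pieces using a single growing buffer.
    Function JoinLines(ByVal lines() As String) As String
        Dim builder As New StringBuilder()
        For Each line As String In lines
            builder.Append(line)
            builder.Append(vbCrLf)
        Next
        ' The final string is created only once, here
        Return builder.ToString()
    End Function

    Sub Main()
        Console.Write(JoinLines(New String() {"first", "second"}))
    End Sub
End Module
```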
A Few Words Of Advice
If speed is more important than presentation, don't declare the
encoder/decoder with events. If you handle the progress event, the decoding will
be noticeably slower.
When encoding large files (larger than 10MB), don't encode them to memory.
Use the overloads that encode to a file. Also, don't rely too much on the
garbage collector. Always destroy variables; it's always a good principle.
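To illustrate why encoding to a file keeps memory use flat, here is a sketch of chunked Base64 encoding from any Stream to any TextWriter. It does not show the library's overloads; the names and the 57-byte chunk size are assumptions chosen for this example.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module Base64StreamSketch
    ' Encodes a stream to Base64 text chunk by chunk, so the whole input
    ' never has to sit in memory at once.
    Sub EncodeBase64(ByVal source As Stream, ByVal destination As TextWriter)
        ' 57 input bytes become exactly 76 Base64 characters, one line each
        Dim chunk(56) As Byte
        Do
            ' Fill the chunk completely unless the stream ends first,
            ' because Read may return fewer bytes than requested
            Dim filled As Integer = 0
            Do While filled < chunk.Length
                Dim count As Integer = source.Read(chunk, filled, chunk.Length - filled)
                If count = 0 Then Exit Do
                filled += count
            Loop
            If filled = 0 Then Exit Do
            destination.WriteLine(Convert.ToBase64String(chunk, 0, filled))
            ' A short chunk means the end of the stream was reached
            If filled < chunk.Length Then Exit Do
        Loop
    End Sub

    Sub Main()
        Using source As New MemoryStream(Encoding.ASCII.GetBytes("Hello"))
            Using destination As New StringWriter()
                EncodeBase64(source, destination)
                Console.Write(destination.ToString())   ' SGVsbG8= followed by a line break
            End Using
        End Using
    End Sub
End Module
```

With a FileStream as the source and a StreamWriter as the destination, the same routine encodes file to file. The Using blocks close both streams deterministically instead of leaving that to the garbage collector.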
If you're trying to time the encoding/decoding process, be sure to disable
any anti-virus. The reason for that is that the written file streams won't close
until the AV has finished checking the file for viruses. For small files this
may not affect the results in a noticeable way, but for large files it can
really make a mess of things...
Finally, if you're encoding really large files and you're fortunate enough to
have two hard disks, make certain that you read from the slower one and write to
the faster one. HDD writing is always slower than reading, and an HDD can't read
and write at the same time (so if you read and write to the same HDD, it'll
position its head, read a chunk of data, reposition the head, write another
chunk, and so on).
Credits
Documentation was compiled to XML by VB.Doc, a free documentation system for the VB.NET programming
language. The help file was generated by NDoc.
History
- 16/03/2004 - Initial release.
- 04/06/2004 - Total rewrite of the code. In some cases there is an increase
in speed by 11400%!
Feedback and Improvements
Feel free to post questions, enhancements or problems to the forum below.
I'll keep an eye on them and help where possible. If you have an enhancement or
optimization, post it so that everyone may benefit from it. I'll review it
and add it to the project with due credit.