Introduction
This is an implementation of the yEnc algorithm, as described at http://www.yenc.org/ . yEnc is not an official standard, but it is nonetheless a very popular encoding method on binary newsgroups. As algorithms go, yEnc is very simple. It uses 8-bit characters to encode binary data. Since binary data is usually stored as 8-bit bytes, it does not have to accomplish too much :)
yEnc's popularity is due to the fact that it uses almost the full 8-bit range of each character to carry data, whereas older methods such as UUencode and Base64 restrict themselves to 7-bit-safe characters. According to the website, this makes messages encoded with yEnc 33-40% smaller. Smaller messages are quicker to upload and download, which matters when dealing with large binary files. yEnc also offers an optional CRC32 check.
My implementation is interesting from two points of view:
- It is the only open-source one written in C#
- It is implemented as a cryptographic transform - more on that later
Some Info on yEnc
There are some peculiarities of newsgroup messages:
- messages must be broken into lines, max around 1000 characters
- some characters have meaning, and as such need to be escaped out
The current yEnc algorithm escapes CR, LF, the NULL character and the escape character "=" itself by default. However, individual encoders are free to escape other characters as they wish. By convention, lines are broken at 128 or 256 characters, although other line lengths are supported.
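The escaping and line-breaking rules above can be sketched on plain byte arrays. This is a minimal illustration based on the public yEnc specification (add 42 to every byte, and if the result is a critical character, emit "=" followed by the value plus 64); it is not the code from the download, and the class name YEncSketch and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not the article's YEncEncoder: a byte-array sketch
// of the yEnc transformation as described in the public specification.
public static class YEncSketch
{
    const byte Escape = (byte)'=';

    // Values that must be escaped after the +42 offset: NUL, LF, CR and '='.
    static bool IsCritical(byte b)
    {
        return b == 0x00 || b == 0x0A || b == 0x0D || b == Escape;
    }

    public static byte[] Encode(byte[] data, int lineLength = 128)
    {
        var output = new List<byte>(data.Length + data.Length / 32 + 2);
        int column = 0;
        foreach (byte raw in data)
        {
            byte encoded = unchecked((byte)(raw + 42));
            if (IsCritical(encoded))
            {
                // Emit the escape character, then the value shifted by 64.
                output.Add(Escape);
                encoded = unchecked((byte)(encoded + 64));
                column++;
            }
            output.Add(encoded);
            column++;

            // Break only after a complete character, so an escape pair
            // is never split across two lines.
            if (column >= lineLength)
            {
                output.Add(0x0D);
                output.Add(0x0A);
                column = 0;
            }
        }
        return output.ToArray();
    }

    public static byte[] Decode(byte[] encoded)
    {
        var output = new List<byte>(encoded.Length);
        bool escaped = false;
        foreach (byte b in encoded)
        {
            if (b == 0x0D || b == 0x0A) continue;   // line breaks carry no data
            if (!escaped && b == Escape) { escaped = true; continue; }

            byte value = b;
            if (escaped)
            {
                value = unchecked((byte)(value - 64));
                escaped = false;
            }
            output.Add(unchecked((byte)(value - 42)));
        }
        return output.ToArray();
    }
}
```

For example, the byte 214 becomes 0 after the offset, so it is written as the two characters "=@", while the letter "A" (65) becomes the single character "k" (107).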
yEnc data begins with a =ybegin tag at the start of a line. The tag has additional attributes that specify the number of bytes to expect, as well as the name of the file and the length of the lines. Multipart messages are supported. The data ends with a line starting with =yend. The reason for the "=y" is that, due to the nature of the algorithm, it could never occur naturally as part of the data.
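To make the tag layout concrete, here is a small sketch that formats the header and trailer lines for a single-part message. The keyword names (line, size, name, crc32) follow the public yEnc specification; the class YEncHeaderSketch and its methods are hypothetical, and the download itself does not build or parse these lines.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only: formats single-part
// =ybegin / =yend lines using the keywords from the yEnc specification.
public static class YEncHeaderSketch
{
    public static string BuildBegin(string fileName, long size, int lineLength)
    {
        // The name comes last because a file name may contain spaces.
        return string.Format(CultureInfo.InvariantCulture,
            "=ybegin line={0} size={1} name={2}", lineLength, size, fileName);
    }

    public static string BuildEnd(long size, uint? crc32 = null)
    {
        string line = string.Format(CultureInfo.InvariantCulture, "=yend size={0}", size);
        if (crc32.HasValue)
        {
            // The checksum is optional and written as eight hex digits.
            line += " crc32=" + crc32.Value.ToString("x8", CultureInfo.InvariantCulture);
        }
        return line;
    }
}
```

With these helpers, an 11-byte file called test.bin encoded at 128 characters per line would start with "=ybegin line=128 size=11 name=test.bin" and finish with "=yend size=11".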
Using the code
My implementation of the algorithm deals purely with encoding and decoding the data, not parsing of messages, or even parsing of the yEnc headers and footers. To me, that is a separate challenge, which I'll leave to someone else.
Initially, I started coding the encoder as an implementation of System.Text.Encoder. However, I soon realized that, although I could read the data as text, I was really dealing with bytes. Probably, that should have been obvious to me from the beginning, but sometimes it takes a while :( Eventually, I decided it would work best as an implementation of ICryptoTransform. This is not to imply that it is a cryptographic algorithm, just that it transforms data in similar ways - the size of the input data does not necessarily match the size of the output data. Microsoft chose to implement the Base64 transformation objects in a similar way.
The benefit is that you can use the objects together with a CryptoStream object, which is a fairly easy interface to use, and automatically adds support for streams. I'll stress again though, that this is not an encryption technique - I am just making use of existing Framework objects and interfaces to add power to my objects.
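To show what implementing ICryptoTransform involves, here is a stripped-down sketch of a transform whose output can be larger than its input. It applies only the yEnc offset and escaping and skips line breaking to stay short. The class name YEncEscapeTransformSketch is hypothetical, and this is not how the downloadable YEncEncoder is necessarily written.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical, simplified transform for illustration: yEnc offset and
// escaping only, no line breaks. Each input byte yields one or two bytes.
public sealed class YEncEscapeTransformSketch : ICryptoTransform
{
    // One byte in, at most two bytes out (escape character plus value).
    public int InputBlockSize { get { return 1; } }
    public int OutputBlockSize { get { return 2; } }
    public bool CanTransformMultipleBlocks { get { return true; } }
    public bool CanReuseTransform { get { return true; } } // no state is kept

    public int TransformBlock(byte[] inputBuffer, int inputOffset, int inputCount,
                              byte[] outputBuffer, int outputOffset)
    {
        int written = 0;
        for (int i = 0; i < inputCount; i++)
        {
            byte b = unchecked((byte)(inputBuffer[inputOffset + i] + 42));
            if (b == 0x00 || b == 0x0A || b == 0x0D || b == (byte)'=')
            {
                outputBuffer[outputOffset + written++] = (byte)'=';
                b = unchecked((byte)(b + 64));
            }
            outputBuffer[outputOffset + written++] = b;
        }
        // The return value tells CryptoStream how many bytes were produced.
        return written;
    }

    public byte[] TransformFinalBlock(byte[] inputBuffer, int inputOffset, int inputCount)
    {
        // Worst case: every remaining byte needs an escape character.
        byte[] scratch = new byte[inputCount * 2];
        int written = TransformBlock(inputBuffer, inputOffset, inputCount, scratch, 0);
        byte[] result = new byte[written];
        Buffer.BlockCopy(scratch, 0, result, 0, written);
        return result;
    }

    public void Dispose() { }
}
```

The block sizes are what let CryptoStream allocate a large enough output buffer, while the count returned from TransformBlock covers the case where the output is shorter than that worst case.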
To encode some yEnc data, your code might look like this:
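// Requires: using System.IO; using System.Security.Cryptography;
MemoryStream ms = new MemoryStream();
YEncEncoder encoder = new YEncEncoder();
CryptoStream cs = new CryptoStream(ms, encoder, CryptoStreamMode.Write);
StreamWriter w = new StreamWriter(cs);
w.Write("Test string");
// Push the text through the writer, then let the transform emit its final block.
w.Flush();
cs.FlushFinalBlock();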
To decode it again, the code might continue:
ms.Position = 0;
YEncDecoder decoder = new YEncDecoder();
CryptoStream cs2 = new CryptoStream(ms, decoder, CryptoStreamMode.Read);
StreamReader r = new StreamReader(cs2);
string finalText = r.ReadToEnd();
This is pretty standard code that you might write if you were encrypting your data, the only difference being that we are using the YEncEncoder and YEncDecoder instead of a system-supplied encryption algorithm.
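For comparison, the same stream pattern works with the Framework's own Base64 transforms, ToBase64Transform and FromBase64Transform. The sketch below needs nothing from the download; the wrapper class Base64StreamSketch is hypothetical and only there to keep the example self-contained.

```csharp
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical wrapper for illustration: the CryptoStream pattern with
// the built-in Base64 transforms instead of the yEnc ones.
public static class Base64StreamSketch
{
    public static byte[] Encode(string text)
    {
        using (var ms = new MemoryStream())
        {
            using (var cs = new CryptoStream(ms, new ToBase64Transform(), CryptoStreamMode.Write))
            using (var w = new StreamWriter(cs, new UTF8Encoding(false)))
            {
                w.Write(text);
            } // disposing the writer and CryptoStream flushes the final block

            // ToArray still works after the MemoryStream has been closed.
            return ms.ToArray();
        }
    }

    public static string Decode(byte[] base64Bytes)
    {
        using (var ms = new MemoryStream(base64Bytes))
        using (var cs = new CryptoStream(ms, new FromBase64Transform(), CryptoStreamMode.Read))
        using (var r = new StreamReader(cs, Encoding.UTF8))
        {
            return r.ReadToEnd();
        }
    }
}
```

Swapping the two transform objects for a yEnc encoder and decoder is the only change needed to reuse this structure.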
Points of Interest
I have made use of Phil Bolduc's implementation of the CRC32 algorithm, found at http://www.codeproject.com/csharp/crc32_dotnet.asp . Unfortunately, it contained some bugs that consumed a significant amount of my time, and I had to make some modifications to make it work 100%. Other than that, the code is my own original work, not based on any other implementation. You are free to use it for whatever purpose you may desire, as long as you attribute it to me in the code comments.
The downloadable code includes a lot of NUnit tests, which test things to a point where I am comfortable that everything works. They should make the code easy to expand on for anyone who wants to add functionality.
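For readers who want to see what such a checksum involves, here is a compact table-driven CRC-32 sketch. It assumes the common reflected CRC-32 (polynomial 0xEDB88320, as used by ZIP), which is what yEnc's optional check is generally understood to use. It is not Phil Bolduc's code or the modified version in the download, and the class name Crc32Sketch is hypothetical.

```csharp
using System;

// Hypothetical illustration of the common reflected CRC-32 (ZIP variant).
public static class Crc32Sketch
{
    static readonly uint[] Table = BuildTable();

    static uint[] BuildTable()
    {
        const uint polynomial = 0xEDB88320u;
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint value = i;
            // Process the eight bits of the index, least significant first.
            for (int bit = 0; bit < 8; bit++)
            {
                value = (value & 1) != 0 ? (value >> 1) ^ polynomial : value >> 1;
            }
            table[i] = value;
        }
        return table;
    }

    public static uint Compute(byte[] data)
    {
        uint crc = 0xFFFFFFFFu;                 // initial value
        foreach (byte b in data)
        {
            crc = Table[(crc ^ b) & 0xFF] ^ (crc >> 8);
        }
        return crc ^ 0xFFFFFFFFu;               // final inversion
    }
}
```

A quick sanity check for any implementation of this variant is the ASCII string "123456789", whose checksum is 0xCBF43926.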