
A Simple JPEG Encoder in C#

Arpan Jati


1 Jun 2010

Baseline JPEG encoding implemented using C#.

Introduction

JPEG is one of the most widely used standards for encoding photographs, pictures, and other visual content, yet its internal workings are mostly overlooked. We are used to calling image.Save("filename", Imaging.ImageFormat.Jpeg);, but what goes on inside Image.Save() remains a mystery to most programmers.

Most of the available JPEG encoder/decoder implementations are written in C/C++. However, for a project, my friend 'anirban' needed a JPEG encoder in C#, so we started writing our own. It turned out to be complex and difficult, and the version we wrote from scratch was very slow: some functions (mostly the DCT) had fourth-order complexity. Eventually we found some C code on a forum. I converted parts of it to C#; it was tedious work, but the result runs quite fast thanks to the fast DCT (AA&N).

The encoder is not heavily optimized for speed, but it works correctly.

This article does not explain in detail how everything works; instead, it gives an overview of how JPEG works and of the remarkable work done by the people behind the JPEG standard. To understand the standard properly, visit the official JPEG site or the IJG site.

Screenshots

The program displaying an image with its three (RGB) channels separated.

Clicking one of the channel picture boxes shows that channel's image in the main PictureBox, and clicking the main PictureBox shows the original loaded image again.

Write-original saves the original image using the JPEG encoder, whereas the 'Write Current' button converts the picture currently displayed in the main PictureBox to a pixel array and then saves it. The latter option can be used to save individual channel images.

The Y channel of the same image.

Background

I am not an expert in JPEG, but from what I have learnt, the basic steps are:

1. The affine transformation in colour space: [R G B] -> [Y Cb Cr]

(It is defined in the CCIR Recommendation 601.)

The YCC colour space used follows that used by TIFF and JPEG (Rec 601-1):

Y  =  0.2989 R + 0.5866 G + 0.1145 B
Cb = -0.1687 R - 0.3312 G + 0.5000 B
Cr =  0.5000 R - 0.4183 G - 0.0816 B

RGB values are normally on a scale of 0 to 1 or, since they are stored as unsigned bytes, 0 to 255. The resulting luminance value is also on a scale of 0 to 255, but the chrominance values range from -127.5 to +127.5, so an offset of 127.5 (JFIF uses 128) is added to them so they can be stored in an unsigned byte.

Y, or luminance, is the intensity of an RGB colour as perceived by the eye. The formula for Y acts like a weighted filter with a different weight for each spectral component: the eye is most sensitive to green, then to red, and least to blue.

The values Cb and Cr are called the chrominance values and represent two coordinates in a system that measures the hue and saturation of the colour (approximately, they indicate how much blue and how much red is in that colour).
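The per-pixel conversion can be sketched in C# as below. This is a minimal illustration using the coefficients listed above, not the encoder's own routine; the class and method names are hypothetical, and it assumes the JFIF convention of adding 128 to Cb and Cr, with rounding and clamping to 0..255.

```csharp
using System;

static class ColorConversion
{
    // Round to the nearest integer and clamp to the 0..255 byte range.
    static byte ToByte(double v) => (byte)Math.Max(0, Math.Min(255, Math.Round(v)));

    // Converts one RGB pixel to Y, Cb, Cr with the Rec 601-1 coefficients above.
    // Assumption: a JFIF-style offset of 128 is added to both chrominance values.
    public static (byte Y, byte Cb, byte Cr) RgbToYCbCr(byte r, byte g, byte b)
    {
        double y  =  0.2989 * r + 0.5866 * g + 0.1145 * b;
        double cb = -0.1687 * r - 0.3312 * g + 0.5000 * b + 128.0;
        double cr =  0.5000 * r - 0.4183 * g - 0.0816 * b + 128.0;
        return (ToByte(y), ToByte(cb), ToByte(cr));
    }
}
```

Grey pixels (R = G = B) end up with Cb and Cr at the neutral value 128, which is why the chrominance planes of a black-and-white photo are almost flat.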

2. Sampling

The JPEG standard takes into account the fact that the eye is more sensitive to the luminance of a colour than to its hue: the human visual system resolves fine detail in brightness better than fine detail in colour.

So, in most JPEG files, luminance is sampled at every pixel, while chrominance is stored as the average value of each 2x2 block of pixels. Averaging over 2x2 blocks is not required (chrominance could also be sampled at every pixel), but it gives good compression with almost no visible loss in the resampled image.
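Averaging each 2x2 block of a chroma plane (often called 4:2:0 subsampling) can be sketched in C# as follows. The class name is hypothetical; the sketch assumes a plane indexed [x, y] like the article's image buffer, and even width and height (pad the plane first otherwise).

```csharp
using System;

static class ChromaSubsampling
{
    // Replaces every 2x2 block of a chroma plane with its rounded average.
    // Assumes plane is indexed [x, y] and both dimensions are even.
    public static byte[,] Subsample2x2(byte[,] plane)
    {
        int w = plane.GetLength(0), h = plane.GetLength(1);
        var result = new byte[w / 2, h / 2];
        for (int x = 0; x < w / 2; x++)
            for (int y = 0; y < h / 2; y++)
            {
                int sum = plane[2 * x, 2 * y] + plane[2 * x + 1, 2 * y]
                        + plane[2 * x, 2 * y + 1] + plane[2 * x + 1, 2 * y + 1];
                result[x, y] = (byte)((sum + 2) / 4); // +2 rounds to nearest
            }
        return result;
    }
}
```

Each of the Cb and Cr planes shrinks to a quarter of its size, so before any further compression the image needs 1.5 bytes per pixel instead of 3.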

3. Level shift

All 8-bit unsigned values (Y, Cb, Cr) in the image are "level shifted": they are converted to an 8-bit signed representation, by subtracting 128 from their value.

4. Block splitting

Each channel must be split into 8x8 blocks of pixels. If a channel's width or height is not a multiple of 8, the encoder must fill the remaining area of the incomplete blocks with some form of dummy data. Filling the edge pixels with a fixed color (typically black) creates ringing artifacts along the visible part of the border; repeating the edge pixels is a common technique that reduces these artifacts, although it may not remove them entirely. We currently fill the dummy pixels with zeros (black).
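The level shift and the block splitting can be combined when a block is read out of a channel. The C# sketch below uses hypothetical names and is not the article's code; it assumes a channel indexed [x, y], pads with 0 (black) as described above, so padded samples become -128 after the shift, and counts blocks by rounding up.

```csharp
using System;

static class BlockSplitter
{
    // Number of 8x8 blocks needed to cover 'size' pixels (rounded up).
    public static int BlockCount(int size) => (size + 7) / 8;

    // Copies block (bx, by) out of a channel indexed [x, y] and subtracts 128
    // (level shift). Samples outside the channel are treated as 0 (black).
    public static int[,] ExtractBlock(byte[,] channel, int bx, int by)
    {
        int w = channel.GetLength(0), h = channel.GetLength(1);
        var block = new int[8, 8];
        for (int x = 0; x < 8; x++)
            for (int y = 0; y < 8; y++)
            {
                int px = bx * 8 + x, py = by * 8 + y;
                int value = (px < w && py < h) ? channel[px, py] : 0;
                block[x, y] = value - 128;
            }
        return block;
    }
}
```

For a 10x10 channel, BlockCount(10) is 2 in each direction, and block (1, 1) contains only 2x2 real samples; the other 60 entries are padding.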

5. The 8x8 Discrete Cosine Transform (DCT)

DCT is then applied to the 8x8 blocks.

The mathematical definition of Forward DCT (FDCT) is:

$$F(u,v) = \frac{c(u,v)}{4} \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)\,\cos\left(\frac{(2x+1)u\pi}{16}\right)\cos\left(\frac{(2y+1)v\pi}{16}\right), \qquad u,v = 0,1,\dots,7$$

$$c(u,v) = \begin{cases} 1/2 & u = v = 0 \\ 1/\sqrt{2} & u = 0,\ v \neq 0 \ \text{or}\ u \neq 0,\ v = 0 \\ 1 & \text{otherwise} \end{cases}$$

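Evaluated directly, the FDCT formula above is very expensive, because each of the 64 coefficients is a sum over all 64 pixels of the block. We therefore use a faster factorized form of the FDCT, the AA&N algorithm mentioned in the introduction.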
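For reference, the formula can be evaluated directly in C#. This is a slow, straightforward sketch that is useful for checking results. It is not the AA&N fast DCT that the encoder actually uses. The class name NaiveDct is hypothetical, and the block is assumed to be indexed [x, y] with values already level-shifted.

```csharp
using System;

static class NaiveDct
{
    // Direct evaluation of the 2-D FDCT for one 8x8 block, indexed [x, y].
    // Every output sums over all 64 inputs (O(N^4) for an N x N block),
    // which is why fast factorizations such as AA&N are used in practice.
    public static double[,] Forward(double[,] f)
    {
        var F = new double[8, 8];
        for (int u = 0; u < 8; u++)
            for (int v = 0; v < 8; v++)
            {
                double sum = 0.0;
                for (int x = 0; x < 8; x++)
                    for (int y = 0; y < 8; y++)
                        sum += f[x, y]
                             * Math.Cos((2 * x + 1) * u * Math.PI / 16.0)
                             * Math.Cos((2 * y + 1) * v * Math.PI / 16.0);
                // cu * cv reproduces c(u,v): 1/2, 1/sqrt(2) or 1.
                double cu = u == 0 ? 1.0 / Math.Sqrt(2.0) : 1.0;
                double cv = v == 0 ? 1.0 / Math.Sqrt(2.0) : 1.0;
                F[u, v] = cu * cv / 4.0 * sum;
            }
        return F;
    }
}
```

A flat block (every sample equal to 10) has all of its energy in the DC term: F(0,0) = (1/2)/4 * 64 * 10 = 80, and every other coefficient is zero apart from floating-point noise.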

6. Quantization

Quantization is the step in which most of the compression takes place. Each of the 64 DCT coefficients in the 8x8 block is divided by the corresponding value in the quantization table and rounded to the nearest integer. Because the table values are larger for higher frequencies, most high-frequency coefficients become zero, meaning that much of the high-frequency detail is discarded. This is acceptable because the eye is more sensitive to low-frequency detail than to high-frequency detail.

Luminance Quantization Table (JPEG standard)

16,  11,  10,  16,  24,  40,  51,  61,
12,  12,  14,  19,  26,  58,  60,  55,
14,  13,  16,  24,  40,  57,  69,  56,
14,  17,  22,  29,  51,  87,  80,  62,
18,  22,  37,  56,  68, 109, 103,  77,
24,  35,  55,  64,  81, 104, 113,  92,
49,  64,  78,  87, 103, 121, 120, 101,
72,  92,  95,  98, 112, 100, 103,  99

Similar quantization tables exist for chrominance as well.

An encoder can use a different quantization table, but the table it uses must be stored in the JPEG file (in the DQT segment) so that the decoder can reverse the quantization correctly.

The quantization process produces many zeros in the resulting vector, so the run-length coding (RLC) in the entropy-coding step greatly reduces the file size.
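The division step itself is short. Here is a minimal C# sketch. It assumes the 64 DCT coefficients and the table are both in natural row-major order, and it rounds halves away from zero. The Quantizer class is hypothetical and is not the encoder's own routine.

```csharp
using System;

static class Quantizer
{
    // Divides each DCT coefficient by its quantization-table entry and rounds
    // to the nearest integer (halves away from zero). Both arrays are assumed
    // to be in natural row-major order.
    public static int[] Quantize(double[] dct, byte[] qtable)
    {
        var result = new int[64];
        for (int i = 0; i < 64; i++)
            result[i] = (int)Math.Round(dct[i] / qtable[i], MidpointRounding.AwayFromZero);
        return result;
    }
}
```

With a divisor of 10, coefficients of 80, 16, 4 and -25 become 8, 2, 0 and -3. Any coefficient smaller than half its divisor turns into a zero, and these zeros are what the run-length coding later exploits.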

For testing, the quantization tables can be defined manually in the demo program as a comma-separated list of 64 values, so any quantization table can be tried just by typing in its values.
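Parsing such a list could look like the hypothetical C# helper below. The article does not show how the demo itself handles this input. The helper allows spaces or line breaks around the values. It rejects a list that does not contain exactly 64 values, and it rejects any entry of 0, which would cause a division by zero during quantization.

```csharp
using System;
using System.Linq;

static class QuantTableParser
{
    // Parses a comma-separated list of 64 values (1..255) into a table.
    // byte.Parse throws for non-numeric text or values above 255.
    public static byte[] Parse(string text)
    {
        byte[] values = text
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => byte.Parse(s.Trim()))
            .ToArray();
        if (values.Length != 64)
            throw new FormatException($"Expected 64 values, got {values.Length}.");
        if (values.Any(v => v == 0))
            throw new FormatException("Quantization values must be at least 1.");
        return values;
    }
}
```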

Effect of quantization

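The quantization table is scaled by the quality factor and stored in zig-zag order using the following method:

// Scales a quantization table by a percentage factor and stores the result
// in zig-zag order. A quant_scale of 100 leaves the values unchanged.
static Byte[] Scale_And_ZigZag_Quantization_Table(Byte[] intable, float quant_scale)
{
    Byte[] outTable = new Byte[64];
    for (int i = 0; i < 64; i++)
    {
        // Scale the entry and round to the nearest integer.
        long temp = (long)(intable[i] * quant_scale + 50L) / 100L;
        // Clamp to 1..255: a divisor of 0 is not allowed, and
        // baseline tables hold 8-bit values.
        if (temp <= 0L)
            temp = 1L;
        if (temp > 255L)
            temp = 255L;
        // Tables.ZigZag[i] is the zig-zag position of natural-order index i.
        outTable[Tables.ZigZag[i]] = (Byte)temp;
    }
    return outTable;
}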
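To see what the factor does to a single entry, here is a hypothetical stand-alone C# helper. It repeats the arithmetic of the method above for one value, without the zig-zag reordering.

```csharp
using System;

static class QuantScaling
{
    // Hypothetical single-entry version of the scaling arithmetic above.
    // quantScale is a percentage: 100 keeps the entry, 50 halves it,
    // 900 multiplies it by 9.
    public static byte ScaleEntry(byte entry, float quantScale)
    {
        long temp = (long)(entry * quantScale + 50L) / 100L; // scale and round
        if (temp <= 0L) temp = 1L;    // a divisor of 0 is not allowed
        if (temp > 255L) temp = 255L; // baseline tables hold 8-bit values
        return (byte)temp;
    }
}
```

For example, the top-left entry 16 of the standard luminance table becomes 8 with a factor of 50 and 144 with a factor of 900, and the entry 99 is clamped to 255 at a factor of 900. This is why a lower factor gives a better image and a larger file.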

Using QT-1 (Quantization Table 1)

Luminance Quantization Table 1 [JPEG standard]

Using QT=1, Factor = 50 [28.9 KB]

Using QT=1, Factor = 900 [9.46 KB]

Using QT-2 (Quantization Table 2)

Luminance Quantization Table 2 [trial table for lower quality]

Using QT=2, Factor = 50 [19.3 KB]

Using QT=2, Factor = 900 [8.26 KB]

These results show that the file size is controlled mainly by the quantization step. Apart from chroma subsampling and rounding, quantization is the only lossy part of the process. The entropy coding done later is lossless.

7. Entropy coding

a. Zigzag reordering

The 8x8 block is then traversed in zig-zag manner like this:

|   0, 1, 5, 6,14,15,27,28, |
|   2, 4, 7,13,16,26,29,42, |
|   3, 8,12,17,25,30,41,43, |
|   9,11,18,24,31,40,44,53, |
|  10,19,23,32,39,45,52,54, |
|  20,22,33,38,46,51,55,60, |
|  21,34,37,47,50,56,59,61, |
|  35,36,48,49,57,58,62,63  |

(The numbers in the 8x8 block indicate the order in which we traverse the bi-dimensional 8x8 matrix)

As you can see, the traversal starts at the upper-left corner (0,0), then visits (0,1), then (1,0), then (2,0), (1,1), (0,2), (0,3), (1,2), (2,1), (3,0) and so on, where each pair is (row, column).

After traversing the 8x8 matrix in zig-zag order, we have a vector of 64 coefficients (0..63). The zig-zag order visits the DCT coefficients in order of increasing spatial frequency. As a result, the high-frequency coefficients, which are mostly zero after quantization, end up grouped together at the end of the vector.
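The traversal order can also be generated instead of typed in. The hypothetical C# sketch below walks the 15 anti-diagonals, along each of which row + column is constant, and alternates the direction on each diagonal. Entry k of the result is the natural index (row * 8 + column) of the k-th coefficient visited, so the result is the inverse of the matrix shown above.

```csharp
using System;

static class ZigZagOrder
{
    // order[k] = natural index (row * 8 + col) of the k-th coefficient
    // visited in zig-zag order.
    public static int[] Build()
    {
        var order = new int[64];
        int k = 0;
        for (int s = 0; s < 15; s++) // s = row + col identifies an anti-diagonal
        {
            if (s % 2 == 0)
            {
                // Even diagonals run from bottom-left up to top-right.
                for (int row = Math.Min(s, 7); row >= Math.Max(0, s - 7); row--)
                    order[k++] = row * 8 + (s - row);
            }
            else
            {
                // Odd diagonals run from top-right down to bottom-left.
                for (int row = Math.Max(0, s - 7); row <= Math.Min(s, 7); row++)
                    order[k++] = row * 8 + (s - row);
            }
        }
        return order;
    }
}
```

Inverting the result (zigzagIndex[order[k]] = k) reproduces the matrix above row by row. For example, the first row is 0, 1, 5, 6, 14, 15, 27, 28.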

b. Run length coding

Run-length encoding (RLE) is a very simple form of compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. It is most useful on data that contains many such runs; in JPEG, these are runs of zeros.

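Example data: 20, 17, 0, 0, 0, 0, 11, 0, -10, -5, 0, 0, 1, 0, 0, 0, 0, 0, 0, followed only by zeros

RLC for JPEG compression: (0,20) ; (0,17) ; (4,11) ; (1,-10) ; (0,-5) ; (2,1) ; EOB

Format: (number of zeros preceding the value, value). EOB (end of block) indicates that all remaining values are zero.

(Strictly speaking, JPEG codes the first (DC) coefficient separately, as the difference from the previous block's DC value; run-length coding applies to the remaining 63 AC coefficients.)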

In the Huffman coding step, the number of preceding zeros is stored in 4 bits, so a run can be at most 15 zeros long.

Longer runs are therefore split up with the special symbol (15,0), which stands for 16 consecutive zeros (15 zeros followed by a zero value).
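The run-length step can be sketched in C#, including the 16-zero rule and the end-of-block marker. This hypothetical helper assumes a 64-entry block that is already in zig-zag order and, as in real JPEG, skips index 0 (the DC coefficient). For the example data above it therefore yields (0,17) ; (4,11) ; (1,-10) ; (0,-5) ; (2,1) ; EOB. Here (15,0) marks 16 zeros and (0,0) marks EOB.

```csharp
using System;
using System.Collections.Generic;

static class AcRunLength
{
    // Run-length codes the AC coefficients (indices 1..63) of a zig-zag
    // ordered block as (zeros before value, value) pairs.
    // (15, 0) stands for 16 zeros; (0, 0) is end-of-block (EOB).
    public static List<(int Run, int Value)> Encode(int[] zz)
    {
        var pairs = new List<(int Run, int Value)>();

        // Find the last non-zero AC coefficient; the trailing zeros become EOB.
        int last = 63;
        while (last >= 1 && zz[last] == 0) last--;

        int run = 0;
        for (int i = 1; i <= last; i++)
        {
            if (zz[i] == 0) { run++; continue; }
            while (run > 15) { pairs.Add((15, 0)); run -= 16; } // split long runs
            pairs.Add((run, zz[i]));
            run = 0;
        }
        if (last < 63) pairs.Add((0, 0)); // no EOB if coefficient 63 is non-zero
        return pairs;
    }
}
```

A block whose only non-zero AC coefficient sits at index 20 gives (15,0) ; (3,3) ; EOB. The 19 zeros before it are split into one run of 16 and a remaining run of 3.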

c. Huffman coding

This is a complex procedure; I will explain it later.

Using the code

1. An instance of the encoder can be created by:

BaseJPEGEncoder encoder = new BaseJPEGEncoder();

2. Then call either of the following two functions:

1. EncodeImageBufferToJpg()

public void EncodeImageBufferToJpg(Byte[, ,] ImageBuffer, 
            Point originalDimension, Point actualDimension, 
            BinaryWriter OutputStream, float Quantizer_Quality, 
            byte[] luminance_table, byte[] chromiance_table, 
            Utils.IProgress progress, 
            Utils.ICurrentOperation currentOperation)

2. EncodeImageToJpg()

public void EncodeImageToJpg(Image ImageToBeEncoded, 
       BinaryWriter OutputStream, 
       float Quantizer_Quality, byte[] luminance_table, 
       byte[] chromiance_table, Utils.IProgress progress, 
       Utils.ICurrentOperation currentOperation)

The image can be in the form of a buffer (pixel array) which can be obtained from a bitmap by calling Fill_Image_Buffer().

byte[,,] Fill_Image_Buffer(Bitmap bmp, IProgress progress, ICurrentOperation operation);

The array is defined as byte[Width, Height, 3]. The third index selects the color: Red = 0, Blue = 1, Green = 2.

Full example:

using System.Drawing;
using System.IO;

Utils.ProgressUpdater progressObj = new Utils.ProgressUpdater();
Utils.CurrentOperationUpdater currentOperationObj = new Utils.CurrentOperationUpdater();

// The using blocks make sure the bitmap and the output file are closed.
using (Bitmap bmp = new Bitmap("C:\\source.bmp"))
using (FileStream fs = new FileStream("C:\\dest.jpg", FileMode.Create,
    FileAccess.Write, FileShare.None))
using (BinaryWriter bw = new BinaryWriter(fs))
{
    // Copy the pixels into a [Width, Height, 3] byte array.
    byte[,,] image_array = Utils.Fill_Image_Buffer(bmp, progressObj, currentOperationObj);

    Point originalDimension = new Point(bmp.Width, bmp.Height);
    Point actualDimension = Utils.GetActualDimension(originalDimension);

    JpegEncoder.BaseJPEGEncoder encoder = new JpegEncoder.BaseJPEGEncoder();

    // 50f is the quantization scale factor: a lower value gives a better image.
    encoder.EncodeImageBufferToJpg(image_array, originalDimension, actualDimension,
        bw, 50f,
        Tables.std_luminance_qt, Tables.std_chrominance_qt,
        progressObj, currentOperationObj);
}

Other details

Progress is reported through interfaces, so initialize these objects as shown before calling an encode function:

Utils.ProgressUpdater progressObj = new Utils.ProgressUpdater();
Utils.CurrentOperationUpdater currentOperationObj = new Utils.CurrentOperationUpdater();

If somebody has a better way for progress reporting, I would like to know.

Points of interest

Fill_Image_Buffer() and Write_Bmp_From_Data() in Utils.cs use interop with "gdi32.dll" to fill the image buffers quickly. Earlier, I used GetPixel() and SetPixel(), but they were very slow, so I had to rewrite these functions.

I spent two days figuring out how the GetDIBits() and SetDIBits() functions work, by writing a C++ .NET program and then converting it to C#. InteropGDI.cs contains many unused functions, but they all work; I wrote a program that uses every one of them.

History

This is the first public version.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
