
Introduction

The idea behind this blog article is to provide some background on a subtopic of JPEG compression. As the title suggests, that subtopic is the DCT phase of JPEG compression. Some time ago I was working on a project where input coming from a video source, containing encoded image data, had to be decoded to retrieve the full image. The encoding used was a variation of the JPEG compression routine. I had the C code that performed the decoding at my disposal, but instead of wrapping the existing C code in a library and relying on 'interop', I decided it would be interesting to write the decoder in C#.

When doing some research on JPEG compression I frequently encountered this picture:

[Figure: DCT_basis – the 64 two-dimensional DCT basis functions]


The rest of this article tries to explain the rationale behind this picture and what it stands for in the context of JPEG compression.

Before taking off I want to stress that this kind of routine is not my specialty: I do not have a formal graphics or mathematics background, and everything described here is based on my findings and perceptions after doing some research. So you may occasionally find that I am not using the correct terminology or that I am short-circuiting some details; please feel free to bring this to my attention.

A tiny amount of background info

The basic concept of compression routines like JPEG is that they allow dropping information from an image in such a way that the loss is not, or barely, perceivable by the viewer. In most cases the JPEG routine divides the image into small blocks; each of these blocks is then put through the compression engine.

One of the steps in this compression routine is the DCT, which stands for discrete cosine transform. This step is just a transform: it prepares the data so that it can be compressed at a later stage (the quantization step).
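For completeness, the transform applied in that step is the two-dimensional DCT-II. For an 8×8 block of values g_{x,y} it produces the coefficients G_{u,v} as follows (this is the standard formula used by JPEG, shown here just for reference):

G_{u,v} = \frac{1}{4}\,\alpha(u)\,\alpha(v)\sum_{x=0}^{7}\sum_{y=0}^{7} g_{x,y}\,\cos\left[\frac{(2x+1)u\pi}{16}\right]\cos\left[\frac{(2y+1)v\pi}{16}\right]

where \alpha(k) = \frac{1}{\sqrt{2}} for k = 0 and \alpha(k) = 1 otherwise.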

That small block of data that will be put through the compression engine is actually an 8×8 matrix. This matrix contains 64 data points. (For the sake of simplicity we assume the image is a gray-scale image and that the 8×8 pixel matrix only contains one channel; an RGB image would contain three channels, not counting the alpha channel.)

If we assume that each pixel only needs one byte to encode its value, one small data block takes 64 bytes. So the compression has to bring this number down.
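To make the blocking step concrete, here is a minimal sketch in Python. The function name and the list-of-lists image representation are my own choices for illustration, not part of any JPEG library, and the sketch assumes the image dimensions are multiples of the block size:

```python
def split_into_blocks(image, block_size=8):
    """Split a gray-scale image (a list of rows of pixel values)
    into block_size x block_size sub-matrices."""
    height = len(image)
    width = len(image[0])
    blocks = []
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            block = [row[left:left + block_size]
                     for row in image[top:top + block_size]]
            blocks.append(block)
    return blocks

# A 16x16 image yields four 8x8 blocks of 64 values each.
image = [[x + 16 * y for x in range(16)] for y in range(16)]
blocks = split_into_blocks(image)
print(len(blocks), len(blocks[0]), len(blocks[0][0]))  # 4 8 8
```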

Let us take the example used in the Wikipedia article:


\left[
\begin{array}{rrrrrrrr}
 52 & 55 & 61 &  66 &  70 &  61 & 64 & 73 \\
 63 & 59 & 55 &  90 & 109 &  85 & 69 & 72 \\
 62 & 59 & 68 & 113 & 144 & 104 & 66 & 73 \\
 63 & 58 & 71 & 122 & 154 & 106 & 70 & 69 \\
 67 & 61 & 68 & 104 & 126 &  88 & 68 & 70 \\
 79 & 65 & 60 &  70 &  77 &  68 & 58 & 75 \\
 85 & 71 & 64 &  59 &  55 &  61 & 65 & 83 \\
 87 & 79 & 69 &  68 &  65 &  76 & 78 & 94
\end{array}
\right]

The data points in the matrix have values between 0 and 255, which translate to gray-scale intensities. Since the transform uses the cosine function, whose values lie between –1 and 1 (centered around zero), we also want to center the data around zero, so we subtract 128 from the data values; the range then becomes –128 to 127.

This is the resulting matrix:


\begin{array}{c}
x \\
\longrightarrow \\
\left[
\begin{array}{rrrrrrrr}
 -76 & -73 & -67 & -62 & -58 & -67 & -64 & -55 \\
 -65 & -69 & -73 & -38 & -19 & -43 & -59 & -56 \\
 -66 & -69 & -60 & -15 &  16 & -24 & -62 & -55 \\
 -65 & -70 & -57 &  -6 &  26 & -22 & -58 & -59 \\
 -61 & -67 & -60 & -24 &  -2 & -40 & -60 & -58 \\
 -49 & -63 & -68 & -58 & -51 & -60 & -70 & -53 \\
 -43 & -57 & -64 & -69 & -73 & -67 & -63 & -45 \\
 -41 & -49 & -59 & -60 & -63 & -52 & -50 & -34
\end{array}
\right]
\end{array}
\Bigg\downarrow y
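The level shift itself is just one subtraction per pixel. As a minimal sketch in Python (the data is the Wikipedia sample block from above; the function name is my own):

```python
# The 8x8 sample block from the Wikipedia JPEG article.
SAMPLE_BLOCK = [
    [52, 55, 61,  66,  70,  61, 64, 73],
    [63, 59, 55,  90, 109,  85, 69, 72],
    [62, 59, 68, 113, 144, 104, 66, 73],
    [63, 58, 71, 122, 154, 106, 70, 69],
    [67, 61, 68, 104, 126,  88, 68, 70],
    [79, 65, 60,  70,  77,  68, 58, 75],
    [85, 71, 64,  59,  55,  61, 65, 83],
    [87, 79, 69,  68,  65,  76, 78, 94],
]

def level_shift(block, offset=128):
    """Center the pixel values around zero by subtracting the offset."""
    return [[value - offset for value in row] for row in block]

shifted = level_shift(SAMPLE_BLOCK)
print(shifted[0][0], shifted[7][7])  # -76 -34
```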

This is the matrix on which we will apply the DCT; that is the subject of the next part in this series of articles.