Introduction
As part of my research project, I had to implement a feature tracking device that runs entirely on a hardware board. Designing anything useful on a piece of hardware takes effort and time. To avoid tedious on-board calibration and to make sure the algorithms were properly designed first, I wrote a Windows application to simulate the environment: it grabs frames from a web camera and tracks objects in them. In the same way that I have benefited from open source projects, I am happy to spend some time contributing to a site such as The Code Project.
AVICap
In this demo application, I have chosen to demonstrate the use of the AVICap window class to track objects. AVICap is a window class that provides applications with an extremely convenient programming interface to video acquisition hardware, such as the web camera used in this demo application.
To be able to track objects from a live video input, we obviously need access to the individual frames. To gain access to each frame before it is previewed, use the capSetCallbackOnFrame macro.
BOOL capSetCallbackOnFrame(HWND hwnd, FrameCallback fpProc);
hwnd: Handle to the capture window.
fpProc: Pointer to the preview callback function. Specify NULL for this parameter to disable a previously installed callback function.
typedef LRESULT (*FrameCallback)(HWND hWnd, LPVIDEOHDR lpVideoHdr);
The VIDEOHDR structure, to which the LPVIDEOHDR parameter points, is declared as follows:
typedef struct videohdr_tag {
    LPBYTE lpData;            // pointer to the locked frame data buffer
    DWORD  dwBufferLength;    // length of the data buffer
    DWORD  dwBytesUsed;       // bytes actually used in the buffer
    DWORD  dwTimeCaptured;    // milliseconds from the start of the stream
    DWORD  dwUser;            // user-defined data
    DWORD  dwFlags;           // VHDR_* flags (see below)
    DWORD  dwReserved[4];     // reserved; do not use
} VIDEOHDR, *PVIDEOHDR, *LPVIDEOHDR;
#define VHDR_DONE     0x00000001  // done bit: the driver has filled the buffer
#define VHDR_PREPARED 0x00000002  // set if this header has been prepared
#define VHDR_INQUEUE  0x00000004  // reserved for the driver
#define VHDR_KEYFRAME 0x00000008  // the frame is a key frame
Once the frame callback procedure is associated with a capture window, we are all set to begin tracking.
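The window setup itself is not the focus of this article, but for completeness, here is a minimal sketch of creating the capture window, connecting the driver, and installing the frame callback. The window name, size, child ID, and the free-standing callback are arbitrary choices for this sketch; the demo project uses a member function of its view class instead.

#include <windows.h>
#include <vfw.h>                     // AVICap macros; link with vfw32.lib
#pragma comment(lib, "vfw32.lib")

// Minimal frame callback: every previewed frame passes through here.
LRESULT CALLBACK FrameCallbackProc(HWND hWnd, LPVIDEOHDR lpVideoHdr)
{
    // lpVideoHdr->lpData points at the raw frame bits (RGB24 in this demo).
    // Process or overlay the frame here before it is drawn.
    return (LRESULT)TRUE;
}

BOOL StartCapture(HWND hwndParent)
{
    // Create a capture window as a child of our view (size and ID are arbitrary).
    HWND hwndCap = capCreateCaptureWindow(TEXT("Capture"),
                                          WS_CHILD | WS_VISIBLE,
                                          0, 0, 320, 240,
                                          hwndParent, 1);
    if (hwndCap == NULL)
        return FALSE;

    // Connect to the first installed capture driver (the web camera).
    if (!capDriverConnect(hwndCap, 0))
        return FALSE;

    // Install the frame callback, then start previewing at roughly 30 fps.
    capSetCallbackOnFrame(hwndCap, FrameCallbackProc);
    capPreviewRate(hwndCap, 33);     // milliseconds between frames
    capPreview(hwndCap, TRUE);
    return TRUE;
}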
Color Space
Before we start processing frames, it is important to understand the different representations for color spaces used in digitized video. There are many color spaces to choose from, and each of them has its own strengths and limitations. Choosing the right color space for a specific application simplifies computation significantly.
The feature we will be looking at in this demo application is brightness, and we will track objects based on how bright they are. A natural approach is to work in a color space that has an explicit brightness component, and YUV is one such color space. However, YUV is not necessarily one of the input formats available from the web camera, so a conversion from the typical RGB24 input format to YUV is required.
The relationship between RGB and YUV can be expressed simply as the following set of linear equations.
[ Y ]   [  0.257  0.504  0.098  0.063 ] [ R ]
[ U ] = [ -0.148 -0.291  0.439  0.500 ] [ G ]
[ V ]   [  0.439 -0.368 -0.072  0.500 ] [ B ]
[ 1 ]   [  0.000  0.000  0.000  1.000 ] [ 1 ]
This matrix comes from the idea of a change of basis in linear algebra: it corresponds to a rotation (plus an offset) of the RGB color cube such that one axis of the new basis lies along the gray line R = G = B, which is precisely the brightness component Y.
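For illustration, the same conversion can be written in code. The helper below is my own, not part of the demo project; the 8-bit offsets 16 and 128 correspond to the matrix entries 0.063 and 0.500 scaled to the 0..255 range.

#include <cmath>

// Hypothetical helper: apply the matrix above to one 8-bit RGB pixel.
void RgbToYuv(unsigned char r, unsigned char g, unsigned char b,
              unsigned char &y, unsigned char &u, unsigned char &v)
{
    y = (unsigned char)std::floor( 0.257*r + 0.504*g + 0.098*b +  16 + 0.5);
    u = (unsigned char)std::floor(-0.148*r - 0.291*g + 0.439*b + 128 + 0.5);
    v = (unsigned char)std::floor( 0.439*r - 0.368*g - 0.072*b + 128 + 0.5);
}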
Feature Tracking
Now that we have direct access to the brightness of each pixel, a simple algorithm can be used to track a bright object. The algorithm introduced here is a fairly simple one, called the "rectangle algorithm". It keeps track of four points in each frame: the topmost, leftmost, rightmost, and bottommost points at which the brightness exceeds a certain threshold value.
If you use the following code, make sure you set the input format of your web camera to RGB24.
LRESULT CChildView::FrameCallbackProc(HWND hWnd, LPVIDEOHDR lpVideoHdr)
{
    ...
    ...
    // Scan every pixel; RGB24 frame data is laid out as B, G, R per pixel.
    for (int i = 0; i < nHeight; ++i) {
        for (int j = 0; j < nWidth; ++j) {
            index = 3*(i*nWidth + j);
            // Brightness of the pixel (weighted sum of R, G, B, rounded).
            Y = floor(0.299*lpData[index+2] + 0.587*lpData[index+1] +
                      0.114*lpData[index] + 0.5);
            if (Y > bThreshold) {
                if (init) {
                    // Extend the four extreme points with this bright pixel.
                    if (pLeft.x > j) {
                        pLeft.x = j;
                        pLeft.y = i;
                    }
                    if (pRight.x < j) {
                        pRight.x = j;
                        pRight.y = i;
                    }
                    // Rows are scanned in increasing order of i, so the last
                    // bright pixel seen always has the largest row index.
                    pBottom.x = j;
                    pBottom.y = i;
                }
                else {
                    // First bright pixel of the frame: it has the smallest
                    // row index and initializes all four points.
                    pTop.x = pBottom.x = pLeft.x = pRight.x = j;
                    pTop.y = pBottom.y = pLeft.y = pRight.y = i;
                    init = true;
                }
            }
        }
    }
    ...
    ...
}
A rectangle can be constructed from these four points, which tells us where the bright object is. The border of the rectangle is then simply drawn over the frame in a predefined color.
if (init) {
    // Horizontal edges: top and bottom rows of the rectangle.
    for (int i = pLeft.x; i <= pRight.x; ++i) {
        index = 3*(pTop.y*nWidth + i);
        lpData[index]   = 0;        // B
        lpData[index+1] = 0;        // G
        lpData[index+2] = 255;      // R -> red border
        index = 3*(pBottom.y*nWidth + i);
        lpData[index]   = 0;
        lpData[index+1] = 0;
        lpData[index+2] = 255;
    }
    // Vertical edges: left and right columns of the rectangle.
    for (int i = pTop.y; i <= pBottom.y; ++i) {
        index = 3*(i*nWidth + pLeft.x);
        lpData[index]   = 0;
        lpData[index+1] = 0;
        lpData[index+2] = 255;
        index = 3*(i*nWidth + pRight.x);
        lpData[index]   = 0;
        lpData[index+1] = 0;
        lpData[index+2] = 255;
    }
}
This algorithm obviously has a number of weaknesses.
- It only gives the position of the object as a whole on the screen.
- It keeps no information about the shape of the object.
- It does not tell where the middle of the object is.
- It cannot track multiple objects.
An Improved Algorithm
This algorithm tracks objects by identifying the horizontal segments that make up the object on the screen. Each segment is described by its head (its leftmost bright pixel) and its length. An object is then constructed by grouping segments together.
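The QSEG type is not shown in the article; a minimal definition consistent with the code below (the exact layout is my assumption) would be:

// Assumed layout: one horizontal run of bright pixels.
struct QSEG
{
    POINT head;    // leftmost bright pixel of the segment (x, y)
    int   length;  // number of bright pixels in the segment
};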
BYTE Y;
int index;
QSEG segment;
std::list<QSEG> object;

for (int i = 0; i < nHeight; ++i) {
    segment.length = 0;
    for (int j = 0; j < nWidth; ++j) {
        index = 3*(i*nWidth + j);
        // Brightness of the pixel (frame data is stored as B, G, R).
        Y = (BYTE)floor(0.299*lpData[index+2] + 0.587*lpData[index+1] +
                        0.114*lpData[index] + 0.5);
        if (Y > bThreshold) {
            if (segment.length == 0) {
                // A new bright run starts here; remember its head.
                segment.head.x = j;
                segment.head.y = i;
            }
            ++segment.length;
        }
        else if (segment.length) {
            // The run has ended; store it and start looking for the next one.
            object.push_back(segment);
            segment.length = 0;
        }
    }
    if (segment.length) {
        // A run that reaches the right edge of the frame.
        object.push_back(segment);
    }
}

// Mark both ends of every segment in magenta (B and R set to 255).
for (std::list<QSEG>::iterator i = object.begin(); i != object.end(); ++i) {
    index = 3*((*i).head.y*nWidth + (*i).head.x);
    lpData[index]   = 255;
    lpData[index+1] = 0;
    lpData[index+2] = 255;
    // Last pixel of the segment (head.x + length - 1), not one past the end.
    index = 3*((*i).head.y*nWidth + (*i).head.x + (*i).length - 1);
    lpData[index]   = 255;
    lpData[index+1] = 0;
    lpData[index+2] = 255;
}
This new tracking algorithm has a few extra advantages.
- It can track multiple objects.
- It can capture the shape of each object.
- The number of pixels that make up an object on the screen can be calculated easily. With this piece of information and proper distance calibration, the position of the object in three dimensions can be estimated; a sketch of this measurement follows the list.
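As a rough sketch (assuming the QSEG layout above, and treating the whole segment list as a single object for simplicity), the pixel count and the centroid can be computed directly from the segments. The function name and signature are mine, not part of the demo project.

#include <windows.h>   // POINT
#include <list>

// Count the bright pixels and compute their centroid from the segment list.
// Assumes every segment in 'object' belongs to the same tracked object.
void MeasureObject(const std::list<QSEG> &object,
                   long &pixelCount, double &centerX, double &centerY)
{
    pixelCount = 0;
    double sumX = 0.0, sumY = 0.0;

    for (std::list<QSEG>::const_iterator i = object.begin();
         i != object.end(); ++i) {
        int len = (*i).length;
        pixelCount += len;
        // A segment spans columns head.x .. head.x + length - 1 on row head.y,
        // so its own centroid sits at head.x + (length - 1)/2.
        sumX += len * ((*i).head.x + (len - 1) / 2.0);
        sumY += len * (*i).head.y;
    }

    if (pixelCount > 0) {
        centerX = sumX / pixelCount;
        centerY = sumY / pixelCount;
    }
}

With the pixel count calibrated against a known distance, the apparent size of the object gives an estimate of how far away it is, while the centroid gives its position in the image plane.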