
Visual Surveillance Laboratory Part 2

7 May 2007
A basic description of a tracking system.

[Sample image]

Introduction

In the previous article we described the basic structure of a surveillance system and gave an overview of the code.

This article describes the algorithms used in the example tracking system given in the former article. The intention is to show that simple heuristics work in constrained environments.

But first let us understand the basics of image manipulation required for our demonstration.

Image manipulation

In order to simplify the model let us consider only gray level images.

Histogram

Each pixel in an 8-bit image can have one of 256 distinct values (0-255), where 0 is completely black and 255 is completely white. A histogram counts how many pixels in the image have each value. For example:

Image 2

The above image displays the histogram of the following image:

Image 3

Just by looking at the histogram one can learn quite a lot about the image. For example, if the peak is closer to 0 the image is dark; on the other hand, if the peak is closer to 255 the image is bright.
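
As a minimal sketch (an illustration, not the project's code), a histogram over 8-bit grayscale pixels stored in a plain byte array can be computed like this:

C#
// Illustration only: assumes the gray values have already been
// unpacked into a byte array, one byte per pixel.
int[] ComputeHistogram(byte[] pixels)
{
    int[] histogram = new int[256];   // one bin per gray level (0-255)
    foreach (byte pixel in pixels)
        histogram[pixel]++;           // count how many pixels hold each value
    return histogram;
}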

Difference filter

Measuring the difference between two images means taking the value of a pixel from image A, taking the pixel at the same position from image B, and computing the absolute difference between them. If both pixels are equal in value, the difference will be 0, meaning totally black. For instance, a difference filter between the right-hand side and the left-hand side images

Image 4 Image 5

yields:

Image 6
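
The difference filter itself is only a few lines. Here is a minimal sketch, assuming both images have the same dimensions and their gray values sit in plain byte arrays (an illustration, not the project's implementation):

C#
// Illustration only: imageA and imageB are assumed to be gray values
// of two equally sized images, one byte per pixel.
byte[] DifferenceFilter(byte[] imageA, byte[] imageB)
{
    byte[] difference = new byte[imageA.Length];
    for (int i = 0; i < imageA.Length; i++)
        difference[i] = (byte)Math.Abs(imageA[i] - imageB[i]);
    return difference;
}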

Threshold filter

Thresholding is a technique that helps us "delete" unwanted pixels from an image and concentrate only on the ones we want. For each pixel in an image, if the pixel value is above a certain threshold, convert it to 255 (white); otherwise convert it to 0 (black). For example, a threshold of 120

Image 7

transforms to:

Image 8
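
In code, a threshold filter can be sketched as follows (again an illustration over plain byte arrays, not the project's implementation):

C#
// Illustration only: every pixel above the threshold becomes white
// (255); everything else becomes black (0).
byte[] ThresholdFilter(byte[] image, byte threshold)
{
    byte[] result = new byte[image.Length];
    for (int i = 0; i < image.Length; i++)
        result[i] = (byte)(image[i] > threshold ? 255 : 0);
    return result;
}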

For more information on the subject please read [6].

Now let's develop a naive algorithm.

Naive approach

Pseudocode
function naiveApproach1(currentFrame)
{
    difference <-- differenceFilter(currentFrame, oldFrame)
    for each (pixel != 0) in difference
    {
        object <-- createObject(pixel, id)
        objects.Add(object)
        id++
    }

    return objects
}

This approach treats each changed pixel in the current frame as an indication of a moving object that we want to track. In this sense it is a simple motion detection algorithm. Here are some possible problems with the above approach:

  1. We want to ignore the moving background, i.e. falling leaves, illumination changes, dirty pixels, etc. Such a method treats every small change as a moving object.
  2. It is computationally heavy; too many moving objects appear, and it takes time to handle them all.
  3. Important objects never have a size of one pixel; real moving objects are much larger.
  4. We do not "remember" tracked objects from frame to frame.

Thus a refinement is needed.

Pseudocode
function naiveApproach2(currentFrame, threshold)
{
    difference <-- differenceFilter(currentFrame, oldFrame)
    blackAndWhite <-- for each pixel in difference
    {
        if (pixel > threshold)
            pixel <-- 255
        else
            pixel <-- 0
    }
    for each (pixel == 255) in blackAndWhite
    {
        object <-- createObject(pixel, id)
        objects.Add(object)
        id++
    }

    return objects
}

By using a threshold to "delete" pixels whose change was not very big, we manage to ignore slow-moving objects like leaves.

However, we still register each changed pixel as a new object, so we want to "connect" close pixels to one another to form a single object.

Pseudocode
function naiveApproach3(currentFrame, threshold)
{
    difference <-- differenceFilter(currentFrame, oldFrame)
    blackAndWhite <-- for each pixel in difference
    {
        if (pixel > threshold)
            pixel <-- 255
        else
            pixel <-- 0
    }
    blobs <-- connectPixels(blackAndWhite)
    for each blob
    {
        object <-- createObject(blob, id)
        objects.Add(object)
        id++
    }

    return objects
}

Merging close pixels is a big improvement. The connectPixels algorithm is known as the connected components labeling algorithm, which you can read more about in [2].
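
To give a feel for what connectPixels does, here is a minimal flood-fill sketch of connected components labeling (the project uses its own implementation; this illustration assumes the black-and-white image is a 2D byte array with 255 as foreground):

C#
// Illustration only: labels 4-connected white regions. Returns a map
// where 0 = background and 1..n identify the blobs.
int[,] LabelComponents(byte[,] image)
{
    int height = image.GetLength(0), width = image.GetLength(1);
    int[,] labels = new int[height, width];
    int nextLabel = 0;
    var stack = new Stack<(int y, int x)>();

    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
        {
            if (image[y, x] != 255 || labels[y, x] != 0)
                continue;                   // background or already labeled
            nextLabel++;                    // start a new blob
            stack.Push((y, x));
            while (stack.Count > 0)         // flood-fill the whole blob
            {
                var (cy, cx) = stack.Pop();
                if (cy < 0 || cy >= height || cx < 0 || cx >= width)
                    continue;
                if (image[cy, cx] != 255 || labels[cy, cx] != 0)
                    continue;
                labels[cy, cx] = nextLabel;
                stack.Push((cy + 1, cx));
                stack.Push((cy - 1, cx));
                stack.Push((cy, cx + 1));
                stack.Push((cy, cx - 1));
            }
        }
    return labels;
}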

The algorithm naiveApproach3 is similar to the motion segmentation part of the example algorithm given in the previous article. However, naiveApproach3 still needs some more manipulation of the image in order to reduce noise.

The above method, called background subtraction, has many variations, but its core stays the same: take two images, analyze the difference between them using a difference filter, and the outcome is your moving objects.

Let us try to implement Object classification.

Object classification

Pseudocode
function naiveApproach4(currentFrame, threshold)
{
    difference <-- differenceFilter(currentFrame, oldFrame)
    blackAndWhite <-- for each pixel in difference
    {
        if (pixel > threshold)
            pixel <-- 255
        else
            pixel <-- 0
    }
    blobs <-- connectPixels(blackAndWhite)
    for each blob
    {
        object <-- createObject(blob, id)
        if (object.totalSizeInPixels > 500)
            object.Class <-- Person
        else
            object.Class <-- Junk
        objects.Add(object)
        id++
    }

    return objects
}

This approach is a naive one since it assumes that anything big enough (i.e. 500 pixels or more) is a person. The real question is whether this assumption is correct. That depends on what the environment actually contains: in a lobby entrance scenario where only people appear, the assumption might be correct; on a highway it will not be.

There are more sophisticated ways to classify objects. For example, people are usually taller than they are wide, and we can use this distinction, as sketched below.
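
A hedged sketch of such a rule (the 500-pixel figure comes from the pseudocode above; the taller-than-wide test and the Rectangle bounding box are illustrative additions, not the project's code):

C#
// Illustration only: combine the size rule with a taller-than-wide
// rule using a System.Drawing.Rectangle bounding box.
string ClassifyBlob(Rectangle boundingBox, int sizeInPixels)
{
    if (sizeInPixels > 500 && boundingBox.Height > boundingBox.Width)
        return "Person";   // big enough and taller than wide
    return "Junk";
}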

One problem is still left: each new object is given a new id, but there is no tracking yet.

Tracking

Our tracking implementation uses a very simple but surprisingly successful heuristic [3]: object overlapping.

The assumption behind this idea is that a video shot usually has 10 or more frames per second. That means the same object will appear very close to the location where it appeared in the previous frame, so if objects overlap across an unbroken sequence of frames, they are the same object.

Pseudocode
function naiveApproach5(currentFrame, threshold)
{
    difference <-- differenceFilter(currentFrame, oldFrame)
    blackAndWhite <-- for each pixel in difference
    {
        if (pixel > threshold)
            pixel <-- 255
        else
            pixel <-- 0
    }
    blobs <-- connectPixels(blackAndWhite)
    for each blob
    {
        newObject <-- createObject(blob, id)
        if (newObject.totalSizeInPixels > 500)
            newObject.Class <-- Person
        else
            newObject.Class <-- Junk
        if (objects.containsOverlappingObject(newObject))
            update old object position with the current
                position of newObject
        else
            objects.Add(newObject)
    }

    return objects
}
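
The containsOverlappingObject check can be as simple as a bounding-box intersection test. Here is a minimal sketch (TrackedObject and its BoundingBox property are hypothetical names, not the project's):

C#
// Illustration only: return the first previously seen object whose
// bounding box intersects the new one, or null if there is none.
TrackedObject FindOverlapping(List<TrackedObject> objects, Rectangle current)
{
    foreach (TrackedObject obj in objects)
        if (obj.BoundingBox.IntersectsWith(current))
            return obj;    // overlapping, so treat as the same object
    return null;           // no overlap, this is a new object
}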

The tracking here has one major problem that this article does not intend to solve: if a person walks behind a tree (the tree occludes him), then the next time he appears he will be considered a totally new object with a new id. The occlusion problem has a major effect on any surveillance system, but in this example we assume a clean view.

The code above is not complete and thus several additions are required:

  1. There are two lists: one contains pending objects, i.e. objects that are not yet being tracked, and the other contains active objects that are being tracked.
  2. A pending object is a moving object that was just discovered.
  3. Before an object is considered active, the algorithm waits to see if it keeps moving for at least 4 frames. This allows ignoring junk pixels and small moving objects such as leaves.
  4. If an active object has not moved for 10 frames, it is deleted from the list.

for each blob, ignoring blobs classified as Junk
    if pendingBlobs contains an overlapping blob then
        increase that pending blob's frame count by 1
        update the blob's location
        if the pending blob's frame count >= 4 then move it to
            activeBlobs and give it an id
    else if activeBlobs contains an overlapping blob then
        update the blob's location
        reset its "not updated" counter to 0
    else
        add the blob to pendingBlobs

for each pending blob
    delete the blob if it was not updated this frame
for each active blob
    delete the blob if it was not updated for 10 frames

return the blobs in the activeBlobs list
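
A hedged C# sketch of this bookkeeping is given below. The Blob class, its fields, and the Overlaps helper are hypothetical names; only the 4-frame promotion and 10-frame expiry rules come from the description above.

C#
const int FramesToPromote = 4;   // frames before a pending blob becomes active
const int FramesToExpire = 10;   // missed frames before an active blob is dropped
int nextId = 0;

void UpdateTracking(List<Blob> currentBlobs,
                    List<Blob> pendingBlobs, List<Blob> activeBlobs)
{
    foreach (Blob blob in currentBlobs)
    {
        if (blob.Class == BlobClass.Junk)
            continue;                               // ignore junk blobs

        Blob pending = pendingBlobs.Find(b => Overlaps(b, blob));
        Blob active = activeBlobs.Find(b => Overlaps(b, blob));

        if (pending != null)
        {
            pending.FramesSeen++;
            pending.BoundingBox = blob.BoundingBox; // update location
            pending.Updated = true;
            if (pending.FramesSeen >= FramesToPromote)
            {
                pending.Id = nextId++;              // start tracking it
                pendingBlobs.Remove(pending);
                activeBlobs.Add(pending);
            }
        }
        else if (active != null)
        {
            active.BoundingBox = blob.BoundingBox;  // update location
            active.FramesUntouched = 0;
            active.Updated = true;
        }
        else
        {
            blob.Updated = true;                    // just discovered
            pendingBlobs.Add(blob);
        }
    }

    // Pending blobs vanish as soon as they miss a frame; active blobs
    // survive up to FramesToExpire missed frames.
    pendingBlobs.RemoveAll(b => !b.Updated);
    foreach (Blob b in activeBlobs)
        if (!b.Updated) b.FramesUntouched++;
    activeBlobs.RemoveAll(b => b.FramesUntouched >= FramesToExpire);

    // Reset the per-frame flags for the next call.
    foreach (Blob b in pendingBlobs) b.Updated = false;
    foreach (Blob b in activeBlobs) b.Updated = false;
}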

It is possible to divide the code into 3 main parts that correspond to the parts learnt in the previous article. Note that there is no implementation of any environment modeling techniques; for information on the subject you are welcome to read [4].

Using the code

Image 9

The code itself is a translation of what we did here; it lives in the SimpleTrackingSystemExample project and is compiled as a DLL.

The three parts (motion segmentation, object classification, and object tracking) are all implemented as interfaces; the really interesting part of the code is how they are connected together.

You probably all remember the general structure of a visual surveillance system. We want to encapsulate this structure but let the programmer choose the specific details, and this is where the wonderful builder pattern [5] helps us.

Image 10

The BaseImageProcess accepts in its constructor three parameters:

C#
/// <summary>
/// A constructor
/// </summary>
/// <param name="ms">The Background Subtraction algorithm.</param>
/// <param name="classObj">The Classify Blobs algorithm.</param>
/// <param name="tracker">The Tracking algorithm.</param>
/// <exception cref="ArgumentNullException">One of the arguments is null.</exception>
public BaseImageProcess(IMotionSegmentation ms, IClassifyObjects classObj, 
                        ITrackObjects tracker) {...

The programmer can change the parameters to create different types of image processes, but the process itself stays the same:

C#
// Find regions of moving objects.
ICollection<ExtendedBlob> blobs = motionSegmentation.execute(frame);
// Classify blobs.
ICollection<ExtendedBlob> classifiedBlobs = classifyObjects.execute(blobs);
// Track blobs.
ICollection<ExtendedBlob> finalLocation = 
                trackObjects.execute(classifiedBlobs);
return finalLocation;

Of course, you do not have to use BaseImageProcess in your tracking algorithms; all you have to do is implement the IImageProcess interface.
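
For example, wiring a pipeline together might look like this; the three concrete classes are hypothetical placeholders for your own implementations, and the execute signature mirrors the snippet above rather than being quoted from the project:

C#
// Hypothetical composition: MyBackgroundSubtraction, MySizeClassifier
// and MyOverlapTracker are placeholder implementations of the
// project's IMotionSegmentation, IClassifyObjects and ITrackObjects.
IImageProcess process = new BaseImageProcess(
    new MyBackgroundSubtraction(),
    new MySizeClassifier(),
    new MyOverlapTracker());

// Every frame then runs through the same three stages.
ICollection<ExtendedBlob> tracked = process.execute(frame);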

What to do next

Test the example algorithms that came with the project on various scenes: check what happens when you use them on people walking from right to left, and check what happens when you try to track the falling leaves of a tree.

Test the limits: check where it fails and where it succeeds.

Conclusion

  1. We learnt different techniques of image manipulation.
  2. We tried to develop simple algorithms which can be used in a visual surveillance system.

Bibliography

  1. Digital Photography Tutorial.
  2. Fundamentals of Computer Vision (see chapter 3 on image segmentation).
  3. Tracking Groups of People.
  4. Moving Target Classification and Tracking from Real-Time Video.
  5. Builder Pattern.
  6. Image Processing Operator Worksheets.

Special Thanks

To Anat Kravitz for her help.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
