Object Tracking Introduction
After reading this article you will know how to easily perform object tracking on Android devices and desktop computers. In computer vision, object tracking is the problem of following a visual object across a video sequence.
BoofCV contains several general purpose object tracking algorithms, including state-of-the-art algorithms which have only recently been proposed. An easy-to-use high level interface is provided, which allows a user to select a region in the image and then track it.
One thing to remember is that what is easy for a human can be quite difficult for a computer. People can easily track objects even when they are partially obscured or disappear for a moment. The same is not true for even the best algorithms which exist today. They typically work well in certain limited situations and fail in the rest. A specific technique needs to be selected for each situation.
When long term object tracking is mentioned, it means that the tracker can re-detect an object once the track has been lost. If the object being tracked leaves the image for a few moments, the tracker can re-detect it later on. This is actually a difficult problem to solve in real time while avoiding excessive false positives, e.g. incorrectly detecting the object. In addition, the appearance of objects constantly changes, so the detector needs to be updated via machine learning.
The tracking algorithms below are referred to as general purpose because they make few assumptions about the environment. For example, they don't assume the camera is stationary. The stationary assumption greatly simplifies the tracking problem, but limits where it can be used.
List of Algorithms in BoofCV
- Circulant
- Track-Learning-Detect (TLD)
- Sparse-Flow
  - Only tracker in BoofCV which can estimate rotations
  - Is brittle and works best on planar objects
  - Javadoc description
- Mean-Shift Histogram
  - Matches the histogram of a local neighborhood
  - Can be configured to crudely estimate scale
  - Comaniciu et al., "Kernel-Based Object Tracking," 2003
- Mean-Shift Likelihood
  - Extremely fast but only works well when a single color dominates
Object Tracking Interface
The high level object tracking interface in BoofCV is shown below. It simplifies the code and removes most bookkeeping. A track is started by calling initialize(), which takes an image and the location of the object. Make sure you check the return value since it can fail! Then as new images arrive call process(). The new location of the track is written into process()'s second parameter, 'location'. Don't forget to check process()'s return value!
public interface TrackerObjectQuad<T extends ImageBase> {
	public boolean initialize( T image , Quadrilateral_F64 location );

	public boolean process( T image , Quadrilateral_F64 location );
}
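A minimal sketch of how these two calls fit together is shown below. The video source (nextFrame()) and the initial corner coordinates are hypothetical, made up for illustration; any image sequence works.

```java
// Sketch: track an object through a sequence of frames.
// nextFrame() and the corner coordinates are hypothetical.
TrackerObjectQuad<ImageUInt8> tracker =
		FactoryTrackerObjectQuad.circulant(null, ImageUInt8.class);

// Initial location as a quadrilateral, corners in pixel coordinates
Quadrilateral_F64 location =
		new Quadrilateral_F64(100,100, 180,100, 180,160, 100,160);

if( !tracker.initialize(nextFrame(), location) )
	throw new RuntimeException("Failed to initialize the track!");

ImageUInt8 frame;
while( (frame = nextFrame()) != null ) {
	if( tracker.process(frame, location) ) {
		// 'location' now holds the object's updated corners
	} else {
		// track lost this frame; some trackers (e.g. TLD) can recover later
	}
}
```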
While a Quadrilateral_F64 is used in the interface, most tracking algorithms internally use simpler data structures. So the quadrilateral you pass in might get converted into a rectangle or another shape.
The best way to create a tracker is via the FactoryTrackerObjectQuad factory. Low level implementations are available via FactoryTrackerObjectAlgs and offer more flexibility, but they require an in-depth knowledge of the algorithms and the code. For example, if you work with the low level mean-shift tracker you can specify a color for it to search for and do not need to mark an input image.
tracker = FactoryTrackerObjectQuad.circulant(null, ImageUInt8.class);
In the above example, a Circulant tracker was created with default parameters. The first parameter is typically a Config class and the second specifies the input image type. If null is passed in, reasonable defaults are used. If you feel adventurous, stability and speed can sometimes be improved with a custom configuration, but you need to take the time to understand what you're doing.
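If you do want to tune things, pass in a Config object instead of null. The following is an illustrative sketch only; the field names on ConfigCirculantTracker are assumptions and should be checked against the Javadoc for your version of BoofCV.

```java
// Sketch: Circulant tracker with a hand-tuned configuration.
// Field names below are illustrative; consult ConfigCirculantTracker's Javadoc.
ConfigCirculantTracker config = new ConfigCirculantTracker();
config.interp_factor = 0.075;  // how quickly the model adapts to appearance change
config.padding = 1;            // extra region around the target used for learning

TrackerObjectQuad<ImageUInt8> tracker =
		FactoryTrackerObjectQuad.circulant(config, ImageUInt8.class);
```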
Desktop
Requirements:
- Java SDK 1.6 or later
- Gradle
- A webcam attached to the computer
- Linux, Windows, MacOS
To run the demonstration, download the code from the top of this article and execute the following gradle script:
cd TutorialObjectTracking/desktop/
gradle webcamRun
A window should pop up and you can select which objects to track by clicking and dragging to create a rectangle. To change the tracker you will need to modify the source code and run Gradle again.
public void process() {
	Webcam webcam = UtilWebcamCapture.openDefault(desiredWidth, desiredHeight);
	...
	T input = tracker.getImageType().createImage(actualSize.width,actualSize.height);
	...
	while( true ) {
		BufferedImage buffered = webcam.getImage();
		ConvertBufferedImage.convertFrom(buffered, input, true);

		int mode = this.mode;
		boolean success = false;
		if( mode == 2 ) {
			...
			success = tracker.initialize(input,target);
		} else if( mode == 3 ) {
			success = tracker.process(input,target);
		}
		...
	}
}
All the tracking code is contained in the two calls above. Tracking is started by calling initialize() and updated by calling process(). That's it. The rest of the code is all GUI boilerplate and accessing the webcam.
Android
Requirements:
- Android Studio
- Android SDK 14 or later (might work with earlier versions too, but not tested)
Android source code is also included with this article; download it from the top. Android is a bit more complicated than the desktop due to all the boilerplate, but how you use BoofCV is the same: initialize and track. BoofCV provides several tools for working on Android that make converting between image formats and working with videos easier. See the comments in the source code.
The BoofCV demonstration application contains tracking examples.
Demonstration on Play Store
Posting and explaining all the Android specific code goes beyond the scope of this article, but the code in which the trackers are configured is shown below. Several of the trackers have been configured specifically for Android devices, which are much less powerful than desktop computers.
private void startObjectTracking(int pos) {
	TrackerObjectQuad tracker = null;
	ImageType imageType = null;
	switch (pos) {
		case 0:
			imageType = ImageType.single(ImageUInt8.class);
			tracker = FactoryTrackerObjectQuad.circulant(null, ImageUInt8.class);
			break;
		case 1:
			imageType = ImageType.ms(3, ImageUInt8.class);
			tracker = FactoryTrackerObjectQuad.meanShiftComaniciu2003(
					new ConfigComaniciu2003(false),imageType);
			break;
		case 2:
			imageType = ImageType.ms(3, ImageUInt8.class);
			tracker = FactoryTrackerObjectQuad.meanShiftComaniciu2003(
					new ConfigComaniciu2003(true),imageType);
			break;
		case 3:
			imageType = ImageType.ms(3, ImageUInt8.class);
			tracker = FactoryTrackerObjectQuad.meanShiftLikelihood(
					30,5,256, MeanShiftLikelihoodType.HISTOGRAM,imageType);
			break;
		case 4: {
			imageType = ImageType.single(ImageUInt8.class);
			SfotConfig config = new SfotConfig();
			config.numberOfSamples = 10;
			config.robustMaxError = 30;
			tracker = FactoryTrackerObjectQuad.sparseFlow(config,ImageUInt8.class,null);
		} break;
		case 5:
			imageType = ImageType.single(ImageUInt8.class);
			tracker = FactoryTrackerObjectQuad.tld(
					new ConfigTld(false),ImageUInt8.class);
			break;
		default:
			throw new RuntimeException("Unknown tracker: "+pos);
	}

	setProcessing(new TrackingProcessing(tracker,imageType) );
}
For example, the ConfigTld(false) creates a configuration for TLD that sacrifices scale invariance for speed.
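If scale invariance matters more to you than speed, flip that flag. This is a sketch assuming the boolean constructor toggles scale estimation as described above; check ConfigTld's Javadoc for your BoofCV version.

```java
// Sketch: TLD configured to estimate scale changes, at the cost of speed.
TrackerObjectQuad<ImageUInt8> tracker = FactoryTrackerObjectQuad.tld(
		new ConfigTld(true), ImageUInt8.class);
```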
The End
That's it! Let me know if you enjoyed this article!