For the latest version of the code, which may contain the latest enhancements and corrections, please download the latest version of the Accord.NET Framework.
Contents
- Introduction
- Feature extraction
- Feature matching
- Homography estimation
- Blending
- Results
- Source code
- Interest point detection
- Correlation matching
- Robust homography estimation
- Gradient blending
- Doing it all at once
- Sample application
- Remarks
- Known issues
- Further works
- Conclusion
- Acknowledgements
- References
The Accord.NET Framework originated as an extension framework for AForge.NET, a popular framework for computer vision and artificial intelligence. However, Accord.NET now implementsa large number of tools and features not available in AForge.NET, such as Kernel Support Vector Machines, Discriminative and Projective Analysis, Hidden Markov Models, new Neural networks learning algorithms, object detection, imaging filters and other useful methods and tools. Accord.NET allows, among other things, to develop programs for automatic image stitching and possibly for automatic panorama creation.
A panorama is a picture that is made by combining a series of photos into one large picture. By combining a series of photos, it is possible to provide a complete view of an area or location that cannot fit in a single shot.
In the example below, we will demonstrate how to use features already available in Accord.NET to stitch two images together and create a simple and small panorama. In the last section, we will discuss ways to generalize the method for multiple pairs of images, creating full blown image mosaics.
For this demonstration, we will be using these two pictures taken from the UFSCar lake. The pictures have been taken a few degrees apart from each other. We can see that the gymnasium appears only in the second picture, whereas the leftmost portion of the lake appears only in the first picture.
In order to create our panorama, the general idea will consist in identifying common points between the two images and then projecting one of the images on top of the other in an effort to match those points. In order to identify those points, which from now on we will be calling interest points, we will be using a simple interest point detector known as the Harris Corners Detector.
While the full description of the Harris detector is beyond the scope of this article (but is available here), I thought it was interesting to include a little history about corner detectors and the Harris operator.
One of the first operators for interest point detection was developed by Hans P. Moravec in 1977 for his research involving the automatic navigation of a robot through a clustered environment. It was also Moravec who defined the concept of "points of interest" in an image and concluded these interest points could be used to find matching regions in different images. The Moravec operator is considered to be a corner detector because it defines interest points as points where there are large intensity variations in all directions. This often is the case at corners. It is interesting to note, however, that Moravec was not specifically interested in finding corners, just distinct regions in an image that could be used to register consecutive image frames.
Harris and Stephens improved upon Moravec's corner detector by considering the differential of the corner score with respect to direction directly. They needed it as a processing step to build interpretations of a robot's environment based on image sequences. Like Moravec, they needed a method to match corresponding points in consecutive image frames, but were interested in tracking both corners and edges between frames.
Pictures of the lake with detected interest points marked in white
While there are better interest point detectors out there, such as SIFT and SURF, those are unfortunately encumbered by patents. In order to allow full commercial usage of this library, I decided to avoid using them and use the Harris detector instead.
After our interest points have been detected, we need to correlate them somehow. For this, we will be using a maximum correlation rule to determine matches between our two images. The cross-correlation works by analyzing a window of pixels around every point in the first image and correlating them with a window of pixels around every other point in the second image. Points which have maximum bidirectional correlation will be taken as corresponding pairs.
Lake images showing the correspondence relationships among points
We can see from the picture, however, that many points have been wrongly correlated. This is the case of the diagonal lines that do not follow the same direction as the majority of other lines. Nevertheless, we will need not to worry about those for reasons which shall be clear in the next section.
Now that we have two sets of correlated points, all we have to do is define a model which can translate points from one set to the other. What we are looking for is some kind of image transformation which can be used to project one of the two images on top of the other while matching most of the correlated feature points - we need a homography matrix matching the two images.
A homography is a projective transformation, a kind of transformation used in projective geometry. It describes what happens to the perceived positions of observed objects when the point of view of the observer changes. In more formal terms, a homography is an invertible transformation from the real projective plane to the projective plane that maps straight lines to straight lines.
By using homogeneous coordinates, one can represent an homography matrix as a 3x3 matrix with 8 degrees of freedom.
In the System.Drawing
namespace, there is a Matrix
class which encapsulates a 3-by-3 affine matrix that represents a geometric transform. In Accord.Imaging, the MatrixH
class encapsulates a 3x3 homogenous matrix that represents a projective transform. The differences between the two matrices relies only on the degrees of freedom for each one.
Despite being a 3x3 matrix, the geometric transform matrix of System.Drawing
has only 6 degrees of freedom, whereas the projective transform of Accord.NET has 8 degrees of freedom. In the latter, the last value can be interpreted as a scale parameter and can be fixed at 1, as shown in the picture above. Homogeneous coordinates are used because matrix multiplication alone cannot directly perform the division required for perspective projection.
Using homogeneous coordinates, instead of representing the position of every pixel in the image as a pair <x,y>
, we will be representing it as a tuple <x,y,w>
, where w
is also a scale parameter. For simplification, we will let w
be fixed at 1.
Homogeneous coordinates are very useful because they will allow us to perform an image projective transformation by using only standard matrix multiplication, as shown by the equation and schematic diagrams below.
Once all the projected points have been computed, we can recover our original coordinate system by dividing each point by its homogeneous scale parameter and then dropping the scale factor, which after division will be set at 1.
By estimating the correct values for the homography matrix, we could obtain a transformation like the following:
Which could possibly result in a final projection like this:
Now that we have defined what a homography is and what it is useful for, we can begin to discuss how we can create this homography matrix from our set of correlated points. To estimate a robust model from the data, we will be using a method known as RANSAC.
The name RANSAC is actually an abbreviation for "RANdom SAmple Consensus". It is an iterative method for robust parameter estimation to fit mathematical models from sets of observed data points which may contain outliers. The algorithm is non-deterministic in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are performed. The RANSAC algorithm has found many applications in computer vision, including the simultaneous solving of the correspondence problem and the estimation of the fundamental matrix related to a pair of stereo cameras.
The basic assumption of the method is that the data consists of "inliers", i.e., data whose distribution can be explained by some mathematical model, and "outliers" which are data that do not fit the model. Outliers could be considered points which come from noise, erroneous measurements, or simply incorrect data.
For the problem of homography estimation, RANSAC works by trying to fit several models using some of the point pairs and then checking if the models were able to relate most of the points. The best model, i.e., the homography which produces the highest number of correct matches, is then chosen as the answer for the problem.
After the execution of RANSAC, we can see only the correct matches were left in our images. This happens because RANSAC did find a homography matrix relating most of the points, and discarded the incorrect matches as outliers.
The full explanation of RANSAC is way beyond the scope of this article, but more information about a RANSAC implementation in C# can be found here.
After our homography matrix has been computed, all that is left for us is to blend the two images together. To do so, we will be using a linear gradient alpha blending from the center of one image to the other. The gradient blending works by simulating a gradual change in one image's alpha channel over the line which connects the centers of the two images.
The image shows the linear gradient used by the blending filter. The leftmost gray part of the image represents the left picture of the lake. The rightmost part represents the right picture of the like. The inner part shows the linear gradient which blends one image to the other. The first, grayest color of the gradient represents the first image, while the darkest color represents the second image. The transition indicates how much of each image will be present in the final blending.
After the blending filter has been applied, we will finally have our result as shown in the image below:
Final image resulting from the blending of the two consecutive lake images
Using Accord.NET, the code for image stitching becomes extremely simple. Remember, we will need to perform four steps: Interest point detection, Correlation matching, Robust homography estimation, and Gradient blending. So before going into those steps, let's define a few class variables (this code was taken from the sample application code):
private Bitmap img1 = Panorama.Properties.Resources.UFSCar_Lake1;
private Bitmap img2 = Panorama.Properties.Resources.UFSCar_Lake2;
private IntPoint[] harrisPoints1;
private IntPoint[] harrisPoints2;
private IntPoint[] correlationPoints1;
private IntPoint[] correlationPoints2;
private MatrixH homography;
The source code accompanying this article contains the sample application plus only a small subset of the Accord.NET Framework to avoid cluttering namespaces. If you wish to use Accord.NET, please be sure to download the latest version from the project page at GitHub. By the way, the sample application assumes you will just push buttons in the right order. If you call a step before the previous steps have been computed, you will most probably get an exception. This is just to reflect the code that is needed for the stitching. It is also exactly the same code shown in the next sections.
To perform interest point detection, we will be using the <a href="http://accord-framework.net/docs/html/T_Accord_Imaging_HarrisCornersDetector.htm">Accord.Imaging.HarrisCornersDetector</a>
class. To make the process faster, we could use a higher suppression threshold for suppressing some points. For this example, we will be using 1000.
private void btnHarris_Click(object sender, EventArgs e)
{
HarrisCornersDetector harris = new HarrisCornersDetector(0.04f, 1000f);
harrisPoints1 = harris.ProcessImage(img1).ToArray();
harrisPoints2 = harris.ProcessImage(img2).ToArray();
Bitmap img1mark = new PointsMarker(harrisPoints1).Apply(img1);
Bitmap img2mark = new PointsMarker(harrisPoints2).Apply(img2);
Concatenate concatenate = new Concatenate(img1mark);
pictureBox.Image = concatenate.Apply(img2mark);
}
Using Accord.NET, correlation can be achieved by using the <a href="http://accord-framework.net/docs/html/T_Accord_Imaging_CorrelationMatching.htm">Accord.Imaging.CorrelationMatching</a>
class. In this example, we will be considering a window of 9 pixels around each point.
private void btnCorrelation_Click(object sender, EventArgs e)
{
CorrelationMatching matcher = new CorrelationMatching(9);
IntPoint[][] matches = matcher.Match(img1, img2, harrisPoints1, harrisPoints2);
correlationPoints1 = matches[0];
correlationPoints2 = matches[1];
Concatenate concat = new Concatenate(img1);
Bitmap img3 = concat.Apply(img2);
PairsMarker pairs = new PairsMarker(
correlationPoints1,
correlationPoints2.Apply(p => new IntPoint(p.X + img1.Width, p.Y)));
pictureBox.Image = pairs.Apply(img3);
}
Here, we will be using the <a href="http://accord-framework.net/docs/html/T_Accord_Imaging_RansacHomographyEstimator.htm">Accord.Imaging.RansacHomographyEstimator</a>
class for estimating homography matrices using RANSAC. RansacHomographyEstimator
is a convenience class encapsulating all the logic used to setup an <a href="http://accord-framework.net/docs/html/T_Accord_MachineLearning_RANSAC_1.htm">Accord.MachineLearning.RANSAC</a>
object with the parameters for fitting a homography matrix.
private void btnRansac_Click(object sender, EventArgs e)
{
RansacHomographyEstimator ransac = new RansacHomographyEstimator(0.001, 0.99);
homography = ransac.Estimate(correlationPoints1, correlationPoints2);
IntPoint[] inliers1 = correlationPoints1.Submatrix(ransac.Inliers);
IntPoint[] inliers2 = correlationPoints2.Submatrix(ransac.Inliers);
Concatenate concat = new Concatenate(img1);
Bitmap img3 = concat.Apply(img2);
PairsMarker pairs = new PairsMarker(
inliers1,
inliers2.Apply(p => new IntPoint(p.X + img1.Width, p.Y)));
pictureBox.Image = pairs.Apply(img3);
}
The final step is to create a blending of the two images. For this, we will be using the <a href="http://accord-framework.net/docs/html/T_Accord_Imaging_Filters_Blend.htm">Accord.Imaging.Blend</a>
filter, which is implemented as an AForge.NET image filter.
private void btnBlend_Click(object sender, EventArgs e)
{
Blend blend = new Blend(homography, img1);
pictureBox.Image = blend.Apply(img2);
}
To perform everything in a single step, we can use the following code in a single button:
private void btnDoItAll_Click(object sender, EventArgs e)
{
btnHarris_Click(sender, e);
btnCorrelation_Click(sender, e);
btnRansac_Click(sender, e);
btnBlend_Click(sender, e);
}
And this finishes the entire code needed for automatic image stitching!
The sample application comes with other two sets of photos, one taken from inside of the UFSCar Computing Department building, and another taken from a random wallpaper found in the web. Below are the screenshots from the application stitching two pictures of the computing department.
Images from inside the Computing Department
However, in order to use the additional images, it will be necessary to add them to the project resource file and then select them in the code. Or, you could add your own image loading dialogs. I avoided using file open dialogs in order to show only a compact and direct view of the stitching methods.
RANSAC is a probabilistic algorithm. This means it may generate different outputs on every execution, and may produce a correct answer only within some probability. So the quality of the final result may vary wildly if RANSAC parameters are not chosen wisely. Also, as you may have noted, there are many parameters to tweak in each step of the stitching method. Choosing the correct values for the parameters may have a direct influence on the quality and the time spent for producing the result.
The implementation, as it is, does not perform very well on images with very different lighting conditions. To avoid this situation, we should normalize the two images before starting the method.
Another known issue is the quality of the Blending filter. While it may perform well on images displaced laterally, it may perform poor on images displaced on the diagonal. As it can be seen in the stitching using the Computing Department pictures, a distinct horizontal line will be visible between one image and the other.
This code is well suited for the automatic panorama creation for a pair of images. However, as it stands, it still can not be used to create full panoramas using an arbitrary number of images. In order to extend the code for the case of multiple images, we could form panoramas by recursively taking pair of images from the total set of images and then blending them forming a single image.
The code could also be extended to create image mosaics in which we don't know the correct order of the images beforehand, by considering a minimum threshold of correlation between two images before attempting a match. This way, only images with strong correspondences are stitched at a time, and a full mosaic could again be constructed recursively.
In this article, we have seen how automatic panorama creation is possible using well-defined and simple processing steps. Those steps are already available in Accord.NET, an Open-Source (LGPL) framework built on top of AForge.NET (also LGPL). For a complete set of the features currently available in Accord.NET, please see the project's intro page.
To Andrew Kirillov, author of the excellent AForge.NET Framework, for the effort and time spent building his framework. The C# community was really in need of such a great project.
- P. D. Kovesi. MATLAB and Octave Functions for Computer Vision and Image Processing. School of Computer Science & Software Engineering, The University of Western Australia.
- D. Parks and J. P. Gravel. Corner Detectors: The Harris/Plessey Operator. Web. 11 May, 2010.
- J. Hutton and B. Dowling. Computer Vision Demonstration Website. Electronics and Computer Science, University of Southampton.
- H. P. Moravec. Towards Automatic Visual Obstacle Avoidance. Proc. 5th International Joint Conference on Artificial Intelligence, pp. 584, 1977.
- H. P. Moravec. Visual Mapping by a Robot Rover. International Joint Conference on Artificial Intelligence, pp. 598-600, 1979.
- C. Harris and M. Stephens. A Combined Corner and Edge Detector. Proc. Alvey Vision Conf., Univ. Manchester, pp. 147-151, 1988.
- A. Kirillov. The AForge.NET Framework. The AForge.NET Computer Vision, Artificial Intelligence and Robotics Website, 2010.
- Wikipedia contributors. "RANSAC" Wikipedia, The Free Encyclopedia.
- Wikipedia contributors. "Corner detection" Wikipedia, The Free Encyclopedia. 5 May, 2010. Web. 11 May, 2010.