Introduction
This started out as a set of simple extension methods for the System.Drawing.Image class which would allow you to:
- Find out how different two images are as a percentage value based on a threshold of your choosing
- Get a difference-image which shows where the two are different
- Get the difference as raw data to use for your own purposes
With the feedback received here, the solution has been expanded to include:
- A console version which can take the paths of two images as parameters, and return the difference as an errorlevel
- A COM callable DLL (still in need of testing - anyone?)
- The ability to find duplicates in a folder, or among a list of image paths
As a bonus, you get extension methods for resizing or grayscaling an image.
Background
You don't need to read this chapter to use the extension methods, so if you don't care about how I created the software - just skip to the Using the code bit.
I was happily coding away one evening, making a little tool to download images from posts in an RSS feed, when I first got the idea: "If there's a link from the blog post to another post cited as the source, why not have the code go there and check for a higher resolution version of the image?" Then I thought: "But the original image might not have the same name as the one I first met in the post."
So I had to find a way of comparing the visual representation of the two images. I am not particularly good at math, so after Googling around and finding assorted algorithms using wavelets, keypoint matching, etc. (which seemed out of my mental reach), I found out that some people were having good results using histograms. So I first went down that path.
Histograms
A histogram is a way of representing what kinds of colors are in a picture. You can create a histogram describing the brightness or the red, green, or blue values in an image. A basic way of creating one is to look at each pixel in a bitmap and find the value of the property you are interested in (for example R, G, or B). For each possible value (typically 0-255), you have a counter which you increment. When you are done with all the pixels, you can iterate over the counters and see how many pixels had low, medium, and high values of brightness, R, G, or B.
I fiddled around with this for a couple of hours, but found that in the end it wasn't good enough for detecting differences in pictures: two totally different pictures depicting almost the same thing (e.g., two corn fields) with the same composition of colors and light can be hard to tell apart using just a histogram. I tried comparing the two using the average and variance of the colors, but to no avail.
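The counting step described above can be sketched in a few lines. This is only an illustration working on a plain two-dimensional array of grayscale values; the class and method names are made up for the example and are not the library's Histogram type or its GetRgbHistogram method.

```csharp
public static class HistogramSketch
{
    // Counts how many pixels have each possible value (0-255).
    // "pixels" is a hypothetical 2D array of brightness values, one byte per pixel.
    public static int[] CountValues(byte[,] pixels)
    {
        var bins = new int[256];        // one counter per possible value
        foreach (byte value in pixels)  // foreach visits every element of a 2D array
        {
            bins[value]++;
        }
        return bins;
    }
}
```

For a 2x3 array containing the values 0, 0, 255, 10, 10, 10, the result has 2 at index 0, 3 at index 10, and 1 at index 255. The same loop would work for a single color channel if you feed it the R, G, or B values instead.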
The good thing is that as a byproduct of my work, there are now histogram extension methods for the Bitmap class: GetRgbHistogram and GetRgbHistogramBitmap.
Bitmap bmpHist = img1.GetRgbHistogramBitmap();
bmpHist.Save("C:\\bmphist.png");
Histogram hist = img1.GetRgbHistogram();
Console.WriteLine(hist.ToString());
Feel free to use them.
Note: Based on Svansickle's comments on this article regarding histograms, I've implemented the Bhattacharyya histogram algorithm, which is a way of comparing two images based on their normalized histograms.
You can play around with this to see whether this way of comparing two images is better for your purposes. I've added the functionality because it seemed interesting, and I'd love to hear if you've found uses for it :)
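To give an idea of what comparing normalized histograms means, here is a minimal sketch of the Bhattacharyya coefficient. It assumes the histograms are plain arrays of bin counts; the names are invented for the example, and the value the library's GetBhattacharyyaDifference actually returns may be derived differently.

```csharp
using System;

public static class BhattacharyyaSketch
{
    // Turns raw bin counts into fractions that sum to 1 (a normalized histogram).
    public static double[] Normalize(int[] bins)
    {
        double total = 0;
        foreach (int count in bins) total += count;

        var result = new double[bins.Length];
        if (total == 0) return result;  // empty histogram: leave all zeros
        for (int i = 0; i < bins.Length; i++) result[i] = bins[i] / total;
        return result;
    }

    // Bhattacharyya coefficient: the sum of sqrt(p[i] * q[i]) over all bins.
    // It is 1.0 for identical normalized histograms and 0.0 when they do not overlap.
    public static double Coefficient(double[] p, double[] q)
    {
        if (p.Length != q.Length)
            throw new ArgumentException("Histograms must have the same number of bins.");

        double sum = 0;
        for (int i = 0; i < p.Length; i++) sum += Math.Sqrt(p[i] * q[i]);
        return sum;
    }
}
```

Because the histograms are normalized first, two versions of the same picture in different resolutions can still score close to 1.0, which is what makes the measure independent of image size.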
Simplifying
Histograms were a dead end for me. But after Googling around some more, I read some forum posts suggesting that if images were reduced to a much smaller size, and even grayscaled, then the differences would be both faster and simpler to find. It was worth a try.
Using the .NET Framework, I could easily resize an image. Then I found some code that would grayscale an image. Now all that was left was to iterate through the pixels on both images and compare the two and then find out how many were different.
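For readers who have not seen grayscaling before, the idea is to collapse the three color values of a pixel into one brightness value. The sketch below uses a common weighted sum (0.299, 0.587, 0.114); that choice of weights is an assumption for illustration, and the grayscale code used in the library may work differently.

```csharp
using System;

public static class GrayScaleSketch
{
    // Converts one pixel's R, G and B values into a single brightness value (0-255).
    // The weights are a widely used luminance approximation, assumed here for the example.
    public static byte ToGray(byte r, byte g, byte b)
    {
        double luminance = 0.299 * r + 0.587 * g + 0.114 * b;
        return (byte)Math.Round(luminance);
    }
}
```

Black stays 0, white stays 255, and a pure red pixel (255, 0, 0) ends up fairly dark because red carries a smaller weight than green.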
Here are the two images I used, with an XBOX controller, a post-it with and without text, and two differently colored pens.
I thought I'd start out by using a grayscaled, 16x16 pixel version of each image, and see whether I needed higher resolution for practical use.
Upscaled versions of the two 16x16 pixel images
For each pixel I'd then get the difference of the brightness value compared to the other image's pixel in the same spot and save it in a two-dimensional array of bytes (since R, G and B values in Bitmap can be between 0 and 255). I would then count all the values in the array which weren't zero, divide this value by the number of pixels in an image (256), and voilà - I would have a difference value in percentages - right? Not quite...
It works well enough to visualize differences between two images as you can see here:
...and the low resolution seemed to work okay - yay!
But...!
There was a slight problem though. My algorithm was also finding differences where none were visible to the naked eye. Here and there a re-encoding of a JPEG in the same resolution as the original, or identical images in different resolutions, suddenly had a lot of differences with a value of 1 or 2 showing up. This could easily give me fifty pixels with a difference - and fifty pixels out of 256 is about 20%, which makes the algorithm too blunt, since a human could detect no difference visually. So I introduced a threshold (a value that the difference had to exceed to be counted).
Using a threshold
Here you can see the differences between a 200 pixels wide version of an image and a 100 pixels wide version of the same image:
You can see what threshold values resulted in the above (the red text). By default, pixel difference values below 4 are now treated as no difference, and it is possible to adjust this by giving the extension method an optional parameter:
byte threshold = 5;
float percentageDifference = img1.PercentageDifference(img2, threshold);
This also makes it possible for you to adjust the sensitivity of the code according to your needs. A default threshold of three works for me, but by all means - play around with it, and adjust this to your heart's desire or depending on the task you need it for.
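The comparison and the threshold can be sketched on plain arrays like this. It assumes both images have already been resized and grayscaled into equally sized byte arrays, and that the result is a fraction between 0.0 and 1.0; the class and method names are hypothetical and this is not the library's own code.

```csharp
using System;

public static class DifferenceSketch
{
    // Absolute brightness difference for each pixel position.
    public static byte[,] Differences(byte[,] a, byte[,] b)
    {
        int rows = a.GetLength(0), cols = a.GetLength(1);
        if (b.GetLength(0) != rows || b.GetLength(1) != cols)
            throw new ArgumentException("Both arrays must have the same dimensions.");

        var diff = new byte[rows, cols];
        for (int y = 0; y < rows; y++)
            for (int x = 0; x < cols; x++)
                diff[y, x] = (byte)Math.Abs(a[y, x] - b[y, x]);
        return diff;
    }

    // Fraction (0.0 to 1.0) of pixels whose difference exceeds the threshold.
    public static float FractionAbove(byte[,] a, byte[,] b, byte threshold = 3)
    {
        byte[,] diff = Differences(a, b);
        if (diff.Length == 0) return 0f;  // nothing to compare

        int count = 0;
        foreach (byte d in diff)
            if (d > threshold) count++;   // small differences are ignored
        return (float)count / diff.Length;
    }
}
```

With the 2x2 arrays {{10, 20}, {30, 40}} and {{11, 20}, {30, 140}}, a threshold of 0 reports 0.5 (two of four pixels differ), while the default threshold of 3 reports 0.25, because the difference of 1 no longer counts.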
That's it folks!
This is where my story ends. I now have the ability to detect whether two images are similar, how much they differ, and where, which was all I wanted - yay! XD.
I hope you've had fun reading my little coding-story, and I hope you can use the code somehow.
Kind regards - Jakob "XnaFan" Krarup.
Using the code
Get the DLL
All you need to do is download the DLL or the complete solution (see top of article), add a reference to XnaFan.ImageComparison.dll and a using XnaFan.ImageComparison statement to your code file - and you should be set to go.
Use the public methods of ImageTool
float GetPercentageDifference(string image1Path, string image2Path, byte threshold = 3)
float GetBhattacharyyaDifference(string image1Path, string image2Path)
List<List<string>> GetDuplicateImages(string folderPath, bool checkSubfolders)
List<List<string>> GetDuplicateImages(IEnumerable<string> pathsOfPossibleDuplicateImages)
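To illustrate the List<List<string>> shape of the duplicate results, here is one way such grouping could be done. It is a sketch under assumptions: the isDuplicate callback is a hypothetical stand-in for an actual image comparison, and the library's GetDuplicateImages may group or order its results differently.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateSketch
{
    // Groups paths so that each inner list holds items the callback considers duplicates.
    // Only groups with more than one member are returned.
    public static List<List<string>> GroupDuplicates(
        IEnumerable<string> paths, Func<string, string, bool> isDuplicate)
    {
        var remaining = new List<string>(paths);
        var groups = new List<List<string>>();

        while (remaining.Count > 0)
        {
            string first = remaining[0];
            remaining.RemoveAt(0);
            var group = new List<string> { first };

            // Walk backwards so removing an item does not shift the ones still to be checked.
            for (int i = remaining.Count - 1; i >= 0; i--)
            {
                if (isDuplicate(first, remaining[i]))
                {
                    group.Add(remaining[i]);
                    remaining.RemoveAt(i);
                }
            }

            if (group.Count > 1) groups.Add(group);
        }
        return groups;
    }
}
```

In real use, the callback could for example call GetPercentageDifference on the two paths and treat a result below some small limit of your choosing as a duplicate.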
Use the extension methods for Images and Bitmaps
float PercentageDifference(this Image img1, Image img2, byte threshold = 3)
Bitmap GetDifferenceImage(this Image img1, Image img2,
bool adjustColorSchemeToMaxDifferenceFound = false, bool absoluteText = false)
byte[,] GetDifferences(this Image img1, Image img2)
byte[,] GetGrayScaleValues(this Image img)
Image GetGrayScaleVersion(this Image original)
Image Resize(this Image originalImage, int newWidth, int newHeight)
Bitmap GetRgbHistogramBitmap(this Bitmap bmp)
Histogram GetRgbHistogram(this Bitmap bmp)
Visualizing the differences
I have included the possibility of color coding the difference bitmap either by using a palette of black to pink corresponding to values 0 to 255, or by having the palette map to whatever is the current max. This will enable you to highlight small differences or keep them dark as you wish. See the difference here:
This is the difference between using the parameter adjustColorSchemeToMaxDifferenceFound as true or false:
Bitmap diffNoAdjust = bmp1.GetDifferenceImage(bmp2);
Bitmap diffAdjusted = bmp1.GetDifferenceImage(bmp2, true);
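The effect of adjusting the palette to the largest difference found can be shown with a little arithmetic. The sketch below maps a difference value to a color intensity; the names are invented for the example and the library's actual palette code may differ.

```csharp
using System;

public static class PaletteSketch
{
    // Maps a difference value to a color intensity between 0 and 255.
    // Without adjustment the value is used as-is (fixed 0-255 scale).
    // With adjustment the scale is stretched so the largest difference found becomes 255.
    public static byte ToIntensity(byte difference, byte maxDifference, bool adjustToMax)
    {
        if (!adjustToMax || maxDifference == 0) return difference;
        return (byte)Math.Min(255, difference * 255 / maxDifference);
    }
}
```

If the largest difference in the picture is 20, a pixel with a difference of 10 gets intensity 10 on the fixed scale (nearly black) but 127 on the adjusted scale, which is why small differences become much easier to spot.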
Any trouble using the code, or ideas for improvements - let me know.
Sample Console and WPF application included
To get you started, I've included a sample Console application and a WPF app - to show how you can use the code. WPF also has the added benefit of giving you code to transform between System.Drawing.Image ("regular" .NET) and System.Windows.Media.Imaging.BitmapSource (WPF). I am just starting on WPF, so it was a learning-by-failing experience. Don't go looking for any best-practices there ;-D.
Console version
A reader asked for a command prompt version of the functionality, so there is also a command-line version now, which returns the difference between the images as an error level so you can use it from a batch file or another programming language.
Usage
ImageComparisonConsole.exe [image1 path] [image2 path]
Here is a sample batch file using it:
@echo off
REM saving paths to images
REM you can also use absolute paths, e.g. "C:\something.png"
set image1="firefox1.png"
set image2="firefox2.png"
REM print what is about to happen
echo 'ConsoleImageComparison.exe %image1% %image2%'
REM execute the program
call ConsoleImageComparison.exe %image1% %image2%
REM tell what the detected difference is
echo The difference is %ERRORLEVEL%%%
You can just drag two image files onto the console app, and the errorlevel will be set, or drag them onto a batch file which calls the app and displays the errorlevel, etc.
This is only a tool - you decide what to use it for :)
History
- April 2012 - Version 1.0.
- December 2012 - Version 1.1 - small clarifications
- January 2013 - Version 1.2 - added console version which sets error level
- September 2014 - Version 1.3 - disposed of images - thanks to commenter below :)
- November 2014 - Version 1.4 - COM compatibility for comparison method
- November 2014 - Version 1.5 - Find duplicates functionality added
- November 2013 - Version 1.6 - Bhattacharyya (normalized) histogram implemented, and disposing of Image objects (thanks to commenters! ..you know who you are ;-))
- June 2021 - published a .NET Core version to Github
  - Just the core functionality to compare two images and find duplicates of an image
  - Parallelized version for faster comparison (using Parallel.ForEach)
  - Cleaned functionality, naming and comments