This article is concerned with some practical issues that I encountered using a perceptual hash approach to image comparison.
Introduction
First and foremost: I really mean plain old C, in fact K&R C. Why? That is discussed later.
Secondly: I do not have a deep enough understanding of image comparison to attempt to explain how image comparison works.
Thirdly: I have got image comparison (or something that looks like it) working. (Oh dear, there is a terrible pun there.)
Fourthly: I shall highlight some of the findings that I made in my quest to identify near-duplicate images. The findings include:
- Using OleLoadPicturePath and IPicture is easy in plain old C, but not too performant
- Using gdi+, intended for C++ and COM, in plain old C is do-able, but a bit more work, and seems to be more performant than OleLoadPicturePath and IPicture
- Discrete Cosine Transformation is fundamentally difficult, but very, very easy to utilise in the context of this article
- The default Discrete Cosine Transformation function can be massively accelerated for this specific use case, by a factor of >150
- Use of halftone correction when compressing bitmaps needs to be considered
- With the approach described in this article, a rotated image is not a near duplicate of the original image ;-(
- In my specific case concerning 30,000+ images, a database helped immensely
- and a Burkhard Keller tree helped me even more
Along the way, you might notice not quite a rant, but certainly a loud mumble, about the benefits to a solo programmer of the combination of plain old C, RPN post-fix Forth-like languages, and another golden oldie: Basic.
Although I suspect that I have > 10,000 hours of programming experience, I am not a code guru, so everything in this article must be qualified as being AFAIK (which might not be very far).
Links and Disclaimer
My understanding of perceptual hashes comes from on-line searches. Here are some of the sites that I used to gain my limited understanding (limited by my abilities, not by the information sources).
I have no association with the following sites, nor with their authors, nor with any intermediary person or legal entity. I provide this list of some of the sites that I visited simply in case you want a kick-start in relevant sites to visit. At the time of writing this article, the links were valid, and I am not aware of any malicious or detrimental effect arising from browsing the specified links.
Words of Thanks, and Acknowledgments
Using plain old C is okay but at some point, in a Windows environment, you have to face using the Component Object Model (COM). I got my kick-start for using COM in plain old C from articles in CodeProject. Try COM-in-plain-C by Jeff Glatt 2006 (my gratitude!).
I got my kick-start for using gdi+ from a stackoverflow discussion, specifically: gdi+ related article.
I would like to take this opportunity to sing the praises of the people that maintain SQLite. Quite simply, a wonderful piece of work. Without SQLite, I think that my career since 2014 would have been less enjoyable. I am immensely jealous, and exceedingly grateful.
I refer in this article to a metadata tagging and browsing tool. I believe there are numerous such tools, but my experience (very limited) is with exiftool. It was whilst creating my own metadata management tool that I stumbled upon exiftool. And it was that metadata management tool that brought me to perceptual hashing.
Why Plain Old C
I have no problems with the enhancements to K&R C. What I do wish to imply is: I do not use C++, neither for object orientated programming, nor (and this is the important bit) for imperative programming.
Why? Well, C permits the copying of pointer values from one pointer type to another without any explicit casting expression. Due to my use (explained later) of a Forth-like virtual machine, I (possibly foolishly) have made extensive use of such pointer copying. Such code does not compile in a C++ environment. Mea culpa. But as discussed later around the issue of Direct2D, I might have found a gentle migration path to C++.
I have been programming for over 52 years, 42 of those years professionally. When programming for myself, I take immense pleasure in the journey and less pleasure in the destination. In programming terms, this means the understanding and achieving of a functionality is more important (for me) than using the end program. If I wanted a program with a GUI, I could use one of the GUI frameworks, or I could write my own at the (in Windows) message handling level. Guess which approach I use!
I started working in programming back in 1979 using hardware-near languages: assembler for various CPUs: Intel i80/i85/i86..i486, Motorola 68000, Sun Sparc, and "higher-level" hardware-near languages such as C, PL/M and CHILL.
Along the way, I looked in on (but did not stay with): Fortran, Algol, Pascal, Smalltalk, C++ (for object orientation) and C#. In 1972, Basic caught my eye as a fun language, and in 1988, Forth shocked me because of its simplicity and potential. To be honest, I have never given Lisp a chance, my fault not Lisp's. For Web programming, I will happily use PHP. Python, Haskell, Rust, etc. etc. etc. came along too late for me.
I like writing blocks of functionality and having an interactive environment in which I can throw the blocks together to make a solution. It is a lot like the original Lego concept. Create a limited set of basic building blocks and only create a custom block if it cannot be simulated by a collection of basic blocks.
Since 1988, under the influence of Forth, I have been following this principle. I code my building blocks in C, and drop the blocks into a single executable; the executable includes a reverse Polish notation post-fix interpreter along the lines of a traditional Forth system. Building blocks can then be exposed to the command-line-interpreter/user via a Forth-like wrapper.
I call the whole concept/thing/idea/whatever a rapid interactive programming environment technology, or RipeTech for short, and the set of building blocks and supporting Forth-like code when compiled is my rtForth executable.
Back to Basics
Around 2014, I had a contract on a massive project (hundreds of millions of €). The project was not part of the company's daily business. Their IT department was fully occupied on daily business for >20 million active customers. But the project demanded a lot of ad hoc data analysis. I have learnt over the years that ad hoc data analysis requests from a project that is not part of daily business are a no-no for most IT departments.
Excel files (some of 900,000 lines & 800 columns, per week) were whizzing around the project, and a few luckless individuals were writing VBA macros to try to get meaning out of the data. Er, it was painful.
We could not install Access due to company policy/cost (and members' skill levels). So to improve my own work conditions, I used the SQLite source code (single file, plain C) that I had in rtForth with some wrappers around the needed SQLite functionality. SQL meant that I could massively simplify and speed up my data analysis workload. And SQLite meant that I could do it serverless.
But the unit of exchange within the project was Excel files. So I added in a Component Object Model (COM) capability at the C level, wrote some rtForth wrappers and exchanged data with Excel using Object Linking and Embedding / Automation (COM).
That helped me, but I believe 99.9% of people have issues with reverse Polish notation post-fix programming.
So I wrote a script, in rtForth of course with some C building blocks for performance, that overlays rtForth with an in-fix interpreter with a syntax based on Basic-like interpreters such as Microsoft Visual Basic. I embedded these scripts in the rtForth executable and ended up with a stand-alone Basic-like interpreter with built-in SQL & COM capability. This was a language that people with Microsoft VBA skills could easily adapt to. And it was in an executable that did not need installation, so there were no issues running it on a company PC/laptop (in retrospect you might consider this, er, naughty!).
I, alone, was doing up to 1500 data analysis deliveries to the project per year, based on a weekly updated set of project data ( the 900,000 lines of 800 columns), and using SQLite / COM and scripts. I was on this project for almost 7 highly enjoyable years, living off my hobby-horse. I really was in hog heaven.
I call the Basic-like stand-alone executable pwBasic; pw stands for PlodWare, the name I use for my programming efforts in Basic. I wrote my first Basic program back in 1971/72 and although I like Basic, it took me 40+ years of plodding along before I finally got to write my own!
This layering approach of functionality/language is IMHO (I am not particularly humble) highly flexible. Functionality can be added at the C level where appropriate and then wrappers written for it in rtForth or pwBasic.
Example: I wanted to provide inline help for an application, and wanted the same information to be online. So I wrote the help text (once) in HTML and dropped a HTML browser control as a building block into the rtForth environment. The HTML block required quite a few rtForth wrappers because I wanted to provide interactive access to various browser capabilities, e.g., set a URL, extract the source text of a Web page, browse backwards and forwards... The HTML building block is now available to future rtForth applications, and by adding a pwBasic wrapper it is available to pwBasic applications as well.
The current rtForth has more than 1500 building blocks, all in one executable, and is scriptable.
No, it is not a massive executable. Less than 3 MByte, a single file, and it runs on a noob Win 10 or Win 11 installation.
Even the pwBasic executable, which includes rtForth, is less than 4 MByte.
The Bottom Line
I code in C because it forces me to examine nitty-gritty issues. I avoid frameworks, which I admit are fantastic if the goal is an end program, but not so fulfilling if the goal is the journey.
Okay, I do not try to replace the Windows OS with my own, but I have written real-time executives for multi-processor platforms.
And I do not write my own code to render JPG files to bitmaps, but it does interest me so maybe one day I will make that journey.
I also code in plain old C because I have an investment in it of >93,000 C language source code lines in my rtForth which is the sum of many many many happy journeys.
Does Plain Old C Limit Me
Sometimes, but not usually for long.
I wanted to be able to dynamically create a GUI, so I added low-level message handling and wrote my own widget handling (no MFC or other framework). I admit that this was not easy, but it has been immensely useful.
I wanted to access COM objects and COM servers (e.g., Microsoft Excel), so I wrote a (surprisingly small) building block to use APIs and interfaces such as CoCreateInstance and IDispatch.
I even wrote a mini module to call code in a .NET DLL/executable.
As we shall see later in this article, I found a way to mix C/C++ in a fundamentally plain old C application.
So please, do not berate me for using plain old C or for using my versions of Forth and Basic.
No apologies. Sometimes I think that I use these languages for historic reasons because we dinosaurs should stick together.
The glove fits the hand!
Image Comparison: What Drives My Interest
I used to be a keen, although not good, analogue photographer, so I have quite a few analogue images (printed images, slides, film strips: colour and b&w, 35mm and 6x6), and of course some images inherited from my family, and from my wife's family. All in all about 5000 analogue sources, which together with images from digital cameras means I have more than 30,000 images.
As a retirement activity, I decided to digitise the analogue images. Not a trivial task, and possibly worthy of its own article of tips and tricks.
I thought that it would be nice to provide metadata for the images so that my kids could look at a picture from ~1909 and identify their great-great grandmother.
Writing on the back of paper pictures is not the solution. Ultimately, the metadata will probably end up embedded in the digital file as an EXIF or XMP tag (or whatever).
I decided that the most flexible solution (for my perceived needs) would be to get the metadata into a central databank, so that as-and-when a convenient image-file-tagging solution comes along, I could simply export the metadata from the databank to a tagging tool and let it insert the tags into the image files. This decision, to use a database, might require justification. It is simply this: bulk tagging tools need to be given the tagging information in a particular format (CSV, maybe Excel, or JSON, or ...). I do not yet know what format I will need, nor indeed what tag fields I might want. So a database with the ability to export its data to CSV, Excel or JSON, or have custom export routines created for it, seems to me to be a sensible approach.
Yes, I know that there are metadata tagging tools (such as exiftool et al), but for now, I want the flexibility of a database. And which program do I know of with a flexible scripting capability and a built-in database? Tra la, yes, pwBasic!
10,000 script lines later, I had a flexible side-along metadata management system with extra support for image files. Here is a screenshot of one of the GUI interfaces to the metadata management system.
I have done a proof-of-concept using a script (in pwBasic) exporting the metadata from SQLite to a CSV file and then triggering exiftool to write the metadata into tag fields in the image file.
Cold Reality
Regrettably, over the years, some analogue images were scanned/digitised multiple times resulting in almost identical copies on disk. Due to migrations from old (small) backup disks to a larger centralised storage system, I also ended up with duplicate copies of some image files.
It was easy to identify duplicate files using an MD5 checksum. However, the almost identical images were (are) a bit trickier.
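For reference, here is a minimal sketch of an MD5 file checksum using the onboard Windows CryptoAPI; it is not my actual rtForth building block, and error handling is pared back to the bare minimum:
#include <windows.h>
#include <wincrypt.h>
#pragma comment(lib, "advapi32.lib")
/* Sketch: stream a file through CryptoAPI's MD5 and return the 16 byte digest. */
int file_md5 ( const TCHAR * fileName , BYTE digest [ 16 ] )
{
    HCRYPTPROV hProv = 0 ;
    HCRYPTHASH hHash = 0 ;
    BYTE buffer [ 64 * 1024 ] ;
    DWORD bytesRead , digestLen = 16 ;
    int result = -1 ;
    HANDLE hFile = CreateFile ( fileName , GENERIC_READ , FILE_SHARE_READ , NULL ,
                                OPEN_EXISTING , FILE_FLAG_SEQUENTIAL_SCAN , NULL ) ;
    if ( hFile == INVALID_HANDLE_VALUE ) return -1 ;
    if ( CryptAcquireContext ( & hProv , NULL , NULL , PROV_RSA_FULL , CRYPT_VERIFYCONTEXT )
      && CryptCreateHash ( hProv , CALG_MD5 , 0 , 0 , & hHash ) )
    {
        while ( ReadFile ( hFile , buffer , sizeof ( buffer ) , & bytesRead , NULL ) && bytesRead )
            CryptHashData ( hHash , buffer , bytesRead , 0 ) ;
        if ( CryptGetHashParam ( hHash , HP_HASHVAL , digest , & digestLen , 0 ) )
            result = 0 ;        /* digest now holds the 16 byte MD5 checksum */
    }
    if ( hHash ) CryptDestroyHash ( hHash ) ;
    if ( hProv ) CryptReleaseContext ( hProv , 0 ) ;
    CloseHandle ( hFile ) ;
    return result ;
}
Two files with the same digest are byte-for-byte duplicates; the near-duplicates are the subject of the rest of this article.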
But then about three weeks ago, I heard about some digital-image comparison techniques such as:
- average hash
- perceptual hash
- difference hash
- wavelet hash
I have a limited understanding of Fourier Transforms, Fast Fourier Transforms and Discrete Fourier Transforms, so I decided to try out the perceptual hash technique, which uses a Discrete Cosine Transform (DCT).
Do not be put off by the mathematics. The coding of a DCT is genuinely easy (source code is provided below) and does not need any parameter tuning.
But in the last three weeks, I have banged my head against a few issues that I would like to share here.
Steps to Create a Perceptual Hash
First, digitise the analogue images. I will not elaborate on this here; it deserves its own article.
I placed all of my image files in a folder tree on my hard-disk (a solid-state device for performance) and scanned the tree for file names and popped them in to a SQLite database. There are advantages to be had in the naming and structuring of the folder tree that can simplify the metadata creation, but that is not really relevant to this article.
I then submitted each image file referenced (as a path/name) in the database to a C function inside my pwBasic executable. The C function does the following:
- read/render the file into a bitmap (I tried both the OleLoadPicturePath/IPicture API and the gdi+ API)
- compress the bitmap to a 32x32 bitmap (using StretchBlt)
- convert the 32x32 bitmap from colour to 8 bit grayscale (trivial loop in a loop, but interesting due to human visual perception)
- execute a Discrete Cosine Transform on the 32x32 grayscale bitmap creating a 32x32 array of floating point values
- extract a specific 8x8 sub-array from the 32x32 DCT array of values
- get the average value of the 8x8 sub-array
- replace the values in the 8x8 array with a 1 if an array element's value is > the average, else replace the value by a 0
- create a 64 bit value where the nth bit has the same value 1 or 0 as the nth element of the 8x8 array
- the C function then returns the 64 bit value, which is the perceptual hash
The 64 bit value is then added to the database entry for the relevant image.
Perceptual hash created. Job done, er, that is except for the details.
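Before getting into those details, here is a rough sketch of how the steps hang together in code. The helper names render_file_to_bitmap and shrink_to_32x32_gray are placeholders for illustration only; dctOptimised and hash_matrix are shown later in this article:
#include <stdint.h>
#include <windows.h>
/* Sketch only: render_file_to_bitmap and shrink_to_32x32_gray are hypothetical
   stand-ins for the building blocks described in the rest of this article.    */
uint64_t compute_perceptual_hash ( const TCHAR * fileName )
{
    double gray [ 32 * 32 ] ;     /* 32x32 grayscale pixel values                 */
    double dctOut [ 32 * 32 ] ;   /* 32x32 DCT coefficients                       */
    double threshold ;            /* average over the 8x8 sub-array               */
    HBITMAP hBitmap = render_file_to_bitmap ( fileName ) ;  /* OleLoadPicturePath/IPicture or gdi+    */
    shrink_to_32x32_gray ( hBitmap , gray ) ;               /* StretchBlt (HALFTONE) + grayscale loop */
    threshold = dctOptimised ( dctOut , gray ) ;            /* DCT of the 32x32 grayscale values      */
    DeleteObject ( hBitmap ) ;
    return hash_matrix ( dctOut , threshold ) ;             /* the 64 bit perceptual hash             */
}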
Reading/Rendering an Image File to a Bitmap
The perceptual hash algorithm needs to see the image as a matrix of grayscale pixel values. But a typical image file (such as a JPG) holds the pixel values inside a compressed, complex, data structure. Fortunately, there are onboard APIs in a noob Windows system that can convert an image file into a simple bitmap. There are, of course, third-party libraries that can do this as well, if not better! But it is a goal of mine to target noob Windows systems.
Initially, I read in an image file using OleLoadPicturePath to get the image into an IPicture structure, and then used the IPicture Render method to convert the IPicture structure into a bitmap. OleLoadPicturePath and IPicture are designed for COM objects. But with a little bit of effort, you can call them from plain old C.
TCHAR * OlePathName ;
fth_to_zero () ;
OlePathName = pop_ptr () ;
IPicture * Ipic = NULL;
HRESULT hr;
hr = OleLoadPicturePath
( OlePathName
, NULL
, 0
, 0
, & IID_IPicture
, (void *)(&Ipic)
) ;
if (hr)
{ push_ucell (0) ;
return ;
}
HDC hDC = GetDC(NULL);                               /* screen DC, used only to query the DPI */
const int HIMETRIC_PER_INCH = 2540;
int nPixelsPerInchX = GetDeviceCaps(hDC, LOGPIXELSX);
int nPixelsPerInchY = GetDeviceCaps(hDC, LOGPIXELSY);
ReleaseDC(NULL, hDC);                                /* the DC is no longer needed */
SIZE sizeInHiMetric;
Ipic->lpVtbl->get_Width (Ipic,&sizeInHiMetric.cx);
Ipic->lpVtbl->get_Height(Ipic,&sizeInHiMetric.cy);
myUInt targetWidth = (nPixelsPerInchX * sizeInHiMetric.cx +
HIMETRIC_PER_INCH / 2) / HIMETRIC_PER_INCH;
myUInt targetHeight = (nPixelsPerInchY * sizeInHiMetric.cy +
HIMETRIC_PER_INCH / 2) / HIMETRIC_PER_INCH;
canvasHANDLE * pDstCanvasHandle = int_canvas_create_dib
( targetWidth , targetHeight ) ;
if ( ! pDstCanvasHandle )
{ hr = Ipic->lpVtbl-> Release( Ipic ) ;
fth_zero () ;
return ;
}
hr = Ipic->lpVtbl->Render
( Ipic , pDstCanvasHandle->dc , 0 , 0
, targetWidth , targetHeight
, 0 , sizeInHiMetric.cy , sizeInHiMetric.cx
, -sizeInHiMetric.cy
, NULL
) ;
hr = Ipic->lpVtbl-> Release( Ipic ) ;
As part of another of my rtForth/pwBasic projects, I implemented an image reader/renderer using gdi+ APIs, which I believe use hardware acceleration.
Again, this set of APIs exists for a COM environment. Whilst not particularly difficult, it needed more code than the OleLoadPicturePath and IPicture approach. It needed:
- GdiplusStartup: to set up the gdi+ environment
- GdipCreateBitmapFromFile: to read the file to a gdi+ bitmap
- GdipGetPropertyItemSize: to get the dimensions of the gdi+ bitmap, for transferring to a standard bitmap
- GdipCreateHBITMAPFromBitmap: to convert the gdi+ bitmap to a standard bitmap
- GdipDisposeImage: to release the gdi+ image
- gdipStop: to release the gdi+ environment
Additionally, I had the issue that there does not seem to be header support for gdi+ in plain old C. So, I had to find/write functional prototypes for the gdi+ interfaces that I needed, and hand-code some data structs and enums. I learnt about using gdi+ from a stackoverflow discussion, specifically: gdi+ related article.
Again, no apologies, this time for the coding style.
Here is the gdi+ code that I use.
#pragma comment(lib, "gdiplus.lib")
struct GdiplusStartupInputTYPE
{
UINT32 GdiplusVersion;
UINT_PTR DebugEventCallback;
BOOL SuppressBackgroundThread;
BOOL SuppressExternalCodecs;
VOID * fp ;
} ;
typedef struct GdiplusStartupInputTYPE GdiplusStartupInput ;
typedef struct
{
DWORD id;
ULONG length;
WORD type;
VOID* value;
} PropertyItem;
typedef enum
{
RotateNoneFlipNone = 0,
Rotate90FlipNone = 1,
Rotate180FlipNone = 2,
Rotate270FlipNone = 3,
RotateNoneFlipX = 4,
Rotate90FlipX = 5,
Rotate180FlipX = 6,
Rotate270FlipX = 7,
RotateNoneFlipY = Rotate180FlipX,
Rotate90FlipY = Rotate270FlipX,
Rotate180FlipY = RotateNoneFlipX,
Rotate270FlipY = Rotate90FlipX,
RotateNoneFlipXY = Rotate180FlipNone,
Rotate90FlipXY = Rotate270FlipNone,
Rotate180FlipXY = RotateNoneFlipNone,
Rotate270FlipXY = Rotate90FlipNone
} RotateFlipType;
int GdiplusStartup(ULONG_PTR *token,
GdiplusStartupInput *gstart, GdiplusStartupInput *gstartout);
int GdiplusShutdown(ULONG_PTR token);
int GdipSaveImageToFile
(void *GpBitmap, WCHAR *filename, CLSID *encoder, void *params);
int GdipDisposeImage(void *GpBitmap);
int GdipCreateBitmapFromHBITMAP(HBITMAP bm, HPALETTE hpal, void **GpBitmap);
int GdipCreateHBITMAPFromBitmap(void *GpBitmap, HBITMAP *hbmReturn, DWORD background);
int GdipCreateBitmapFromFile(WCHAR *filename, void **GpBitmap);
int GdipGetPropertyItem (void *GpBitmap, PROPID propID, UINT size, void *buffer);
int GdipGetPropertyItemSize (void *GpBitmap, PROPID propID, UINT *size);
int GdipImageRotateFlip (void *GpBitmap , RotateFlipType type) ;
ULONG_PTR gdiplusToken;
uint64_t gdipStatus = 0 ;
uint64_t gdipStart ()
{
if ( gdipStatus ) return gdipStatus;
GdiplusStartupInput gdiplusStartupInput;
gdiplusStartupInput.GdiplusVersion = 1;
gdiplusStartupInput.DebugEventCallback = NULL;
gdiplusStartupInput.SuppressBackgroundThread = 0;
gdiplusStartupInput.SuppressExternalCodecs = 0;
gdiplusStartupInput. fp = 0;
if ( GdiplusStartup(&gdiplusToken, &gdiplusStartupInput, NULL) )
{
return 0 ;
}
gdipStatus = 1;
return gdipStatus ;
}
void gdipStop ()
{
static uint64_t status = 0 ;
if ( ! gdipStatus ) return ;
GdiplusShutdown(gdiplusToken);
gdipStatus = 0;
}
void int_create_bitmap_from_file_GDIP ()
{
TCHAR * FileName ;
fth_to_zero () ;
FileName = pop_ptr () ;
int retStatus;
void * GpBitmap = NULL ;
if ( ! gdipStart () )
{
fth_zero () ;
return ;
}
retStatus = GdipCreateBitmapFromFile ( FileName , & GpBitmap ) ;
PropertyItem * itemProperties ;
myUInt wantedProperty ;
myUInt propertySize ;
unsigned short ExifOrientation;
ExifOrientation = 1 ;
unsigned short gdipOrientation = 0 ;
wantedProperty = 0x0112 ;     /* EXIF orientation tag */
propertySize = 0 ;
retStatus = GdipGetPropertyItemSize ( GpBitmap, wantedProperty, & propertySize );
if ( retStatus EQU 0 )
{
itemProperties = (PropertyItem *)malloc(propertySize);
retStatus = GdipGetPropertyItem
( GpBitmap, wantedProperty, propertySize, itemProperties);
ExifOrientation = *( unsigned short * ) ( itemProperties->value ) ;
free ( itemProperties ) ;
switch (ExifOrientation )
{
case 1: gdipOrientation = RotateNoneFlipNone ;
break ;
case 2: gdipOrientation = RotateNoneFlipX ;
break ;
case 3: gdipOrientation = Rotate180FlipNone ;
break ;
case 4: gdipOrientation = Rotate180FlipX ;
break ;
case 5: gdipOrientation = Rotate90FlipX ;
break ;
case 6: gdipOrientation = Rotate90FlipNone ;
break ;
case 7: gdipOrientation = Rotate270FlipX ;
break ;
case 8: gdipOrientation = Rotate270FlipNone ;
break ;
}
retStatus = GdipImageRotateFlip (GpBitmap , gdipOrientation) ;
}
HBITMAP hBitmap = NULL;
retStatus = 0 ;
if ( retStatus EQU 0 )
{
retStatus = GdipCreateHBITMAPFromBitmap( GpBitmap, & hBitmap, 0) ;
}
if ( GpBitmap NEQU NULL )
GdipDisposeImage(GpBitmap);
gdipStop () ;
push_ptr (hBitmap );
}
What was nice was that gdi+ is supposed to use some hardware acceleration. I do not know if it did, but on my laptop, testing against my entire image file collection, the gdi+ approach appeared to be almost twice as fast as the OleLoadPicturePath and IPicture approach.
Image Compression, or the Issue of Bitmaps and Halftones
Sorry, but as stated earlier, I cannot explain why/how perceptual hashing works, but I have read that it is okay to compress the image to a 32 by 32 grayscale pixel image and then do the DCT on the 32 by 32 grayscale image. Because a DCT is fundamentally a matrix calculation, it is computationally cheaper to do a DCT on a 32 by 32 matrix than on the original image's bitmap matrix which can easily be thousands by thousands. But as we reduce the matrix size, we lose accuracy/definition. "They", whoever "they" may be, say that 32 by 32 is okay!
It was at this point that I really had issues.
Whilst checking my implementation against what I considered to be near-identical images, the following image pair had a low similarity. Surely, I thought, they will have very similar perceptual hash values. Wrong! I only measured a 64% similarity, i.e., their perceptual hash values differed by 23 bits out of 64.
This led me to believe that something serious was wrong with my code. I banged and banged my head against the proverbial brick wall. I slept badly. Then I wrote a pwBasic script to compare some bitmaps side-by-side. Then I broke down the steps (read/render, compress, grayscale, DCT) into blocks and displayed the bitmaps at various points in the process.
If you read/render the image file to a bitmap (say using the int_create_bitmap_from_file_GDIP function presented earlier), and then simply throw the bitmap at the screen, it will appear somewhat weird if the bitmap needs to be reduced to fit on the screen.
In Windows techie talk: respond to the WM_DRAWITEM message by StretchBlting the source bitmap into the device context supplied by the WM_DRAWITEM message.
rCode = StretchBlt
( destinationDeviceContext
, 0
, 0
, 32
, 32
, sourceDeviceContext
, 0
, 0
, widthOfSourceBitmap
, heightOfSourceBitmap
, SRCCOPY
) ;
For the left hand image of the castle (above): this on the left (below) is what you (probably) wanted, but on the right is what you get if you simply throw the bitmap at the screen!
For the right hand image of the castle: this (below) on the left is what you (probably) wanted, but on the right is what you get if you simply throw the bitmap at the screen!
So What, Who Cares
Well, it turns out that I do, I think!
I eventually realised that the bitmap compression from a full-size image bitmap to the 32x32 bitmap needed for the DCT calculation did not handle halftones.
This (below) is what the (without halftone handling) chain of code was comparing, revealing only a 64% similarity, i.e., their perceptual hash values differed by 23 bits out of 64.
When I used a halftone correction (if that is the correct phrase?) I had a much better 79% similarity, i.e., their perceptual hash values differed by 12 bits out of 64.
The castle image comparison with halftone correction is shown below:
With or Without Halftone Correction
There is a British saying that one Swallow does not make a summer (Swallow: a migratory bird: the "British" Swallow spends winter in Africa and returns to Britain in the summer).
So I tested numerous pairs of images with and without halftone correction. In general, it appears to me to be better to use halftone correction.
All one needs to do is add one line of code, SetStretchBltMode, immediately before the call to the StretchBlt that compresses the full-size image bitmap to the 32 by 32 pixel bitmap that is used in the Discrete Cosine Transformation.
Actually, Microsoft documentation says add two lines: SetStretchBltMode and SetBrushOrgEx.
SetStretchBltMode(destinationDeviceContext, HALFTONE);
SetBrushOrgEx(destinationDeviceContext , 0, 0, NULL);
rCode = StretchBlt
( destinationDeviceContext
, 0
, 0
, 32
, 32
, sourceDeviceContext
, 0
, 0
, widthOfSourceBitmap
, heightOfSourceBitmap
, SRCCOPY
) ;
Turning Things Around
Later on, in the 3 weeks that I spent developing the approach, I noticed that some near-identical images had widely different perceptual hash values; this was mainly where the images had been scanned with different orientations in the scanner. Scanned images do not have the correct value in the orientation tag (if they have such a tag at all), since the scanner cannot know the image's orientation.
Somehow, I had thought that the DCT algorithm would be orientation independent. Evidently, I thought wrong!
This means that image orientation is relevant to image comparison based on perceptual hashes ( at least as implemented in my code).
Just for fun/reference, I present the first castle image compared with a 90-degree clockwise version of itself; their perceptual hash values differed by 36 bits out of 64, so they have a similarity of 36%.
So, I decided to add in some code to do image orientation correction.
The gdi+ approach required two extra gdi+ methods (they are in a code snippet above):
- GdipGetPropertyItem: to get the image orientation
- GdipImageRotateFlip: to get the image "upright"
The OleLoadPicturePath and IPicture approach required a bit (English understatement!) more code.
- If your programming environment supports it, make scripting calls to the GetDetailsOf API of the Shell.Application
- or, if your programming environment supports it, make scripting calls to the Windows Image Acquisition
- or use the Windows Imaging Component (WIC) to query the metadata properties embedded in a typical JPG file and thus get the orientation.
WIC is designed for COM interfaces, a minor issue in plain old C. Whatever, I coded this solution; a reduced sketch of the WIC query is shown below.
Then, once you have got the orientation, rotate the image as appropriate. The rotation code is not difficult, but it took me quite a while to get the indexing logic correct (see the code below).
uTYPE32 * srcPtr , * srcDibData = pSrcCanvasHandle->data;
uTYPE32 * dstPtr , * dstDibData = pCanvasHandleDst->data;
int srcStride = pSrcCanvasHandle->stride / sizeof(uTYPE32) ;
int dstStride = pCanvasHandleDst->stride / sizeof(uTYPE32) ;
int row, col ;
srcPtr = srcDibData ;
dstPtr = dstDibData ;
switch ( rotation )
{
case 0:
for ( row = 0 ; row < pSrcCanvasHandle->max_h ; row++ )
{
for ( col = 0 ; col < pSrcCanvasHandle->max_w ; col++ )
{
*dstPtr++ = *srcPtr++ ;
}
}
break ;
case 90:
for ( row = 0 ; row < pSrcCanvasHandle->max_h ; row++ )
{
for ( col = 0 ; col < pSrcCanvasHandle->max_w ; col++ )
{
dstPtr [ (dstStride*(pSrcCanvasHandle->max_w - col-1) )
+ (row) ] = *srcPtr++ ;
}
}
break;
case 180:
for ( row = 0 ; row < pSrcCanvasHandle->max_h ; row++ )
{
dstPtr = dstDibData +
(pSrcCanvasHandle->max_h - row -0)*dstStride ;
for ( col = 0 ; col < pSrcCanvasHandle->max_w ; col++ )
{
*--dstPtr = *srcPtr++ ;
}
}
break ;
case 270:
for ( row = 0 ; row < pSrcCanvasHandle->max_h ; row++ )
{
for ( col = 0 ; col < pSrcCanvasHandle->max_w ; col++ )
{
dstPtr [ (dstStride*col ) +
(pSrcCanvasHandle->max_h - row -1) ] = *srcPtr++ ;
}
}
break;
default:
break;
}
Or, and this is the clever bit: forget the orientation correction approach using OleLoadPicturePath and IPicture, stay with the gdi+ approach; it is simply faster (and easier).
Converting the Compressed RGB Bitmap to Grayscale
One thing that always bugs me a bit is English or American English spelling. I am British by birth, with dual nationality (German). But frankly my hair is grey, not gray. Which is a natural colour (not color) for an aged being (I wrote my first program in 1971 in the CESIL programming language).
Grey or gray, whatever. The code I used for the grayscale conversion was:
unsigned char * dibData = pReducedCanvasHandle->data;
unsigned char * dptr ;
int row , col ;
double grayArray [ 32 * 32 ] ;
double * dblPtr ;
dblPtr = grayArray ;
for ( row = 0 ; row < 32 ; row++ )
{
dptr = dibData + ( row * ( pReducedCanvasHandle->stride ) ) ;
for ( col = 0; col < 32 ; col ++ )
{
/* BT.709 luma weights; a 32 bit DIB pixel is laid out B,G,R,A in memory */
*dblPtr++ = ( 0.072 * dptr[0] ) +
( 0.715 * dptr[1] ) +
( 0.213 * dptr[2] ) ;
dptr += sizeof(uTYPE32) ;
}
}
Discrete Cosine Transformation
I shall not attempt to explain the principles of DCTs, for one good reason. My digital signal processing knowledge has rusted over the last 42 years to the point of being worthless. Fifteen years ago I had a go at Discrete Fourier Transforms to create (just for fun) a waterfall display of audio signals, and tinkered with a Goertzel filter as part of a Morse-code decoder. But sorry, I cannot even start to explain any of it!
Using the DCT proved far easier than I expected. Code for the DCT is easily found online, so I took a copy and worked a bit on optimising it.
double dct (double *DCTMatrix, double *ImgMatrix, int N, int M)   /* needs <math.h> (on MSVC, _USE_MATH_DEFINES) for cos() and M_PI */
{
int i, j, u, v;
int cnt = 0 ;
double DCTsum = 0.0 ;
for (u = 0; u < N; ++u)
{
for (v = 0; v < M; ++v)
{
DCTMatrix[(u*N) + v] = 0;
for (i = 0; i < N; i++)
{
for (j = 0; j < M; j++)
{
DCTMatrix[(u*N) + v] += ImgMatrix[(i*N) + j]
* cos(M_PI/((double)N)*(i+1./2.)*u)
* cos(M_PI/((double)M)*(j+1./2.)*v);
cnt++ ;
}
}
DCTsum += DCTMatrix[(u*N) + v] ;
}
}
DCTsum -= DCTMatrix[0] ;
return DCTsum / cnt ;
}
I ran performance tests on this with image data, 1000 iterations took 12.7 seconds. Each image required that the DCT loop was executed 1048576 times (32*32*32*32).
There is significant room for improvement, both in understandability and performance. Of extreme relevance is that although the DCT is 32 by 32, we only need the 8 by 8 sub matrix (rows 0:7, columns 0:7). It is also possible to replace the expensive cos (cosine) calls by simple look-ups into a predefined table.
After a few optimisation attempts, I had the following:
int CoeffFlag = 0 ;
double Coeff [32][32] ;
double dctOptimised (double *DCTMatrix, double *IMGMatrix )
{
#define PIover32 0.09817477042
#define RowMax 32
#define ColMax 32
int imgRow, imgCol ;
int dctRow, dctCol ;
int x , y ;
int cnt = 0 ;
double DCTsum = 0.0 ;
double dctRCvalue = 0.0 ;
if (! CoeffFlag )
{
for (x = 0 ; x < 32 ; x++ )
{
for (y = 0 ; y < 32 ;y++)
{
Coeff[x][y] = cos ( PIover32 * ( x + 0.5 ) * y ) ;
}
}
CoeffFlag = 1;
}
for (dctRow = 0; dctRow < 8; dctRow++)
{
for (dctCol = 0; dctCol < 8; dctCol++)
{
dctRCvalue = 0;
for (imgRow = 0; imgRow < RowMax; imgRow++)
{
for (imgCol = 0; imgCol < ColMax; imgCol++)
{
dctRCvalue+= IMGMatrix[(imgRow * RowMax) + imgCol]
* Coeff[imgRow][dctRow] * Coeff[imgCol][dctCol] ;
cnt++ ;
}
}
DCTMatrix[(dctRow*RowMax) + dctCol] = dctRCvalue;
DCTsum += dctRCvalue;
}
}
DCTsum -= DCTMatrix[0] ;
return DCTsum / cnt ;
}
On my laptop, 1000 iterations of the optimised DCT took 71 ms, i.e., 0.071 seconds. Compared to the original version, this is about 150 times faster. So 30,000 unoptimised DCTs would take about 390 seconds (6.5 minutes), as compared to 21 seconds with the optimised DCT. Each image required that the DCT loop was executed 65536 times (32*32*8*8).
So over 5 minutes saved for 30,000+ images, not much, but every bit helps!
That was good news, but the real expense seems to be reading/rendering the image file to a bitmap. I have not made any measurements to verify this assumption.
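If and when I do measure it, the onboard high-resolution counter should be enough. A minimal sketch (render_file_to_bitmap is again just a hypothetical stand-in for whichever rendering call is being timed):
/* Sketch: time one processing stage with QueryPerformanceCounter. */
double time_render_ms ( const TCHAR * fileName )
{
    LARGE_INTEGER freq , t0 , t1 ;
    HBITMAP hBitmap ;
    QueryPerformanceFrequency ( & freq ) ;
    QueryPerformanceCounter ( & t0 ) ;
    hBitmap = render_file_to_bitmap ( fileName ) ;   /* the call under test */
    QueryPerformanceCounter ( & t1 ) ;
    if ( hBitmap ) DeleteObject ( hBitmap ) ;
    return ( 1000.0 * ( t1.QuadPart - t0.QuadPart ) ) / (double) freq.QuadPart ;   /* milliseconds */
}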
Reading/Rendering Performance Boost from Direct2D
Well, there might be a performance boost using Direct2D. Maybe! I do not know (yet).
I have looked at using the Direct2D APIs; these are supposed to be fast. But the latest Microsoft Windows SDKs have the unfortunate characteristic that the Direct2D headers do not support plain old C. And it seems that some of the Direct2D interface methods are difficult to map to plain old C. Beats me how/why, but that's how it seems to be! The way to get around it is to write the code calling Direct2D in a C++ environment. It does not strictly need object orientation and classes, just COM.
I dread the thought of porting my code from plain old C to C++. Not least because I have in the >93,000 lines of plain old C source about 1500 function calls that will require attention before they will compile in C++.
Online searches reveal that you can mix plain old C and C++, but:
- Calling functions across the C/C++ boundary requires extra care in the functional prototypes
- The WinMain code should be implemented in C++ to ensure linking to the correct C++ libraries.
Well, it is not necessarily so! At least not if you only need C++ and COM, and do not need object classes.
I wrote my Direct2D code in C++ and let the C++ code see the C header files for the rest of rtForth inside a simple construct:
#include "f_d2d1.hpp" // C++ header file for the Direct2D code
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <Windows.h>
#include <d2d1.h>
#include <wincodec.h>
extern "C" {
#include "f_protos.h" // header file for those plain "C" functions that
}
#pragma comment(lib, "d2d1.lib")
#pragma comment(lib, "Windowscodecs.lib")
Write the functionality that calls Direct2D in a C++ file. Then, when a plain old C function needs to call a function in the C++ file, you will need to include, in the plain old C source, a header file that contains functional prototypes for the required C++ functions. So, for example:
#include "f_d2d1.hpp" // "C" header file for "C" source files,
In the f_d2d1.hpp file, the "C++" functional prototypes are provided in the following way:
#pragma once
#ifdef __cplusplus
extern "C" {
#endif
void fth_d2d1_main();
#ifdef __cplusplus
}
#endif
Finishing It Off
To be cheeky, I could finish after the following code snippet. But there are a few follow-on issues.
int64_t hash_matrix ( double * matrixPtr, double averageValue )
{
int row, col ;
int64_t phash = 0 ;
for ( row = 0 ; row < 8 ; row++ )
{
for ( col = 0; col < 8 ; col++ )
{
if ( averageValue < matrixPtr[(row*32) + col ] )
phash = (phash<<1) + 1;
else
phash = phash<<1;
}
}
return phash ;
}
double avg8x8 ;
double DCTMatrix [ 32 * 32 ] ;
avg8x8 = dct ( DCTMatrix, grayArray , 32, 32 ) ;
return hash_matrix ( DCTMatrix, avg8x8 ) ;
Follow-on Issues
Define Similar
If you want to get an indication of how similar two images are, then get their 64 bit perceptual hash values, do a bitwise exclusive OR (XOR) of the two values, then simply count the number of bits set in the XOR result (this count is called the Hamming distance). My experience (admittedly limited to the last few weeks) is that a value of 12 or less is indicative of similarity! (As shown in the castle comparison earlier in this article.)
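In code, the whole comparison boils down to a few lines. A sketch (the clear-lowest-set-bit loop is used so that it works without any compiler intrinsics; the percentage is simply the matching bits over 64):
#include <stdint.h>
/* Sketch: Hamming distance between two 64 bit perceptual hashes. */
int hamming_distance64 ( uint64_t hashA , uint64_t hashB )
{
    uint64_t x = hashA ^ hashB ;   /* bitwise exclusive OR: the differing bits are set */
    int count = 0 ;
    while ( x )
    {
        x &= ( x - 1 ) ;           /* clear the lowest set bit                         */
        count++ ;
    }
    return count ;
}
/* one way to express the distance as a percentage similarity */
double similarity_percent ( uint64_t hashA , uint64_t hashB )
{
    return ( ( 64 - hamming_distance64 ( hashA , hashB ) ) * 100.0 ) / 64.0 ;
}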
If you search for the Caltech 101 images dataset, you should find a zip file with a set of some 9000 (small) images with varying degrees of similarity. Here is a link to it: Caltech 101 images dataset. I downloaded their images and wrote a pwBasic script (less than 200 lines) so that I could drag-n-drop images on to one of two regions on the screen, create the perceptual hash for the two images, display the perceptual hashes as 8x8 matrices, show the Hamming distance between the two images, and convert that Hamming distance to a percentage similarity.
What made me a bit sad is that I have not found online a set of reference images complete with their perceptual hash values. I am just unsure enough of my own work to want to check it against a recognised reference. Possibly, I have misunderstood what the Caltech 101 images dataset offers, since in addition to the images, the zip contains information in MATLAB files which are meaningless to me.
Consider using a Burkhard Keller Tree
Even if you have your images referenced, with their perceptual hash values, in a database, the problems are not over. An off-the-shelf RDBMS will (probably) not be too performant if you ask it to find, for each of the 30,000+ entries, its nearest neighbour.
So, I implemented a Burkhard Keller tree extension for the SQLite database that I use in pwBasic. Even though the code for inserting values into the BK-tree was not large (~60 lines of plain old C code), and finding the closest neighbours was ~80 lines, it was a real pain to get the logic correct. I might make an article of it (I think I can even explain how/why BK-trees work).
But the reward was worth the frustration of implementing the BK-tree.
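For the curious, the sketch below gives the rough shape of a Burkhard Keller tree over 64 bit hashes, with the Hamming distance (0..64) as the metric and the hamming_distance64 function from the previous section. It is a bare-bones, in-memory illustration; the SQLite extension itself stores the nodes in tables and is organised rather differently.
#include <stdint.h>
#include <stdlib.h>
/* Sketch: an in-memory BK-tree keyed on 64 bit perceptual hashes. */
typedef struct bkNodeTYPE
{
    uint64_t hash ;
    struct bkNodeTYPE * child [ 65 ] ;   /* one child slot per possible Hamming distance */
} bkNode ;

bkNode * bk_insert ( bkNode * node , uint64_t hash )
{
    int d ;
    if ( node == NULL )
    {
        node = ( bkNode * ) calloc ( 1 , sizeof ( bkNode ) ) ;
        node->hash = hash ;
        return node ;
    }
    d = hamming_distance64 ( node->hash , hash ) ;
    if ( d == 0 ) return node ;                         /* identical hash already stored */
    node->child [ d ] = bk_insert ( node->child [ d ] , hash ) ;
    return node ;
}

/* report every stored hash within maxDistance of the query hash */
void bk_search ( bkNode * node , uint64_t hash , int maxDistance ,
                 void ( * report ) ( uint64_t match , int distance ) )
{
    int d , i , lo , hi ;
    if ( node == NULL ) return ;
    d = hamming_distance64 ( node->hash , hash ) ;
    if ( d <= maxDistance ) report ( node->hash , d ) ;
    /* triangle inequality: only subtrees whose edge distance lies in
       [d - maxDistance, d + maxDistance] can contain matches          */
    lo = ( d - maxDistance < 1 ) ? 1 : d - maxDistance ;
    hi = ( d + maxDistance > 64 ) ? 64 : d + maxDistance ;
    for ( i = lo ; i <= hi ; i++ )
        bk_search ( node->child [ i ] , hash , maxDistance , report ) ;
}
Because each comparison restricts the search to a narrow band of child edges, a nearest-neighbour query touches only a small fraction of the stored hashes.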
Performance: Perceptual Hash and BK-tree
Scanning the 30,000+ images, a total of ~82 GBytes, to create a database with filepath/names and a perceptual hash value per file took 61 minutes (my laptop is 6 years old, with an i7-7600U CPU @ 2.80GHz, and the code is written without threads). Then, a call to the SQLite Burkhard Keller tree extension to get the closest neighbour for each/all of the images took another 30 seconds.
Purely as an observation: my son has a brand new (1 week old) laptop with an AMD Ryzen Pro 7 CPU. We copied (no installation needed) the image files and the pwBasic executable and script to his laptop and reran the tests. It must be emphasised that pwBasic was written for a single core, i.e., a single thread, so we did not expect too much processing performance improvement; both laptops have NVMe solid state disks, so we did not expect any disk access advantages. My son's laptop took 44 minutes (mine took 61 minutes). My son has since convinced me to take a serious look at spawning some threads to spread the image rendering load across CPU cores. (Sounds a bit modern to me.)
Next Steps
I am integrating the perceptual hash code and the Burkhard Keller tree code into my pwBasic metadata management system script; the plain old C code is already in rtForth and wrapped by pwBasic, I just need to decide how best to use it in the script.
Later on, when the pwBasic metadata management system is truly stable, I will try out Direct2D and C++, maybe even port other bits of my plain old C code into C++. Whilst I find the current performance acceptable for the perceptual hash calculation, I would like a more performant image rendering for a planned metadata management system enhancement.
Oh yeah, I shall think about multi-threading and if/how it can fit into rtForth / pwBasic, but I have doubts.
I will place the SQLite Burkhard Keller tree extension source code in the public domain, initially on my web site (https://www.ripetech.com) or (https://www.plodware.com, technically a subdomain of RipeTech).
When I am happy, I will release the pwBasic metadata management system on my web site.
The Prologue
As the author Douglas Adams wrote in his wonderful work HHGTTG, the Sirius Cybernetics Corporation Complaints Division had the following highly appropriate (?) motto:
share and enjoy
I would like to quote an old Chinese proverb: better to light a candle than to curse the darkness
I hope my candle, this article, gives a little light.
In aller Demut.
GeorgeDS
History
- 4th June, 2023: Initial post