This article is concerned with some practical issues that I encountered using a perceptual hash approach to image comparison.
Introduction
First and foremost: I really mean plain old C, in fact K&R C. Why? That is discussed later.
Secondly: I do not have a deep enough understanding of image comparison to attempt to explain how image comparison works.
Thirdly: I have got image comparison (or something that looks like it) working. (Oh dear, there is a terrible pun there.)
Fourthly: I shall highlight some of the findings that I made in my quest to identify near-duplicate images. The findings include:
- Using OleLoadPicturePath and IPicture is easy in plain old C, but not too performant
- Using gdi+, intended for C++ and COM, in plain old C is do-able, but a bit more work, and seems to be more performant than OleLoadPicturePath and IPicture
- Discrete Cosine Transformation is fundamentally difficult, but very, very easy to utilise in the context of this article
- The default Discrete Cosine Transformation function can be massively accelerated for this specific use case, by a factor of >150
- Use of halftone correction when compressing bitmaps needs to be considered
- With the approach described in this article, a rotated image is not a near duplicate of the original image ;-(
- In my specific case concerning 30,000+ images, a database helped immensely
- and a Burkhard Keller tree helped me even more
Along the way, you might notice not quite a rant, but certainly a loud mumble, about the benefits to a solo programmer of the combination of plain old C, RPN post-fix Forth-like languages, and another golden oldie: Basic.
Although I suspect that I have > 10,000 hours of programming experience, I am not a code guru, so everything in this article must be qualified as being AFAIK (which might not be very far).
Links and Disclaimer
My understanding of perceptual hashes comes from on-line searches. Here are some of the sites that I used to gain my limited understanding (limited by my abilities, not by the information sources).
I have no association with the following sites, nor with their authors, nor with any intermediary person or legal entity. I provide this list of some of the sites that I visited simply in case you want a kick-start in relevant sites to visit. At the time of writing this article, the links were valid, and I am not aware of any malicious or detrimental effect arising from browsing the specified links.
Words of Thanks, and Acknowledgments
Using plain old C is okay but at some point, in a Windows environment, you have to face using the Component Object Model (COM). I got my kick-start for using COM in plain old C from articles in CodeProject. Try COM-in-plain-C by Jeff Glatt 2006 (my gratitude!).
I got my kick-start for using gdi+ from a stackoverflow discussion, specifically: gdi+ related article.
I would like to take this opportunity to sing the praises of the people that maintain SQLite. Quite simply, a wonderful piece of work. Without SQLite, I think that my career since 2014 would have been less enjoyable. I am immensely jealous, and exceedingly grateful.
I refer in this article to a metadata tagging and browsing tool. I believe there are numerous such tools, but my experience (very limited) is with exiftool. It was whilst creating my own metadata management tool that I stumbled upon exiftool. And it was that metadata management tool that brought me to perceptual hashing.
Why Plain Old C
I have no problems with the enhancements to K&R C. What I do wish to imply is: I do not use C++, neither for object orientated programming, nor (and this is the important bit) for imperative programming.
Why? Well, C permits the copying of pointer values from one pointer type to another without any explicit casting expression. Due to my use (explained later) of a Forth-like virtual machine, I (possibly foolishly) have made extensive use of such pointer copying. Such code does not compile in a C++ environment. Mea culpa. But as discussed later around the issue of Direct2D, I might have found a gentle migration path to C++.
I have been programming for over 52 years, 42 of those years professionally. When programming for myself, I take immense pleasure in the journey and less pleasure in the destination. In programming terms, this means the understanding and achieving of a functionality is more important (for me) than using the end program. If I wanted a program with a GUI, I could use one of the GUI frameworks, or I could write my own at the (in Windows) message handling level. Guess which approach I use!
I started working in programming back in 1979 using hardware-near languages: assembler for various CPUs: Intel i80/i85/i86..i486, Motorola 68000, Sun Sparc, and "higher-level" hardware-near languages such as C, PL/M and CHILL.
Along the way, I looked in on (but did not stay with): Fortran, Algol, Pascal, Smalltalk, C++ (for object orientation) and C#. In 1972, Basic caught my eye as a fun language, and in 1988, Forth shocked me because of its simplicity and potential. To be honest, I have never given Lisp a chance, my fault not Lisp's. For Web programming, I will happily use PHP. Python, Haskell, Rust, etc. etc. etc. came along too late for me.
I like writing blocks of functionality and having an interactive environment in which I can throw the blocks together to make a solution. It is a lot like the original Lego concept. Create a limited set of basic building blocks and only create a custom block if it cannot be simulated by a collection of basic blocks.
Since 1988, under the influence of Forth, I have been following this principle. I code my building blocks in C, and drop the blocks into a single executable; the executable includes a reverse Polish notation post-fix interpreter along the lines of a traditional Forth system. Building blocks can then be exposed to the command-line-interpreter/user via a Forth-like wrapper.
I call the whole concept/thing/idea/whatever a rapid interactive programming environment technology, or RipeTech for short, and the set of building blocks and supporting Forth-like code when compiled is my rtForth executable.
Back to Basics
Around 2014, I had a contract on a massive project (hundreds of millions of €). The project was not part of the company's daily business. Their IT department was fully occupied on daily business for >20 million active customers. But the project demanded a lot of ad hoc data analysis. I have learnt over the years that ad hoc data analysis requests from a project that is not part of daily business are a no-no for most IT departments.
Excel files (some of 900,000 lines & 800 columns, per week) were whizzing around the project, and a few luckless individuals were writing VBA macros to try to get meaning out of the data. Er, it was painful.
We could not install Access due to company policy/cost (and members' skill levels). So to improve my own work conditions, I used the SQLite source code (single file, plain C) that I had in rtForth with some wrappers around the needed SQLite functionality. SQL meant that I could massively simplify and speed up my data analysis workload. And SQLite meant that I could do it serverless.
But the unit of exchange within the project was Excel files. So I added in a Component Object Model (COM) capability at the C level, wrote some rtForth wrappers and exchanged data with Excel using Object Linking and Embedding / Automation (COM).
That helped me, but I believe 99.9% of people have issues with reverse Polish notation post-fix programming.
So I wrote a script, in rtForth of course with some C building blocks for performance, that overlays rtForth with an in-fix interpreter with a syntax based on Basic-like interpreters such as Microsoft Visual Basic. I embedded these scripts in the rtForth executable and ended up with a stand-alone Basic-like interpreter with built-in SQL & COM capability. This was a language that people with Microsoft VBA skills could easily adapt to. And it was in an executable that did not need installation, so there were no issues running it on a company PC/laptop (in retrospect you might consider this, er, naughty!).
I, alone, was doing up to 1500 data analysis deliveries to the project per year, based on a weekly updated set of project data ( the 900,000 lines of 800 columns), and using SQLite / COM and scripts. I was on this project for almost 7 highly enjoyable years, living off my hobby-horse. I really was in hog heaven.
I call the Basic-like stand-alone executable pwBasic; pw stands for PlodWare, the name I use for my programming efforts in Basic. I wrote my first Basic program back in 1971/72 and although I like Basic, it took me 40+ years of plodding along before I finally got to write my own!
This layering approach of functionality/language is IMHO (I am not particularly humble) highly flexible. Functionality can be added at the C level where appropriate and then wrappers written for it in rtForth or pwBasic.
Example: I wanted to provide inline help for an application, and wanted the same information to be online. So I wrote the help text (once) in HTML and dropped a HTML browser control as a building block into the rtForth environment. The HTML block required quite a few rtForth wrappers because I wanted to provide interactive access to various browser capabilities, e.g., set a URL, extract the source text of a Web page, browse backwards and forwards... The HTML building block is now available to future rtForth applications, and by adding a pwBasic wrapper it is available to pwBasic applications as well.
The current rtForth has more than 1500 building blocks, all in one executable, and is scriptable.
No, it is not a massive executable. Less than 3 MByte, a single file, and it runs on a noob Win 10 or Win 11 installation.
Even the pwBasic executable, which includes rtForth, is less than 4 MByte.
The Bottom Line
I code in C because it forces me to examine nitty-gritty issues. I avoid frameworks, which I admit are fantastic if the goal is an end program, but not so fulfilling if the goal is the journey.
Okay, I do not try to replace the Windows OS with my own, but I have written real-time executives for multi-processor platforms.
And I do not write my own code to render JPG files to bitmaps, but it does interest me so maybe one day I will make that journey.
I also code in plain old C because I have an investment in it of >93,000 C language source code lines in my rtForth which is the sum of many many many happy journeys.
Does Plain Old C Limit Me
Sometimes, but not usually for long.
I wanted to be able to dynamically create a GUI, so I added low-level message handling and wrote my own widget handling (no MFC or other framework). I admit that this was not easy, but it has been immensely useful.
I wanted to access COM objects and COM servers (e.g., Microsoft Excel), so I wrote a (surprisingly small) building block to use APIs and interfaces such as CoCreateInstance and IDispatch.
I even wrote a mini module to call code in a .NET DLL/executable.
As we shall see later in this article, I found a way to mix C/C++ in a fundamentally plain old C application.
So please, do not berate me for using plain old C or for using my versions of Forth and Basic.
No apologies. Sometimes I think that I use these languages for historic reasons because we dinosaurs should stick together.
The glove fits the hand!
Image Comparison: What Drives My Interest
I used to be a keen, although not good, analogue photographer, so I have quite a few analogue images (printed images, slides, film strips: colour and b&w, 35mm and 6x6), and of course some images inherited from my family, and from my wife's family. All in all about 5000 analogue sources, which together with images from digital cameras means I have more than 30,000 images.
As a retirement activity, I decided to digitise the analogue images. Not a trivial task, and possibly worthy of its own article of tips and tricks.
I thought that it would be nice to provide metadata for the images so that my kids could look at a picture from ~1909 and identify their great-great grandmother.
Writing on the back of paper pictures is not the solution. Ultimately, the metadata will probably end up embedded in the digital file as an EXIF or XMP tag (or whatever).
I decided that the most flexible solution (for my perceived needs) would be to get the metadata into a central databank, so that as-and-when a convenient image-file-tagging solution comes along, I could simply export the metadata from the databank to a tagging tool and let it insert the tags into the image files. This decision, to use a database, might require justification. It is simply this: bulk tagging tools need to be given the tagging information in a particular format (CSV, maybe Excel, or JSON, or ...). I do not yet know what format I will need, nor indeed what tag fields I might want. So a database with the ability to export its data to CSV, Excel or JSON, or have custom export routines created for it, seems to me to be a sensible approach.
Yes, I know that there are metadata tagging tools (such as exiftool et al), but for now, I want the flexibility of a database. And which program do I know of with a flexible scripting capability and a built-in database? Tra la, yes, pwBasic!
10,000 script lines later, I had a flexible side-along metadata management system with extra support for image files. Here is a screenshot of one of the GUI interfaces to the metadata management system.
I have done a proof-of-concept using a script (in pwBasic) exporting the metadata from SQLite to a CSV file and then triggering exiftool to write the metadata into tag fields in the image file.
Cold Reality
Regrettably, over the years, some analogue images were scanned/digitised multiple times resulting in almost identical copies on disk. Due to migrations from old (small) backup disks to a larger centralised storage system, I also ended up with duplicate copies of some image files.
It was easy to identify duplicate files using an MD5 checksum. However, the almost identical images were (are) a bit trickier.
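For reference, here is a minimal sketch of an MD5 file checksum using the onboard Windows CryptoAPI; it is not my actual rtForth building block, and error handling is pared back to the bare minimum:
#include <windows.h>
#include <wincrypt.h>
#pragma comment(lib, "advapi32.lib")
/* Sketch: stream a file through CryptoAPI's MD5 and return the 16 byte digest. */
int file_md5 ( const TCHAR * fileName , BYTE digest [ 16 ] )
{
    HCRYPTPROV hProv = 0 ;
    HCRYPTHASH hHash = 0 ;
    BYTE buffer [ 64 * 1024 ] ;
    DWORD bytesRead , digestLen = 16 ;
    int result = -1 ;
    HANDLE hFile = CreateFile ( fileName , GENERIC_READ , FILE_SHARE_READ , NULL ,
                                OPEN_EXISTING , FILE_FLAG_SEQUENTIAL_SCAN , NULL ) ;
    if ( hFile == INVALID_HANDLE_VALUE ) return -1 ;
    if ( CryptAcquireContext ( & hProv , NULL , NULL , PROV_RSA_FULL , CRYPT_VERIFYCONTEXT )
      && CryptCreateHash ( hProv , CALG_MD5 , 0 , 0 , & hHash ) )
    {
        while ( ReadFile ( hFile , buffer , sizeof ( buffer ) , & bytesRead , NULL ) && bytesRead )
            CryptHashData ( hHash , buffer , bytesRead , 0 ) ;
        if ( CryptGetHashParam ( hHash , HP_HASHVAL , digest , & digestLen , 0 ) )
            result = 0 ;        /* digest now holds the 16 byte MD5 checksum */
    }
    if ( hHash ) CryptDestroyHash ( hHash ) ;
    if ( hProv ) CryptReleaseContext ( hProv , 0 ) ;
    CloseHandle ( hFile ) ;
    return result ;
}
Two files with the same digest are byte-for-byte duplicates; the near-duplicates are the subject of the rest of this article.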
But then about three weeks ago, I heard about some digital-image comparison techniques such as:
- average hash
- perceptual hash
- difference hash
- wavelet hash
I have a limited understanding of Fourier Transforms, Fast Fourier Transforms and Discrete Fourier Transforms, so I decided to try out the perceptual hash technique, which uses a Discrete Cosine Transform (DCT).
Do not be put off by the mathematics. The coding of a DCT is genuinely easy (source code is provided below) and does not need any parameter tuning.
But in the last three weeks, I have banged my head against a few issues that I would like to share here.
Steps to Create a Perceptual Hash
First, digitise the analogue images. I will not elaborate on this here; it deserves its own article.
I placed all of my image files in a folder tree on my hard-disk (a solid-state device for performance) and scanned the tree for file names and popped them in to a SQLite database. There are advantages to be had in the naming and structuring of the folder tree that can simplify the metadata creation, but that is not really relevant to this article.
I then submitted each image file referenced (as a path/name) in the database to a C function inside my pwBasic executable. The C function does the following:
- read/render the file into a bitmap (I tried both the OleLoadPicturePath/IPicture API and the gdi+ API)
- compress the bitmap to a 32x32 bitmap (using StretchBlt)
- convert the 32x32 bitmap from colour to 8 bit grayscale (trivial loop in a loop, but interesting due to human visual perception)
- execute a Discrete Cosine Transform on the 32x32 grayscale bitmap creating a 32x32 array of floating point values
- extract a specific 8x8 sub-array from the 32x32 DCT array of values
- get the average value of the 8x8 sub-array
- replace the values in the 8x8 array with a 1 if an array element's value is > the average, else replace the value by a 0
- create a 64 bit value where the nth bit has the same value 1 or 0 as the nth element of the 8x8 array
- the C function then returns the 64 bit value, which is the perceptual hash
The 64 bit value is then added to the database entry for the relevant image.
Perceptual hash created. Job done, er, that is except for the details.
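Before getting into those details, here is a rough sketch of how the steps hang together in code. The helper names render_file_to_bitmap and shrink_to_32x32_gray are placeholders for illustration only; dctOptimised and hash_matrix are shown later in this article:
#include <stdint.h>
#include <windows.h>
/* Sketch only: render_file_to_bitmap and shrink_to_32x32_gray are hypothetical
   stand-ins for the building blocks described in the rest of this article.    */
uint64_t compute_perceptual_hash ( const TCHAR * fileName )
{
    double gray [ 32 * 32 ] ;     /* 32x32 grayscale pixel values                 */
    double dctOut [ 32 * 32 ] ;   /* 32x32 DCT coefficients                       */
    double threshold ;            /* average over the 8x8 sub-array               */
    HBITMAP hBitmap = render_file_to_bitmap ( fileName ) ;  /* OleLoadPicturePath/IPicture or gdi+    */
    shrink_to_32x32_gray ( hBitmap , gray ) ;               /* StretchBlt (HALFTONE) + grayscale loop */
    threshold = dctOptimised ( dctOut , gray ) ;            /* DCT of the 32x32 grayscale values      */
    DeleteObject ( hBitmap ) ;
    return hash_matrix ( dctOut , threshold ) ;             /* the 64 bit perceptual hash             */
}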
Reading/Rendering an Image File to a Bitmap
The perceptual hash algorithm needs to see the image as a matrix of grayscale pixel values. But a typical image file (such as a JPG) holds the pixel values inside a compressed, complex, data structure. Fortunately, there are onboard APIs in a noob Windows system that can convert an image file into a simple bitmap. There are, of course, third-party libraries that can do this as well, if not better! But it is a goal of mine to target noob Windows systems.
Initially, I read in an image file using OleLoadPicturePath to get the image into an IPicture structure, and then used the IPicture Render method to convert the IPicture structure into a bitmap. OleLoadPicturePath and IPicture are designed for COM objects. But with a little bit of effort, you can call them from plain old C.
TCHAR * OlePathName ;
fth_to_zero () ;
OlePathName = pop_ptr () ;
IPicture * Ipic = NULL;
HRESULT hr;
hr = OleLoadPicturePath
( OlePathName
, NULL
, 0
, 0
, & IID_IPicture
, (void *)(&Ipic)
) ;
if (hr)
{ push_ucell (0) ;
return ;
}
HDC hDC = GetDC(NULL);                               /* screen DC, used only to query the DPI */
const int HIMETRIC_PER_INCH = 2540;
int nPixelsPerInchX = GetDeviceCaps(hDC, LOGPIXELSX);
int nPixelsPerInchY = GetDeviceCaps(hDC, LOGPIXELSY);
ReleaseDC(NULL, hDC);                                /* the DC is no longer needed */
SIZE sizeInHiMetric;
Ipic->lpVtbl->get_Width (Ipic,&sizeInHiMetric.cx);
Ipic->lpVtbl->get_Height(Ipic,&sizeInHiMetric.cy);
myUInt targetWidth = (nPixelsPerInchX * sizeInHiMetric.cx +
HIMETRIC_PER_INCH / 2) / HIMETRIC_PER_INCH;
myUInt targetHeight = (nPixelsPerInchY * sizeInHiMetric.cy +
HIMETRIC_PER_INCH / 2) / HIMETRIC_PER_INCH;
canvasHANDLE * pDstCanvasHandle = int_canvas_create_dib
( targetWidth , targetHeight ) ;
if ( ! pDstCanvasHandle )
{ hr = Ipic->lpVtbl-> Release( Ipic ) ;
fth_zero () ;
return ;
}
hr = Ipic->lpVtbl->Render
( Ipic , pDstCanvasHandle->dc , 0 , 0
, targetWidth , targetHeight
, 0 , sizeInHiMetric.cy , sizeInHiMetric.cx
, -sizeInHiMetric.cy
, NULL
) ;
hr = Ipic->lpVtbl-> Release( Ipic ) ;
As part of another of my rtForth/pwBasic projects, I implemented an image reader/renderer using gdi+ APIs, which I believe use hardware acceleration.
Again, this set of APIs exists for a COM environment. Whilst not particularly difficult, it needed more code than the OleLoadPicturePath and IPicture approach. It needed:
- GdiplusStartup: to set up the gdi+ environment
- GdipCreateBitmapFromFile: to read the file to a gdi+ bitmap
- GdipGetPropertyItemSize: to get the dimensions of the gdi+ bitmap, for transferring to a standard bitmap
- GdipCreateHBITMAPFromBitmap: to convert the gdi+ bitmap to a standard bitmap
- GdipDisposeImage: to release the gdi+ image
- gdipStop: to release the gdi+ environment
Additionally, I had the issue that there does not seem to be header support for gdi+ in plain old C. So, I had to find/write functional prototypes for the gdi+ interfaces that I needed, and hand-code some data structs and enums. I learnt about using gdi+ from a stackoverflow discussion, specifically: gdi+ related article.
Again, no apologies, this time for the coding style.
Here is the gdi+ code that I use.
#pragma comment(lib, "gdiplus.lib")
struct GdiplusStartupInputTYPE
{
UINT32 GdiplusVersion;
UINT_PTR DebugEventCallback;
BOOL SuppressBackgroundThread;
BOOL SuppressExternalCodecs;
VOID * fp ;
} ;
typedef struct GdiplusStartupInputTYPE GdiplusStartupInput ;
typedef struct
{
DWORD id;
ULONG length;
WORD type;
VOID* value;
} PropertyItem;
typedef enum
{
RotateNoneFlipNone = 0,
Rotate90FlipNone = 1,
Rotate180FlipNone = 2,
Rotate270FlipNone = 3,
RotateNoneFlipX = 4,
Rotate90FlipX = 5,
Rotate180FlipX = 6,
Rotate270FlipX = 7,
RotateNoneFlipY = Rotate180FlipX,
Rotate90FlipY = Rotate270FlipX,
Rotate180FlipY = RotateNoneFlipX,
Rotate270FlipY = Rotate90FlipX,
RotateNoneFlipXY = Rotate180FlipNone,
Rotate90FlipXY = Rotate270FlipNone,
Rotate180FlipXY = RotateNoneFlipNone,
Rotate270FlipXY = Rotate90FlipNone
} RotateFlipType;
int GdiplusStartup(ULONG_PTR *token,
GdiplusStartupInput *gstart, GdiplusStartupInput *gstartout);
int GdiplusShutdown(ULONG_PTR token);
int GdipSaveImageToFile
(void *GpBitmap, WCHAR *filename, CLSID *encoder, void *params);
int GdipDisposeImage(void *GpBitmap);
int GdipCreateBitmapFromHBITMAP(HBITMAP bm, HPALETTE hpal, void **GpBitmap);
int GdipCreateHBITMAPFromBitmap(void *GpBitmap, HBITMAP *hbmReturn, DWORD background);
int GdipCreateBitmapFromFile(WCHAR *filename, void **GpBitmap);
int GdipGetPropertyItem (void *GpBitmap, PROPID propID, UINT size, void *buffer);
int GdipGetPropertyItemSize (void *GpBitmap, PROPID propID, UINT *size);
int GdipImageRotateFlip (void *GpBitmap , RotateFlipType type) ;
ULONG_PTR gdiplusToken;
uint64_t gdipStatus = 0 ;
uint64_t gdipStart ()
{
if ( gdipStatus ) return gdipStatus;
GdiplusStartupInput gdiplusStartupInput;
gdiplusStartupInput.GdiplusVersion = 1;
gdiplusStartupInput.DebugEventCallback = NULL;
gdiplusStartupInput.SuppressBackgroundThread = 0;
gdiplusStartupInput.SuppressExternalCodecs = 0;
gdiplusStartupInput. fp = 0;
if ( GdiplusStartup(&gdiplusToken, &gdiplusStartupInput, NULL) )
{
return 0 ;
}
gdipStatus = 1;
return gdipStatus ;
}
void gdipStop ()
{
static uint64_t status = 0 ;
if ( ! gdipStatus ) return ;
GdiplusShutdown(gdiplusToken);
gdipStatus = 0;
}
void int_create_bitmap_from_file_GDIP ()
{
TCHAR * FileName ;
fth_to_zero () ;
FileName = pop_ptr () ;
int retStatus;
void * GpBitmap = NULL ;
if ( ! gdipStart () )
{
fth_zero () ;
return ;
}
retStatus = GdipCreateBitmapFromFile ( FileName , & GpBitmap ) ;
PropertyItem * itemProperties ;
myUInt wantedProperty ;
myUInt propertySize ;
unsigned short ExifOrientation;
ExifOrientation = 1 ;
unsigned short gdipOrientation = 0 ;
wantedProperty = 0x0112 ;     /* EXIF orientation tag */
propertySize = 0 ;
retStatus = GdipGetPropertyItemSize ( GpBitmap, wantedProperty, & propertySize );
if ( retStatus EQU 0 )
{
itemProperties = (PropertyItem *)malloc(propertySize);
retStatus = GdipGetPropertyItem
( GpBitmap, wantedProperty, propertySize, itemProperties);
ExifOrientation = *( unsigned short * ) ( itemProperties->value ) ;
free ( itemProperties ) ;
switch (ExifOrientation )
{
case 1: gdipOrientation = RotateNoneFlipNone ;
break ;
case 2: gdipOrientation = RotateNoneFlipX ;
break ;
case 3: gdipOrientation = Rotate180FlipNone ;
break ;
case 4: gdipOrientation = Rotate180FlipX ;
break ;
case 5: gdipOrientation = Rotate90FlipX ;
break ;
case 6: gdipOrientation = Rotate90FlipNone ;
break ;
case 7: gdipOrientation = Rotate270FlipX ;
break ;
case 8: gdipOrientation = Rotate270FlipNone ;
break ;
}
retStatus = GdipImageRotateFlip (GpBitmap , gdipOrientation) ;
}
HBITMAP hBitmap = NULL;
retStatus = 0 ;
if ( retStatus EQU 0 )
{
retStatus = GdipCreateHBITMAPFromBitmap( GpBitmap, & hBitmap, 0) ;
}
if ( GpBitmap NEQU NULL )
GdipDisposeImage(GpBitmap);
gdipStop () ;
push_ptr (hBitmap );
}
What was nice was that gdi+ is supposed to use some hardware acceleration. I do not know if it did, but on my laptop, testing against my entire image file collection, the gdi+ approach appeared to be almost twice as fast as the OleLoadPicturePath and IPicture approach.
Image Compression, or the Issue of Bitmaps and Halftones
Sorry, but as stated earlier, I cannot explain why/how perceptual hashing works, but I have read that it is okay to compress the image to a 32 by 32 grayscale pixel image and then do the DCT on the 32 by 32 grayscale image. Because a DCT is fundamentally a matrix calculation, it is computationally cheaper to do a DCT on a 32 by 32 matrix than on the original image's bitmap matrix which can easily be thousands by thousands. But as we reduce the matrix size, we lose accuracy/definition. "They", whoever "they" may be, say that 32 by 32 is okay!
It was at this point that I really had issues.
Whilst checking my implementation against what I considered to be near-identical images, the following image pair had a low similarity. Surely, I thought, they will have very similar perceptual hash values. Wrong! I only measured a 64% similarity, i.e., their perceptual hash values differed by 23 bits out of 64.
This led me to believe that something serious was wrong with my code. I banged and banged my head against the proverbial brick wall. I slept badly. Then I wrote a pwBasic script to compare some bitmaps side-by-side. Then I broke down the steps (read/render, compress, grayscale, DCT) into blocks and displayed the bitmaps at various points in the process.
If you read/render the image file to a bitmap (say using the int_create_bitmap_from_file_GDIP function presented earlier), and then simply throw the bitmap at the screen, it will appear somewhat weird if the bitmap needs to be reduced to fit on the screen.
In Windows techie talk: respond to the WM_DRAWITEM message by StretchBlting the source bitmap into the device context supplied by the WM_DRAWITEM message.
rCode = StretchBlt
( destinationDeviceContext
, 0
, 0
, 32
, 32
, sourceDeviceContext
, 0
, 0
, widthOfSourceBitmap
, heightOfSourceBitmap
, SRCCOPY
) ;
For the left hand image of the castle (above): this on the left (below) is what you (probably) wanted, but on the right is what you get if you simply throw the bitmap at the screen!
For the right hand image of the castle: this (below) on the left is what you (probably) wanted, but on the right is what you get if you simply throw the bitmap at the screen!
So What, Who Cares
Well, it turns out that I do, I think!
I eventually realised that the bitmap compression from a full-size image bitmap to the 32x32 bitmap needed for the DCT calculation did not handle halftones.
This (below) is what the (without halftone handling) chain of code was comparing, revealing only a 64% similarity, i.e., their perceptual hash values differed by 23 bits out of 64.
When I used a halftone correction (if that is the correct phrase?) I had a much better 79% similarity, i.e., their perceptual hash values differed by 12 bits out of 64.
The castle image comparison with halftone correction is shown below:
With or Without Halftone Correction
There is a British saying that one Swallow does not make a summer (Swallow: a migratory bird: the "British" Swallow spends winter in Africa and returns to Britain in the summer).
So I tested numerous pairs of images with and without halftone correction. In general, it appears to me to be better to use halftone correction.
All one needs to do is add one line of code, SetStretchBltMode, immediately before the call to the StretchBlt that compresses the full-size image bitmap to the 32 by 32 pixel bitmap that is used in the Discrete Cosine Transformation.
Actually, Microsoft documentation says add two lines: SetStretchBltMode and SetBrushOrgEx.
SetStretchBltMode(destinationDeviceContext, HALFTONE);
SetBrushOrgEx(destinationDeviceContext , 0, 0, NULL);
rCode = StretchBlt
( destinationDeviceContext
, 0
, 0
, 32
, 32
, sourceDeviceContext
, 0
, 0
, widthOfSourceBitmap
, heightOfSourceBitmap
, SRCCOPY
) ;
Turning Things Around
Later on, in the 3 weeks that I spent developing the approach, I noticed that some near-identical images had widely different perceptual hash values; this was mainly where the images had been scanned with different orientations in the scanner. Scanned images do not have the correct value in the orientation tag (if they have such a tag at all), since the scanner cannot know the image's orientation.
Somehow, I had thought that the DCT algorithm would be orientation independent. Evidently, I thought wrong!
This means that image orientation is relevant to image comparison based on perceptual hashes ( at least as implemented in my code).
Just for fun/reference, I present the first castle image compared with a 90-degree clockwise version of itself; their perceptual hash values differed by 36 bits out of 64, so they have a similarity of 36%.
So, I decided to add in some code to do image orientation correction.
The gdi+ approach required two extra gdi+ methods (they are in a code snippet above):
- GdipGetPropertyItem: to get the image orientation
- GdipImageRotateFlip: to get the image "upright"
The OleLoadPicturePath and IPicture approach required a bit (English understatement!) more code.
- If your programming environment supports it, make scripting calls to the GetDetailsOf API of the Shell.Application
- or, if your programming environment supports it, make scripting calls to the Windows Image Acquisition
- or use the Windows Imaging Component (WIC) to query the metadata properties embedded in a typical JPG file and thus get the orientation.
WIC is designed for COM interfaces, a minor issue in plain old C. Whatever, I coded this solution; a reduced sketch of the WIC query is shown below.
Then, once you have got the orientation, rotate the image as appropriate. The rotation code is not difficult, but it took me quite a while to get the indexing logic correct (see the code below).
uTYPE32 * srcPtr , * srcDibData = pSrcCanvasHandle->data;
uTYPE32 * dstPtr , * dstDibData = pCanvasHandleDst->data;
int srcStride = pSrcCanvasHandle->stride / sizeof(uTYPE32) ;
int dstStride = pCanvasHandleDst->stride / sizeof(uTYPE32) ;
int row, col ;
srcPtr = srcDibData ;
dstPtr = dstDibData ;
switch ( rotation )
{
case 0:
for ( row = 0 ; row < pSrcCanvasHandle->max_h ; row++ )
{
for ( col = 0 ; col < pSrcCanvasHandle->max_w ; col++ )
{
*dstPtr++ = *srcPtr++ ;
}
}
break ;
case 90:
for ( row = 0 ; row < pSrcCanvasHandle->max_h ; row++ )
{
for ( col = 0 ; col < pSrcCanvasHandle->max_w ; col++ )
{
dstPtr [ (dstStride*(pSrcCanvasHandle->max_w - col-1) )
+ (row) ] = *srcPtr++ ;
}
}
break;
case 180:
for ( row = 0 ; row < pSrcCanvasHandle->max_h ; row++ )
{
dstPtr = dstDibData +
(pSrcCanvasHandle->max_h - row -0)*dstStride ;
for ( col = 0 ; col < pSrcCanvasHandle->max_w ; col++ )
{
*--dstPtr = *srcPtr++ ;
}
}
break ;
case 270:
for ( row = 0 ; row < pSrcCanvasHandle->max_h ; row++ )
{
for ( col = 0 ; col < pSrcCanvasHandle->max_w ; col++ )
{
dstPtr [ (dstStride*col ) +
(pSrcCanvasHandle->max_h - row -1) ] = *srcPtr++ ;
}
}
break;
default:
break;
}
Or, and this is the clever bit: forget the orientation correction approach using OleLoadPicturePath and IPicture, stay with the gdi+ approach; it is simply faster (and easier).
Converting the Compressed RGB Bitmap to Grayscale
One thing that always bugs me a bit is English or American English spelling. I am British by birth, with dual nationality (German). But frankly my hair is grey, not gray. Which is a natural colour (not color) for an aged being (I wrote my first program in 1971 in the CESIL programming language).
Grey or gray, whatever. The code I used for the grayscale conversion was:
unsigned char * dibData = pReducedCanvasHandle->data;
unsigned char * dptr ;
int row , col ;
double grayArray [ 32 * 32 ] ;
double * dblPtr ;
dblPtr = grayArray ;
for ( row = 0 ; row < 32 ; row++ )
{
dptr = dibData + ( row * ( pReducedCanvasHandle->stride ) ) ;
for ( col = 0; col < 32 ; col ++ )
{
/* BT.709 luma weights; a 32 bit DIB pixel is laid out B,G,R,A in memory */
*dblPtr++ = ( 0.072 * dptr[0] ) +
( 0.715 * dptr[1] ) +
( 0.213 * dptr[2] ) ;
dptr += sizeof(uTYPE32) ;
}
}
Discrete Cosine Transformation
I shall not attempt to explain the principles of DCTs, for one good reason. My digital signal processing knowledge has rusted over the last 42 years to the point of being worthless. Fifteen years ago I had a go at Discrete Fourier Transforms to create (just for fun) a waterfall display of audio signals, and tinkered with a Goertzel filter as part of a Morse-code decoder. But sorry, I cannot even start to explain any of it!
Using the DCT proved far easier than I expected. Code for the DCT is easily found online, so I took a copy and worked a bit on optimising it.
double dct (double *DCTMatrix, double *ImgMatrix, int N, int M)   /* needs <math.h> (on MSVC, _USE_MATH_DEFINES) for cos() and M_PI */
{
int i, j, u, v;
int cnt = 0 ;
double DCTsum = 0.0 ;
for (u = 0; u < N; ++u)
{
for (v = 0; v < M; ++v)
{
DCTMatrix[(u*N) + v] = 0;
for (i = 0; i < N; i++)
{
for (j = 0; j < M; j++)
{
DCTMatrix[(u*N) + v] += ImgMatrix[(i*N) + j]
* cos(M_PI/((double)N)*(i+1./2.)*u)
* cos(M_PI/((double)M)*(j+1./2.)*v);
cnt++ ;
}
}
DCTsum += DCTMatrix[(u*N) + v] ;
}
}
DCTsum -= DCTMatrix[0] ;
return DCTsum / cnt ;
}
I ran performance tests on this with image data, 1000 iterations took 12.7 seconds. Each image required that the DCT loop was executed 1048576 times (32*32*32*32).
There is significant room for improvement, both in understandability and performance. Of extreme relevance is that although the DCT is 32 by 32, we only need the 8 by 8 sub matrix (rows 0:7, columns 0:7). It is also possible to replace the expensive cos (cosine) calls by simple look-ups into a predefined table.
After a few optimisation attempts, I had the following:
int CoeffFlag = 0 ;
double Coeff [32][32] ;
double dctOptimised (double *DCTMatrix, double *IMGMatrix )
{
#define PIover32 0.09817477042
#define RowMax 32
#define ColMax 32
int imgRow, imgCol ;
int dctRow, dctCol ;
int x , y ;
int cnt = 0 ;
double DCTsum = 0.0 ;
double dctRCvalue = 0.0 ;
if (! CoeffFlag )
{
for (x = 0 ; x < 32 ; x++ )
{
for (y = 0 ; y < 32 ;y++)
{
Coeff[x][y] = cos ( PIover32 * ( x + 0.5 ) * y ) ;
}
}
CoeffFlag = 1;
}
for (dctRow = 0; dctRow < 8; dctRow++)
{
for (dctCol = 0; dctCol < 8; dctCol++)
{
dctRCvalue = 0;
for (imgRow = 0; imgRow < RowMax; imgRow++)
{
for (imgCol = 0; imgCol < ColMax; imgCol++)
{
dctRCvalue+= IMGMatrix[(imgRow * RowMax) + imgCol]
* Coeff[imgRow][dctRow] * Coeff[imgCol][dctCol] ;
cnt++ ;
}
}
DCTMatrix[(dctRow*RowMax) + dctCol] = dctRCvalue;
DCTsum += dctRCvalue;
}
}
DCTsum -= DCTMatrix[0] ;
return DCTsum / cnt ;
}
On my laptop, 1000 iterations of the optimised DCT took 71 ms, i.e., 0.071 seconds. Compared to the original version, this is about 150 times faster. So 30,000 unoptimised DCTs would take about 390 seconds (6.5 minutes), as compared to 21 seconds with the optimised DCT. Each image required that the DCT loop was executed 65536 times (32*32*8*8).
So over 5 minutes saved for 30,000+ images, not much, but every bit helps!
That was good news, but the real expense seems to be reading/rendering the image file to a bitmap. I have not made any measurements to verify this assumption.
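If and when I do measure it, the onboard high-resolution counter should be enough. A minimal sketch (render_file_to_bitmap is again just a hypothetical stand-in for whichever rendering call is being timed):
/* Sketch: time one processing stage with QueryPerformanceCounter. */
double time_render_ms ( const TCHAR * fileName )
{
    LARGE_INTEGER freq , t0 , t1 ;
    HBITMAP hBitmap ;
    QueryPerformanceFrequency ( & freq ) ;
    QueryPerformanceCounter ( & t0 ) ;
    hBitmap = render_file_to_bitmap ( fileName ) ;   /* the call under test */
    QueryPerformanceCounter ( & t1 ) ;
    if ( hBitmap ) DeleteObject ( hBitmap ) ;
    return ( 1000.0 * ( t1.QuadPart - t0.QuadPart ) ) / (double) freq.QuadPart ;   /* milliseconds */
}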
Reading/Rendering Performance Boost from Direct2D
Well, there might be a performance boost using Direct2D. Maybe! I do not know (yet).
I have looked at using the Direct2D APIs; these are supposed to be fast. But the latest Microsoft Windows SDKs have the unfortunate characteristic that the Direct2D headers do not support plain old C. And it seems that some of the Direct2D interface methods are difficult to map to plain old C. Beats me how/why, but that's how it seems to be! The way to get around it is to write the code calling Direct2D in a C++ environment. It does not strictly need object orientation and classes, just COM.
I dread the thought of porting my code from plain old C to C++. Not least because I have in the >93,000 lines of plain old C source about 1500 function calls that will require attention before they will compile in C++.
Online searches reveal that you can mix plain old C and C++, but:
- Calling functions across the C/C++ boundary requires extra care in the functional prototypes
- The WinMain code should be implemented in C++ to ensure linking to the correct C++ libraries.
Well, it is not necessarily so! At least not if you only need C++ and COM, and do not need object classes.
I wrote my Direct2D code in C++ and let the C++ code see the C header files for the rest of rtForth inside a simple construct:
#include "f_d2d1.hpp" // C++ header file for the Direct2D code
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <Windows.h>
#include <d2d1.h>
#include <wincodec.h>
extern "C" {
#include "f_protos.h" // header file for those plain "C" functions that
}
#pragma comment(lib, "d2d1.lib")
#pragma comment(lib, "Windowscodecs.lib")
Write the functionality that calls Direct2D in a C++ file. Then, when a plain old C function needs to call a function in the C++ file, you will need to include, in the plain old C source, a header file that contains functional prototypes for the required C++ functions. So, for example:
#include "f_d2d1.hpp" // "C" header file for "C" source files,
In the f_d2d1.hpp file, the "C++" functional prototypes are provided in the following way:
#pragma once
#ifdef __cplusplus
extern "C" {
#endif
void fth_d2d1_main();
#ifdef __cplusplus
}
#endif
Finishing It Off
To be cheeky, I could finish after the following code snippet. But there are a few follow-on issues.
int64_t hash_matrix ( double * matrixPtr, double averageValue )
{
int row, col ;
int64_t phash = 0 ;
for ( row = 0 ; row < 8 ; row++ )
{
for ( col = 0; col < 8 ; col++ )
{
if ( averageValue < matrixPtr[(row*32) + col ] )
phash = (phash<<1) + 1;
else
phash = phash<<1;
}
}
return phash ;
}
double avg8x8 ;
double DCTMatrix [ 32 * 32 ] ;
avg8x8 = dct ( DCTMatrix, grayArray , 32, 32 ) ;
return hash_matrix ( DCTMatrix, avg8x8 ) ;
Follow-on Issues
Define Similar
If you want to get an indication of how similar two images are, then get their 64 bit perceptual hash values, do a bitwise exclusive OR (XOR) of the two values, then simply count the number of bits set in the XOR result (this count is called the Hamming distance). My experience (admittedly limited to the last few weeks) is that a value of 12 or less is indicative of similarity! (As shown in the castle comparison earlier in this article.)
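In code, the whole comparison boils down to a few lines. A sketch (the clear-lowest-set-bit loop is used so that it works without any compiler intrinsics; the percentage is simply the matching bits over 64):
#include <stdint.h>
/* Sketch: Hamming distance between two 64 bit perceptual hashes. */
int hamming_distance64 ( uint64_t hashA , uint64_t hashB )
{
    uint64_t x = hashA ^ hashB ;   /* bitwise exclusive OR: the differing bits are set */
    int count = 0 ;
    while ( x )
    {
        x &= ( x - 1 ) ;           /* clear the lowest set bit                         */
        count++ ;
    }
    return count ;
}
/* one way to express the distance as a percentage similarity */
double similarity_percent ( uint64_t hashA , uint64_t hashB )
{
    return ( ( 64 - hamming_distance64 ( hashA , hashB ) ) * 100.0 ) / 64.0 ;
}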
If you search for the Caltech 101 images dataset, you should find a zip file with a set of some 9000 (small) images with varying degrees of similarity. Here is a link to it: Caltech 101 images dataset. I downloaded their images and wrote a pwBasic script (less than 200 lines) so that I could drag-n-drop images on to one of two regions on the screen, create the perceptual hash for the two images, display the perceptual hashes as 8x8 matrices, show the Hamming distance between the two images, and convert that Hamming distance to a percentage similarity.
What made me a bit sad is that I have not found online a set of reference images complete with their perceptual hash values. I am just unsure enough of my own work to want to check it against a recognised reference. Possibly, I have misunderstood what the Caltech 101 images dataset offers, since in addition to the images, the zip contains information in MATLAB files which are meaningless to me.
Consider using a Burkhard Keller Tree
Even if you have your images referenced, with their perceptual hash values, in a database, the problems are not over. An off-the-shelf RDBMS will (probably) not be too performant if you ask it to find, for each of the 30,000+ entries, its nearest neighbour.
So, I implemented a Burkhard Keller tree extension for the SQLite database that I use in pwBasic. Even though the code for inserting values into the BK-tree was not large (~60 lines of plain old C code), and finding the closest neighbours was ~80 lines, it was a real pain to get the logic correct. I might make an article of it (I think I can even explain how/why BK-trees work).
But the reward was worth the frustration of implementing the BK-tree.
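For the curious, the sketch below gives the rough shape of a Burkhard Keller tree over 64 bit hashes, with the Hamming distance (0..64) as the metric and the hamming_distance64 function from the previous section. It is a bare-bones, in-memory illustration; the SQLite extension itself stores the nodes in tables and is organised rather differently.
#include <stdint.h>
#include <stdlib.h>
/* Sketch: an in-memory BK-tree keyed on 64 bit perceptual hashes. */
typedef struct bkNodeTYPE
{
    uint64_t hash ;
    struct bkNodeTYPE * child [ 65 ] ;   /* one child slot per possible Hamming distance */
} bkNode ;

bkNode * bk_insert ( bkNode * node , uint64_t hash )
{
    int d ;
    if ( node == NULL )
    {
        node = ( bkNode * ) calloc ( 1 , sizeof ( bkNode ) ) ;
        node->hash = hash ;
        return node ;
    }
    d = hamming_distance64 ( node->hash , hash ) ;
    if ( d == 0 ) return node ;                         /* identical hash already stored */
    node->child [ d ] = bk_insert ( node->child [ d ] , hash ) ;
    return node ;
}

/* report every stored hash within maxDistance of the query hash */
void bk_search ( bkNode * node , uint64_t hash , int maxDistance ,
                 void ( * report ) ( uint64_t match , int distance ) )
{
    int d , i , lo , hi ;
    if ( node == NULL ) return ;
    d = hamming_distance64 ( node->hash , hash ) ;
    if ( d <= maxDistance ) report ( node->hash , d ) ;
    /* triangle inequality: only subtrees whose edge distance lies in
       [d - maxDistance, d + maxDistance] can contain matches          */
    lo = ( d - maxDistance < 1 ) ? 1 : d - maxDistance ;
    hi = ( d + maxDistance > 64 ) ? 64 : d + maxDistance ;
    for ( i = lo ; i <= hi ; i++ )
        bk_search ( node->child [ i ] , hash , maxDistance , report ) ;
}
Because each comparison restricts the search to a narrow band of child edges, a nearest-neighbour query touches only a small fraction of the stored hashes.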
Performance: Perceptual Hash and BK-tree
Scanning the 30,000+ images, a total of ~82 GBytes, to create a database with filepath/names and a perceptual hash value per file took 61 minutes (my laptop is 6 years old, with an i7-7600U CPU @ 2.80GHz, and the code is written without threads). Then, a call to the SQLite Burkhard Keller tree extension to get the closest neighbour for each/all of the images took another 30 seconds.
Purely as an observation: my son has a brand new (1 week old) laptop with an AMD Ryzen Pro 7 CPU. We copied (no installation needed) the image files and the pwBasic executable and script to his laptop and reran the tests. It must be emphasised that pwBasic was written for a single core, i.e., a single thread, so we did not expect too much processing performance improvement; both laptops have NVMe solid state disks, so we did not expect any disk access advantages. My son's laptop took 44 minutes (mine took 61 minutes). My son has since convinced me to take a serious look at spawning some threads to spread the image rendering load across CPU cores. (Sounds a bit modern to me.)
Next Steps
I am integrating the perceptual hash code and the Burkhard Keller tree code into my pwBasic metadata management system script; the plain old C code is already in rtForth and wrapped by pwBasic, I just need to decide how best to use it in the script.
Later on, when the pwBasic metadata management system is truly stable, I will try out Direct2D and C++, maybe even port other bits of my plain old C code into C++. Whilst I find the current performance acceptable for the perceptual hash calculation, I would like a more performant image rendering for a planned metadata management system enhancement.
Oh yeah, I shall think about multi-threading and if/how it can fit into rtForth / pwBasic, but I have doubts.
I will place the SQLite Burkhard Keller tree extension source code in the public domain, initially on my web site (https://www.ripetech.com) or (https://www.plodware.com, technically a subdomain of RipeTech).
When I am happy, I will release the pwBasic metadata management system on my web site.
The Prologue
As the author Douglas Adams wrote in his wonderful work HHGTTG, the Sirius Cybernetics Corporation Complaints Division had the following highly appropriate (?) motto:
share and enjoy
I would like to quote an old Chinese proverb: better to light a candle than to curse the darkness
I hope my candle, this article, gives a little light.
In aller Demut.
GeorgeDS
History
- 4th June, 2023: Initial post