
Implementing Principal Component Analysis Image Segmentation

21 Jun 2024, CPOL
How to implement principal component analysis clustering algorithm to perform image segmentation.
We can bisect an image into two areas of similar colour by taking the principal component of variation and then using Otsu thresholding to select the point at which to cut. The cluster with the highest error is then repeatedly bisected until a segmentation with the desired number of segments is produced. This algorithm was invented by myself and so is not well tested.

Introduction

It's often necessary to segment images into areas of similar colour. This can be used as a low-level step in AI and image recognition, but the obvious immediate application is to choose a palette of colours to convert a 24-bit RGB truecolour image into a 256-colour paletted .gif.

Background

The standard algorithm is k-means clustering, which is a general method for finding clusters of data given an arbitrary distance function. Principal component analysis is a new method that uses the information that images have only three components of variation - red, green, blue, or equivalents in other spaces. The principal component of variance is the axis in RGB (for the sake of argument) colour space along which the data is most spread out, measuring from the first to the last data point that lies along it. And by taking that component, which we do by finding the eigenvectors, effectively we have a system which is colour space neutral. To find two clusters, we cut along a plane perpendicular to that principal component. And the way to find the place to cut is not to take the average, but to use Otsu thresholding to find where the most natural division into two types lies. Having achieved a bisection, it's very simple to perform further bisections.

Using the code

Eigenvectors are a property of a matrix such that, when an eigenvector is multiplied by the matrix, it remains the same apart from scaling. Conventionally the eigenvectors are normalised, whilst the scaling factors are called the eigenvalues. They have many mathematical properties, but the one we are interested in is that when the matrix is the covariance matrix of a set of series of equal length, the eigenvectors represent the major axes of variance, with the amount of variance along each axis given by the corresponding eigenvalue. So the eigenvector with the largest eigenvalue in RGB space is the axis along which to cut.

And since pixels only have three components, our matrix is only a 3x3, which means that we can use special and very efficient code to find the eigenvectors.

 

C++
/* Eigen-decomposition for symmetric 3x3 real matrices.
   Public domain, copied from the public domain Java library JAMA. */

#ifndef _eig_h
#define _eig_h

/* Symmetric matrix A => eigenvectors in columns of V, corresponding
   eigenvalues in d. */
void eigen_decomposition(double A[3][3], double V[3][3], double d[3]);

#endif
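
As a quick sanity check (this snippet is illustrative only and not part of the segmentation code), we can feed the routine a small symmetric matrix and verify that A*v equals lambda*v for each column it returns. It also shows the calling convention the pca code below relies on: the eigenvalues come back sorted in ascending order, so the principal axis is the last column.

C++
#include <stdio.h>

/* link against the eig.c implementation supplied with the article */
void eigen_decomposition(double A[3][3], double V[3][3], double d[3]);

int main(void)
{
    /* a small symmetric test matrix */
    double A[3][3] = { {2, 1, 0},
                       {1, 2, 0},
                       {0, 0, 3} };
    double V[3][3], d[3];
    int i, j, k;
    double Av;

    eigen_decomposition(A, V, d);

    /* each eigenvector is a column of V; check A*v = lambda*v */
    for (j = 0; j < 3; j++)
        for (i = 0; i < 3; i++)
        {
            Av = 0;
            for (k = 0; k < 3; k++)
                Av += A[i][k] * V[k][j];
            printf("col %d row %d: A*v = %f, lambda*v = %f\n",
                   j, i, Av, d[j] * V[i][j]);
        }

    return 0;
}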

Now we've got a way of getting eigenvectors, we can do principal component analysis for pixels in rgb space.

C++
/*
   Get principal components of variance
   Params: ret - return for components of major axis of variance
           pixels - the pixels
           N - count of pixels
 */
static int pca(double *ret, unsigned char *pixels, int N)
{
    double cov[3][3];
    double mu[3];
    int i, j, k;
    double var;
    double d[3];
    double v[3][3];
    
    for(i=0;i<3;i++)
        mu[i] = meancolour(pixels, N, i);
    
    /* calculate 3x3 channel covariance matrix */
    for(i=0;i<3;i++)
        for(j=0;j<=i;j++)
        {
            var = 0;
            for(k=0;k<N;k++)
            {
                var += (pixels[k*3+i] - mu[i])*(pixels[k*3+j] - mu[j]);
            }
            cov[i][j] = var / N;
            cov[j][i] = var / N;
        }
    eigen_decomposition(cov, v, d);
    /* principal component is the last column of the eigenvector matrix
       (the eigenvalues are returned in ascending order) */
    ret[0] = v[0][2];
    ret[1] = v[1][2];
    ret[2] = v[2][2];
    
    return 0;
    
}

That should all be pretty straightforward; the main thing is to set up the call to eigen_decomposition correctly. The code also relies on a couple of small helpers, meancolour() and project(), which are not listed in the article.
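
Plausible versions of those helpers might look something like this (a sketch, not the article's exact code): meancolour() averages one channel over a run of pixels, and project() projects a pixel onto an axis such as the principal component.

C++
/* Sketches of the helpers used above and below; the article does not
   list them, so these are plausible reconstructions. */

/* mean of one channel (0 = red, 1 = green, 2 = blue) over N rgb pixels */
static double meancolour(unsigned char *pixels, int N, int channel)
{
    double total = 0;
    int i;

    for (i = 0; i < N; i++)
        total += pixels[i*3 + channel];

    return N ? total / N : 0;
}

/* project an rgb pixel onto an axis (here, the principal component) */
static double project(unsigned char *rgb, double *axis)
{
    return rgb[0]*axis[0] + rgb[1]*axis[1] + rgb[2]*axis[2];
}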

 

And now the guts of it. First, the PALENTRY structure.

C++
typedef struct
{
    int start;
    int N;
    double error;
    unsigned char red;
    unsigned char green;
    unsigned char blue;
} PALENTRY;

start is an index into an array of pixels and N is the count. Otherwise nothing surprising here. We can use any colour distance function to calculate the error.
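
calcerror() itself is not shown in the article. A simple version, assuming the error is the summed squared RGB distance of each pixel from the cluster mean (and that the representative palette colour is that mean), might look like this:

C++
/* Sketch of calcerror(), assuming error = summed squared RGB distance
   from the cluster mean; the article leaves the distance function open. */
static void calcerror(PALENTRY *entry, unsigned char *buff)
{
    double mu[3];
    double err = 0;
    double diff;
    int i, k;

    /* mean colour of the cluster becomes its palette entry */
    for (i = 0; i < 3; i++)
        mu[i] = meancolour(buff + entry->start*3, entry->N, i);

    /* accumulate squared distance of each pixel from that mean */
    for (k = 0; k < entry->N; k++)
        for (i = 0; i < 3; i++)
        {
            diff = buff[(entry->start + k)*3 + i] - mu[i];
            err += diff * diff;
        }

    entry->error = err;
    entry->red   = (unsigned char) (mu[0] + 0.5);
    entry->green = (unsigned char) (mu[1] + 0.5);
    entry->blue  = (unsigned char) (mu[2] + 0.5);
}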

And now we actually split the entry.

C++
/*
   split an entry using pca and Otsu thresholding.
   We find the principal component of variance in rgb space.
   Then we apply Otsu thresholding along that axis, and cut.
   We partition using one pass of quick sort.
 */
static void splitpca(PALENTRY *entry, PALENTRY *split, unsigned char *buff)
{
    int low, high;
    int cut;
    double comp[3];
    unsigned char temp;
    int i;
    
    pca(comp, buff + entry->start*3, entry->N);
    cut = getOtsuthreshold2(buff+entry->start*3, entry->N, comp);
    low = 0;
    high = entry->N -1;
    while(low < high)
    {
        while(low < high && project(&buff[((entry->start+low)*3)], comp) < cut)
            low++;
        while(low < high && project(&buff[((entry->start+high)*3)], comp) >= cut)
            high--;
        if(low <= high)
        {
            for(i=0;i < 3;i++)
            {
              temp = buff[(entry->start+low)*3+i];
              buff[(entry->start+low)*3+i] = buff[(entry->start+high)*3+i];
              buff[(entry->start+high)*3+i] = temp;
            }
        }
        low++;
        high--;
    }
    split->start = entry->start + low;
    split->N = entry->N -low;
    entry->N = low;
    calcerror(entry, buff);
    calcerror(split, buff);
    
}
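
Putting the pieces together, the top-level loop simply keeps bisecting whichever cluster currently has the largest error until the palette reaches the desired size. The article doesn't list this driver, so the following is only a minimal sketch of how it might look:

C++
/* Sketch of a top-level segmentation loop (not the article's exact driver).
   pal[] receives Nwanted clusters, buff holds the Npixels rgb pixels. */
static void segment(PALENTRY *pal, int Nwanted, unsigned char *buff, int Npixels)
{
    int Npal = 1;
    int i, worst;

    /* start with a single cluster covering every pixel */
    pal[0].start = 0;
    pal[0].N = Npixels;
    calcerror(&pal[0], buff);

    while (Npal < Nwanted)
    {
        /* pick the cluster with the largest error ... */
        worst = 0;
        for (i = 1; i < Npal; i++)
            if (pal[i].error > pal[worst].error)
                worst = i;
        /* ... and bisect it with PCA + Otsu thresholding */
        splitpca(&pal[worst], &pal[Npal], buff);
        Npal++;
    }
}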

And finally the Otsu thresholding.

C++
/*
   get the Otsu threshold for image segmentation
   Params: rgb - the rgb pixels
           N - count of pixels
           remap - the axis (unit vector) to project the pixels onto
   Returns: threshold along that axis at which to split the pixels into
            two classes.
 */
static int getOtsuthreshold2(unsigned char *rgb, int N, double *remap)
{
    int hist[1024] = {0};
    int wB = 0;
    int wF;
    float mB, mF;
    float sum = 0;
    float sumB = 0;
    float varBetween;
    float varMax = 0.0f;
    int answer = 0;
    int i;
    int k;
    
    for(i=0;i<N;i++)
    {
        int nc = rgb[i*3] * remap[0] + rgb[i*3+1] * remap[1] + rgb[i*3+2] * remap[2];
        hist[512+nc]++;
    }
    
    /* sum of all (for means) */
    for (k=0 ; k<1024 ; k++)
        sum += k * hist[k];
    
    for(k=0;k<1024;k++)
    {
        wB += hist[k];
        if (wB == 0)
            continue;
        
        wF = N - wB;
        if (wF == 0)
            break;
        
        sumB += (float) (k * hist[k]);
        
        mB = sumB / wB;            /* Mean Background */
        mF = (sum - sumB) / wF;    /* Mean Foreground */
        
        /* Calculate Between Class Variance */
        varBetween = (float)wB * (float)wF * (mB - mF) * (mB - mF);
        
        /* Check if new maximum found */
        if (varBetween > varMax)
        {
            varMax = varBetween;
            answer = k;
        }
        
    }
    return answer - 512;
}

We have to tweak the standard Otsu function slightly because we want to cut in eigen-space, not along luminance as we do normally. So we pass a remap parameter.

Results

The standard test images are available from the University of Southern California.

https://sipi.usc.edu/database/database.php?volume=misc&image=11#top

 

[Images: mandrill, 24-bit original and 256-colour quantised version; airplane, 24-bit original and 256-colour quantised version]

And it's a bit unfair to compare results to other algorithms. A well-established algorithm will have been tweaked and refined many times, and might produce better results despite the core underlying method being weaker. But these results are pretty nice, and certainly usable.

Points of Interest

Palette choosing seemed to die the death when displays went to 24 bits. But now it's had a bit of a revival with the return of the animated gif, and the need for realistic movies.

History

21 Jun 2024: First version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)