Linear Rank Deficient Filters in Computer Vision Applications

Chesnokov Yuriy

4.84/5 (17 votes)

1 Jun 2008GPL34 min read

1.4K

The article describes the optimization of an image convolution operation with the application of linear rank deficient filters in computer vision problems.

Introduction

In computer vision, you very often have to perform an image convolution operation with some filter. Examples are image enhancements, edge detection, Gaussian smoothing, or object detection using projection of the image into a low dimensional space represented with PCA, LDA, ICA, MP, or any other set of vectors. Having a bigger sized image makes the operation more time consuming. Recently, I found a great article, Face Detection - Efficient and Rank Deficient, where the authors apply a method to the rank deficient approximation of the support vector machine (SVM) classifier. With rank deficient approximation, the ordinary convolution taking O(w*h) filtering operations can perform as fast as O(w+h) operations only, where w and h are sizes of the image. So, I decided to code the method for my own computer vision proceedings and see the performance.

Background

In this article, the vec2D class is extended from my 2D Vector Class Wrapper SSE Optimized for Math Operations article. If you are not familiar with convolution operations, have a look at the Wiki.

Algorithm

Having some linear filter H and applying it to an image I results in a 2D convolution operation:

J = I * H;

So, J is the same size as I.

If H has size h x w, the convolution requires O(h*w) operations for each output pixel. However, in special cases, where H can be decomposed into two column vectors a and b, such that:

H = ab';

the convolution becomes:

J = [I * a] * b';

Since the convolution is associative, and in this case, ab' = a * b'. This splits the original convolution operation into two operations with vectors of size h x 1 and 1 x w, respectively. As a result, if a linear filter is separable (H = ab'), the computational complexity of the filtering operation drops from O(h*w) to O(h + w) per pixel.

The singular value decomposition, SVD, is used to decompose the H filter to a and b vectors.

H = USV';

U and V are orthogonal matrices of size h x h and w x w. S is a diagonal (the diagonal entries are the singular values) and has size h x w. Let r = rank(H). Due to rank(S) = rank(H), we may write H as a sum of r rank one matrices.

formula (1)

Where s denotes the i^th singular value of H, and u, v are the i^th columns of U and V (i.e., the i^th singular vectors of the matrix H), respectively. As a result, the corresponding linear filter can be evaluated as the weighted sum of r separable convolutions:

(2)

And the computational complexity drops from O(h * w) to O(r * (h + w)) per output pixel. The speed benefit depends on r, which can be seen to measure the structural complexity of H.

You can choose r until the reconstruction error for the H filter in (1) decreases below some threshold. Apparently, the more complex the filter (i.e., the more high frequency components it contains), a bigger r is needed, but simple filters as Gaussian can be approximated with great precision, for r = 1.

Below some Eigen face is presented:

And its reconstruction after 1, 2, and 3 iterations (i.e., r = 1, 2, and 3):

As can be seen, the MSE decreases as r increases.

Using the code

Below, an ordinary convolution and a rank deficient one are presented:

C++

//this = a * filter;
void vec2D::conv2(const vec2D& a, const vec2D& filter, 
                  enum CONV type)  //filter should be of odd size
{
    int l = filter.height() / 2;
    int m = filter.width() / 2;        

    vec2D& c = *this;

    if (type == SAME) {      //this size = a size        
        for (int i = 0; i < (int)c.height(); i++) {
            for (int j = 0; j < (int)c.width(); j++) {
                float tmp = 0;                        
                for (int u = i - l; u <= i + l; u++)
                    for (int v = j - m; v <= j + m; v++) 
                        tmp += filter((i-u)+l, (j-v)+m) * a.get(u, v);
                c(i, j) = tmp;
            }
        }
    }
    else if (type == VALID) { //this size = (a size - filter size) + 1        
        for (int i = l; i < (int)a.height() - l; i++) {
            for (int j = m; j < (int)a.width() - m; j++) {
                float tmp = 0;                        
                for (int u = i - l; u <= i + l; u++)
                    for (int v = j - m; v <= j + m; v++) 
                        tmp += filter((i-u)+l, (j-v)+m) * a(u, v);
                c(i - l, j - m) = tmp;
            }
        }
    }
}

With enum vec2D::CONV, you can choose a Matlab like style to perform the convolution when the output array equals the size of the convolved one, or convolve only the region of the image where the filter fits without zero-padding.

C++

//this = sum( s[i] [a * u[i]] * v[i])
void vec2D::conv2(const vec2D& a, const std::vector<vec2D>& u,  //col vectors
                  const std::vector<vec2D>& v,  //row vectors
                  const vec2D& s,   //row vector singular values
                  vec2D& tmp1,  //tmp1 = [(a.height - u.height + 1) x a.width]
                  vec2D& tmp2,  //tmp2 = this = [(a.height - u.height + 1)
                                    //                x a.width - v.width + 1]
                  enum CONV type)
{                                   
    vec2D& c = *this;
    c.set(0.0);
    for (unsigned int i = 0; i < s.width(); i++) {
        tmp1.conv2(a, u[i], type);
        tmp2.conv2(tmp1, v[i], type);
        tmp2.mul(s(0, i));
        c.add(c, tmp2);
    }
}

With the rank deficient function, you need to provide u and v vectors in std::vector arrays, s is the row vector of singular values, and tmp1, tmp2 are temporary buffers. I commented out their sizes for VALID convolution.

If you're going to perform your image projection onto some PCA, LDA, ICA, or the like basis, remember to rotate your filter before SVD, or rotate u, v vectors before calculating the convolution.

C++

void vec2D::fliplr()
{
    vec2D& c = *this;
    float tmp;

    for (unsigned int i = 0; i < width() / 2; i++) {
        for (int y = yfirst(); y <= ylast(); y++) {
            tmp = c(y, xfirst() + i);
            c(y, xfirst() + i) = c(y, xlast() - i);
            c(y, xlast() - i) = tmp;
        }
    }
}
void vec2D::flipud()
{
    vec2D& c = *this;
    float tmp;

    for (unsigned int i = 0; i < height() / 2; i++) {
        for (int x = xfirst(); x <= xlast(); x++) {
            tmp = c(yfirst() + i, x);
            c(yfirst() + i, x) = c(ylast() - i, x);
            c(ylast() - i, x) = tmp;
        }
    }
}
void vec2D::rotate180()
{
    vec2D& c = *this;
    c.fliplr();
    c.flipud();
}

Results

The convolution results for a 192x256 array with a 19x19 filter are presented bellow. First is the original image to convolve with Eigen face:

The rank deficient convolution for r = 1, 2, 3 and the ordinary convolution of that image are presented below:

As can be seen, as r increases, the MSE between ordinary convolution results and the rank deficient one decreases.

And, the timing of the convolutions follows:

SAME
 J = I * H    processing time: 142.258583 ms

 J = sum (s[i] [I * u[i]] * v[i]'); i < 1    processing time: 20.521044 ms
 J = sum (s[i] [I * u[i]] * v[i]'); i < 2    processing time: 42.043612 ms
 J = sum (s[i] [I * u[i]] * v[i]'); i < 3    processing time: 63.772351 ms

VALID
 J = I * H    processing time: 88.931973 ms

 J = sum (s[i] [I * u[i]] * v[i]'); i < 1    processing time: 14.583697 ms
 J = sum (s[i] [I * u[i]] * v[i]'); i < 2    processing time: 28.884118 ms
 J = sum (s[i] [I * u[i]] * v[i]'); i < 3    processing time: 42.637542 ms

For the linear rank deficient filters convolution, the results are presented for 1, 2, and 3 iterations.

The SAME mode gives you the resultant J array of the same size as I, applicable for image filtering (e.g., Gaussian smoothing, edge detection, image enhancements). The VALID is the convolution where the filter is entirely fit into the J array without zero-padding, so the resultant J array is smaller to the original I. The latter is applicable for image projection onto some basis (e.g., PCA, LDA, ICA, MP, Gabor filters etc...) when you need to analyze the regions in the image where the filter is fitted without zero-padding (e.g., face, car, or any other object detection in computer vision tasks).

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)