Introduction
The iterative solution to the connected component labelling algorithm is well described in the literature, but requires quite complex methods when implemented. The simpler recursive solution has the problem of using more stack than usually available, even for small images.
This article presents the recursive 4-connected component labelling algorithm with a workaround for the stack limitation. All in less than 70 lines of C/C++ code. The implementation is written in pure C, so it can be used in any C/C++ project no matter what the platform and compiler are.
Background
The connected component labelling is often used in the fields of computer vision and image analysis.
Using the Code
The code consists of a single source file. To use it, simply include the code into your C/C++ project and call the function LabelImage
.
void LabelImage(unsigned short width, unsigned short height, unsigned char* input, int* output);
Input image is an array of bytes with 0 values being background and other values (typically 1 or 255) indicating an object. It is often found by thresholding.
Output image (the labelled image) is an array of integers with 0 values being background and label numbers starting with 1 up to the number of labels found.
Output image must be allocated in advance and initialized to zero.
Limitations
Sizes of input and output images are limited to a maximum of width x height = (65535 x 65535 pixels) due to unsigned short typically being 16 bit.
Sizes of input and output images are also limited by the available heap size (about "6*width*height" bytes heap memory is allocated temporarily, which is in the order of 1.5 times the size of the output image).
Example of how to use is shown below:
unsigned short W = 640;
unsigned short H = 480;
unsigned char* input = (unsigned char*) malloc(W*H*sizeof(unsigned char));
int* output = (int*) malloc(W*H*sizeof(int));
memset(output, 0, W*H*sizeof(int));
LabelImage(W, H, input, output);
The data of the input and output images is layed out directly as continuous IPL images or matrices in OpenCV (which I think is a standard way of distributing rows and columns of data in images). This makes it very easy to use OpenCV together with the algorithm:
Example of how to use in an OpenCV project is as shown below:
unsigned short W = 640;
unsigned short H = 480;
Mat input (H, W, CV_8UC1);
Mat output(H, W, CV_32SC1, 0);
LabelImage(W, H, (unsigned char*)input.data, (int*)output.data);
To initialize the input image in the above examples, the distribution of rows and columns must be known as stated previously.
It is exemplified by the following simple demonstration, a dummy image with (width, height) = (8,6) pixels:
- Rows are numbered 0 .. 5
- Columns are numbered 0 .. 7
- Shown pixel value of 255 shown at index = column + row*width = 6 + 2*8 = 22, i.e. input[22] = 255.
Implementation
As the method is a modification of the standard recursive algorithm, let's first show how this look like.
The standard 4-connected component recursive algorithm written in C/C++ (recursive procedure LabelComponent
comes in the next code snippet):
void LabelImage(unsigned short width, unsigned short height, unsigned char* input, int* output)
{
int labelNo = 0;
int index = -1;
for (unsigned short y = 0; y < height; y++)
{
for (unsigned short x = 0; x < width; x++)
{
index++;
if (input [index] == 0) continue;
if (output[index] != 0) continue;
labelNo++;
LabelComponent(width, height, input, output, labelNo, x, y);
}
}
}
The recursive part comes here in the procedure LabelComponent
:
void LabelComponent(unsigned short width, unsigned short height,
unsigned char* input, int* output, int labelNo, unsigned short x, unsigned short y)
{
int index = x + width*y;
if (input [index] == 0) return;
if (output[index] != 0) return;
output[index] = labelNo;
if (x > 0) LabelComponent(width, height, input, output, labelNo, x-1, y);
if (x < width-1) LabelComponent(width, height, input, output, labelNo, x+1, y);
if (y > 0) LabelComponent(width, height, input, output, labelNo, x, y-1);
if (y < height-1) LabelComponent(width, height, input, output, labelNo, x, y+1);
}
Now we only need to rewrite procedure LabelComponent
to avoid stack overflow. But to do this, first we must understand what happens when a procedure is called:
- Calling a procedure consists of putting the procedure parameters onto a stack together with the program address just after the procedure. Then jump to the procedure address (see macro
CALL_LabelComponent
below). All parameter variables in the procedure then refer to their actual stack position. - When the procedure ends (the return points in the procedure - see macro
RETURN
below), pop the values of the stack and jump to the address just after the procedure (which also was put onto the stack).
The whole program has been re-written with own implemented stack (the four macro names CALL_LabelComponent
, RETURN
, X
, and Y
should increase readability):
CALL_LabelComponent
corresponds to a recursive call of the LabelComponent
procedure RETURN
indicates the end of a call to the LabelComponent
procedure X
and Y
(upper case) resemble the local variables x
and y
(lower case)
#define CALL_LabelComponent(x,y,returnLabel) { STACK[SP] = x;
STACK[SP+1] = y; STACK[SP+2] = returnLabel; SP += 3; goto START; }
#define RETURN { SP -= 3; \
switch (STACK[SP+2]) \
{ \
case 1 : goto RETURN1; \
case 2 : goto RETURN2; \
case 3 : goto RETURN3; \
case 4 : goto RETURN4; \
default: return; \
} \
}
#define X (STACK[SP-3])
#define Y (STACK[SP-2])
void LabelComponent(unsigned short* STACK, unsigned short width, unsigned short height,
unsigned char* input, int* output, int labelNo, unsigned short x, unsigned short y)
{
STACK[0] = x;
STACK[1] = y;
STACK[2] = 0;
int SP = 3;
int index;
START:
index = X + width*Y;
if (input [index] == 0) RETURN;
if (output[index] != 0) RETURN;
output[index] = labelNo;
if (X > 0) CALL_LabelComponent(X-1, Y, 1);
RETURN1:
if (X < width-1) CALL_LabelComponent(X+1, Y, 2);
RETURN2:
if (Y > 0) CALL_LabelComponent(X, Y-1, 3);
RETURN3:
if (Y < height-1) CALL_LabelComponent(X, Y+1, 4);
RETURN4:
RETURN;
}
void LabelImage(unsigned short width, unsigned short height, unsigned char* input, int* output)
{
unsigned short* STACK = (unsigned short*) malloc(3*sizeof(unsigned short)*(width*height + 1));
int labelNo = 0;
int index = -1;
for (unsigned short y = 0; y < height; y++)
{
for (unsigned short x = 0; x < width; x++)
{
index++;
if (input [index] == 0) continue;
if (output[index] != 0) continue;
labelNo++;
LabelComponent(STACK, width, height, input, output, labelNo, x, y);
}
}
free(STACK);
}
Points of Interest
The described implementation is so simple, that it is easy to customize to your own need, for example like:
- Change it to find the 8-connected components.
- Change data type of input or output image to your needs.
- By just modifying the 2 lines of code that test for the input image pixel being zero with a test against a threshold, you will save running through the image for doing a fixed threshold first.
Apart from being an easy to use and easy to customize algorithm solving the connected component labelling problem, it also presents a method to implement a recursive algorithm that has no easy iterative solution, even though here it would result in stack overflow. The drawback is that the method is not generally applicable, only in the cases where the needed stack size is within what normally can be allocated at the heap.
The CALL
and RETURN
macros in the implementation can be made smarter. Giving up on the cross platform and ANSI C compliance, the GNU GCC compiler allows using labels as values (the && operator extension). Thus the switch statement in the RETURN
macro can be avoided, but at the cost of using pointer sized entries in the stack.
History
- 2014-10-01 First version of article