We will find out how fast these five flood fills really are.
Introduction
A couple of years ago, I wrote a floodfill program here. Later, I found that it is not working right on all possible shapes to be filled. Now I am back with an improved version, Border Trace. And yet another algorithm, CrissCross
.
I will compare my floodfill code with some other algorithms that claim to perform fast floodfill. The three that I will compare my methods to are:
The Five Flood Fill Algorithms
The three competitors to my code share a basic idea: going from left to right to fill source-color pixels with the destination color, while checking up and down of each pixel if there are source-colored pixels there; these are pushed to a stack and processed later. There are different optimisations in each method: because basically a pixel may need checking multiple times for its color. The Quickfill algorithm checks the appearance of already processed ones in a second "visited" stack. The Linear Queue has an extra array that marks pixels as checked. The Efficient Fill relies solely on checking for the source color. That is efficient indeed, in the case of a plain fill where just 1 color is replaced by another. But there is another use for flood fill too, when the source color is replaced by an image. Because in that case, the image that replaces the source color can contain that same color, Efficient Fill cannot perform the image fill.
My own flood fill BorderTrace
marks the borders of the source color area, that is a different approach. Originally, I used the destination color to mark the border. But that will make the flood fill fail in some cases. What I do now is push each top pixel of the border on a stack, and mark each bottom pixel in a second 2D array. Then pop all top pixels and fill a line going down until the marked bottom. The tricky part is the "islands" inside the source color area that contain another color and must not be filled. The islands are encountered while filling and their borders are marked the same way. Difference is that those borders now partly may have the destination color. This made my algorithm unsuited for filling with an image. My workaround: I first Trace all the border, including island borders, and start filling once all borders are found. This means I visit the stack twice.
And my algorithm CrissCross
is a variation on the first three. Different is that it fills lines in four directions instead of two: up, down, left and right.
Adjustments
To benchmark the algorithms, I made a few adjustments to each of the three competitors so that the basic circumstances are equal. I removed recursion from Efficient Fill, I use a stack instead. And for image fill, I use a second array. I completely rewrote QuickFill
, implementing what I think is the basic idea behind it, skipping all unnecessary extras. One thing I skipped is the sorting of the stack. Another is the second "visited" stack where all the popped entries go to and that is searched for doubles on every push. It is only needed for image fill, and for that, I now use a 2D array to mark the visited pixels: much more efficient. That is the same technique that I use for all except for border tracing. And Linear does not need an extra "checked" array in case of color replacement, so I use that only for the image replacement version.
Tricks to Speed Up
- I directly address the stack that all algorithms need. Instead of popping, I loop through the stack. Because the visited entries are not removed, the stacksize needs to be sufficient: image width * height * entries per "push" is more than enough. For readability or a lesser
stacksize
, one can refer to the versions with pushing and popping. It means a slightly lesser performance. - When benchmarking, I use the auxiliary 2D array multiple times, so it has to be initialized to 0 every time. A fast way is to use
std::fill
but that is for 1D arrays only. So I assign a 1D pointer sB6 to the 2D array sB5 and fill that pointer:
byte sB5[WI][HI], *sB6 = &sB5[0][0];
- I write directly to locked bitmap data. Usually, that is a 24-bits bitmap. The data is 3 bytes per pixel and eventually some padding bytes at the end of each row if the image width is not a multiple of 4. The padding means that "every 3 bytes a pixel" is valid per row, but not overall. I do use a
struct
"color
" with 3 bytes: a pointer for each row is the solution, in an array of pointers of "color
". So I can address every pixel with tB[y][x]
(not [x][y]
!).
Now I have to compare the pixel color with the source color. A function "Same(color1, color2)
" does that. But it is a bit slow. It is faster is to compare integers. So from my data array "tB
", I read integers instead of color struct
s. This means I need an extra byte. I just use the first byte from the next pixel and then neglect that one by doing this:
int *(&tB[y][x]) && 0xFFFFFF
So: the color array gives the location of the first byte, but I read 4 bytes instead of 3. This is much faster! But there is a catch: if there are no padding bytes because the images width is a multiple of 4, I am reading 1 byte beyond the bitmap data! I found a solution for that as well: I copy the last row of bitmap data, one byte added, and the pointer to the copy goes into my array tB
, the last row. And after the algorithm is done, I copy it back to the bitmap data.
Byte* LastLine = new Byte[s1 + 1];
memcpy(LastLine, tB[hi1 - 1], s1);
color* ktB = tB[hi1 - 1];
tB[hi1 - 1] = (color*)LastLine;
TestOne();
memcpy(ktB, LastLine, s1);
delete[] LastLine;
I admit this is a bit of a quick 'n dirty solution. It does have quite an impact on performance but use it with care! Of course, if one has 32 bits color data, all of this is not needed, one simply compares integers.
The Benchmark
I tested 10 algorithms: 5 different ideas, and all in 2 versions, one for color replacing, one for filling with an image. The latter, as explained above, usually needs an extra array to keep track of what was visited already. We can't rely on the source color alone, because it may be in the image as well. I also tested all with some areas of varying complexity. The actual times are, of course, related to my CPU. But they give an impression of which is faster. I give the times, in microseconds, for 1 single run. But actually, I ran them 10000 times to get an average. The minimum and maximum times I encountered are given. The time needed to read the bitmap data is not included. For image fill, I use a single-color bitmap, just to run the test repeatedly. In real use, any image will do. (I did test all before with a multicolored image). And for this test, that image fill bitmap is the same size as the original bitmap, to keep it simple.
Test Results
Image 1 : Simple area
A : color replacement
193-301 Border Trace
213-217 CrissCross
217-284 Efficient
221-242 Quickfill
420-543 Linear
B : image fill
250-349 Border Trace
270-275 CrissCross
302-464 Efficient
308-324 Quickfill
519-583 Linear
Image 2 : Complex with lots of islands
A : color replacement
515=771 Border Trace
601-611 CrissCross
616-770 Quickfill
800-1021 Efficient
1010-1274 Linear
B : image fill
637-811 Border Trace
743-752 CrissCross
799-893 Quickfill
1044-1081 Efficient
1297-1376 Linear
Image 3 : Complex area that includes the edges
A : color replacement
440-514 Border Trace
466-473 CrissCross
490-544 Quickfill
638-737 Efficient
653-680 Linear
B : image fill
455-463 Border Trace
530-540 CrissCross
576-591 Quickfill
799-838 Efficient
863-930 Linear
Conclusion
Border Trace performs best - at times. There is a huge gap between the minimum and maximum, in all cases. Some people do not like border trace. I think this one is different, it does not color the borders. True, it is a bit of a complex algorithm.
Second best is CrissCross
. It is not very difficult to grasp. It has a very steady performance with little difference between minimum and maximum. Meaning that often, but not always, it will come out as first in the test. This depends more on whether Border Trace had a good run. For my next project, I will choose CrissCross
.
Quickfill
had good reviews ever since it was published, and it performs quite well. I have seen some spin-offs. The original version needed a make-over.
Efficient, as I named it, is not so good either but I like the idea. And it performs better than Linear Queue.
Linear Queue is the most elegant coding but does not perform that well. Seems it is used quite a lot.
Notes
The algorithms can be adapted to replace colors within a range, instead of a single color. In that case, the function "Same
" can be adjusted. This is not included in the benchmark.
Originally, I included versions that don't do edge checking, in case the area to be filled does not touch any edges. This gives a slightly faster run. I thought about adding 1 pixel on every side to create this effect. But since I write directly to the bitmap data, I skipped the idea.
I included the not-so-optimized versions in the download, and those have comments as well to explain details. They give a more readable representation of the algorithms.
New Update
I made some adjustments to CrissCross
, Border Trace
and QuickFill
. But the best news is that I got inspired by the question about stack use. I developed yet another algorithm. I call it FillFall because it reminds me of a waterfall, it cascades down (and up), getting wider and wider. It needs a lot less stack than the five others. And it is faster! The total stack use is divided in 3 parts. Two of them are integer arrays, each sized the width of the image. They store just the x values of the first and last pixel of consecutive source color pixels in a row, for the current row and the next. The third is the usual stack, that stores the x and y value of the first source color pixel in a direction opposed to the cascade direction. Thanks to the two image width arrays, this stack is used only in a low number of cases.
Some numbers: FillFall speed on my three test image 1A 226-279 2A 523-702 3A 397-414 I added a fourth test :a random noise image (1 in 5 pixels black, this is hard work for an algorithm):
- Fill Fall 1699-1728
- Quick Fill 1996-2064
- Border Trace 2072-2100
- Criss Cross 2504-2575
- And Fill Fall stack use on that image : 615 + 2*511 ints = 3482 bytes.
void FillFallN(int x, int y, color fillcolor)
{
int x1, x2, x3, x4, y1, y2, y3, y4, z, p;
color oldcolor = tB[y][x];
stackPointer = 0;
int tArr0[WI], t1Arr0[WI];
int* tArr = tArr0;
int* t1Arr = t1Arr0;
int* between;
int tPt = 0, pPt = 0, t1Pt = 0;
if (Same(fillcolor, oldcolor)) return;
push3(x, y, 1);
while (pop3(x, y, z))
{
x2 = x;
while (x2 < w && Same(oldcolor, tB[y][x2]))
{
tB[y][x2] = fillcolor;
x2++;
}
x1 = x - 1;
while (x1 > -1 && Same(oldcolor, tB[y][x1]))
{
tB[y][x1] = fillcolor;
x1--;
}
x1++;
if (x1 > x2) continue;
tPt = 0;
tArr[tPt] = x1;
tPt++;
tArr[tPt] = x2;
tPt++;
if (z == 1) {
y4 = y - 1;
if (y4 > -1)
{
x = x1;
while (x < x2)
{
while (x < x2 && !Same(oldcolor, tB[y4][x])) x++;
if (x < x2) push3(x, y4, -1);
while (x < x2 && Same(oldcolor, tB[y4][x])) x++;
}
}
for (y1 = y + 1; y1 < h; y1++) {
pPt = 0;
t1Pt = 0;
while (pPt < tPt)
{
x3 = tArr[pPt];
pPt++;
x4 = tArr[pPt];
pPt++;
x2 = x3;
x1 = x3 - 1;
while (x1 > -1 && Same(oldcolor, tB[y1][x1]))
{
tB[y1][x1] = fillcolor;
x1--;
}
x1++;
y4 = y1 - 1;
x = x1;
while (x < x3)
{
while (x < x3 && !Same(oldcolor, tB[y4][x])) x++;
if (x < x3) push3(x, y4, -1);
while (x < x3 && Same(oldcolor, tB[y4][x])) x++;
}
while (x1 < x4) {
while (x2 < w && Same(oldcolor, tB[y1][x2]))
{
tB[y1][x2] = fillcolor;
x2++;
}
t1Arr[t1Pt] = x1;
t1Pt++;
t1Arr[t1Pt] = x2;
t1Pt++;
while (x2 < x4 && !Same(oldcolor, tB[y1][x2])) x2++;
x1 = x2;
}
y4 = y1 - 1;
x = x4;
while (x < x2)
{
while (x < x2 && !Same(oldcolor, tB[y4][x])) x++;
if (x < x2) push3(x, y4, -1);
while (x < x2 && Same(oldcolor, tB[y4][x])) x++;
}
}
if (t1Pt == 0) break;
between = tArr;
tArr = t1Arr;
t1Arr = between;
tPt = t1Pt;
}
}
else
{
y4 = y + 1;
if (y4 < h)
{
x = x1;
while (x < x2)
{
while (x < x2 && !Same(oldcolor, tB[y4][x])) x++;
if (x < x2) push3(x, y4, 1);
while (x < x2&& Same(oldcolor, tB[y4][x])) x++;
}
}
for (y1 = y - 1; y1 > -1; y1--)
{
pPt = 0;
t1Pt = 0;
while (pPt < tPt)
{
x3 = tArr[pPt];
pPt++;
x4 = tArr[pPt];
pPt++;
x2 = x3;
x1 = x3 - 1;
while (x1 > -1 && Same(oldcolor, tB[y1][x1]))
{
tB[y1][x1] = fillcolor;
x1--;
}
x1++;
y4 = y1 + 1;
x = x1;
while (x < x3)
{
while (x < x3 && !Same(oldcolor, tB[y4][x])) x++;
if (x < x3) push3(x, y4, 1);
while (x < x3 && Same(oldcolor, tB[y4][x])) x++;
}
while (x1 < x4)
{
while (x2 < w && Same(oldcolor, tB[y1][x2]))
{
tB[y1][x2] = fillcolor;
x2++;
}
t1Arr[t1Pt] = x1;
t1Pt++;
t1Arr[t1Pt] = x2;
t1Pt++;
while (x2 < x4 && !Same(oldcolor, tB[y1][x2])) x2++;
x1 = x2;
}
y4 = y1 + 1;
x = x4;
while (x < x2)
{
while (x < x2 && !Same(oldcolor, tB[y4][x])) x++;
if (x < x2) push3(x, y4, 1);
while (x < x2 && Same(oldcolor, tB[y4][x])) x++;
}
}
if (t1Pt == 0) break;
between = tArr;
tArr = t1Arr;
t1Arr = between;
tPt = t1Pt;
}
}
}
return;
}
Additional
To get the Visual Studio solution to work properly, you have to change a project property. Start the .sln file. In the opened solution, select Project->Properties->Advanced->C++/CLI Properties->Common Language Runtime Support->Common Language Runtime Support (/clr). This assumes you have C++/CLI modules installed. It enables managed code, this program uses it for bitmap access. Another option is to use your own bitmap data access by rewriting function DoTest
.
History
- 7th September, 2021: Initial version