Hi,
There are many ways to achieve this.
1) The best way would be to use a clustering algorithm with a maximum-likelihood function. In the result, each pixel has a cluster membership, so in your subsequent searches you only need to search within the members of one cluster, not all the pixels again and again. If you use an unsupervised clustering technique, it will automatically form the number of clusters you ask for. But if you want to control the clusters according to the nature of the image (for example, if the image contains only forest and river, you may want exactly two clusters), then go for a supervised clustering algorithm.
2) The second way is to use frequency mapping. Create a dictionary for each of R, G, and B, where the keys are the color values (0 - 255) and the values are lists of pixel indices. Say you have 100 pixels; index them from 1 to 100. Now map each pixel against its R value, storing its index in the R dictionary under the key equal to that R value, and do the same for G and B. Once you have mapped all pixels like this, suppose your pixel has R: 40, G: 43, B: 56; using these values as keys you get a list of pixels from each band. Find the common pixels in these 3 lists (you can query the lists using LINQ). If there are none, then look up the lists for R+1, R-1, G+1, G-1, B+1, B-1, and so on, widening the range each iteration. In any case you won't be searching all pixels; in fact, only a few iterations are needed. Let me know if it confuses you and I will write a detailed note.
3) The third way would be an optimization of the second one. The human eye can find the nearest pixel to a certain extent; how do we do that? Obviously, we say both are the same color. So why not transform the values into a color-based representation? I mean convert RGB to IHS, where I is the intensity (bright or dull), S is the saturation (pure or impure), and H is the hue, in other words the color itself. If you map the pixels by their hue, then among the pixels having the same hue you can calculate the closest match by intensity and saturation.
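The clustering idea in (1) can be sketched as follows. This is a minimal, illustrative k-means in Python (the original post names no algorithm or language; the farthest-point seeding and all identifiers here are my own choices, not from the post). After clustering, a nearest-color query only needs to scan the members of one cluster.

```python
def kmeans_pixels(pixels, k, iters=10):
    """Minimal k-means over RGB tuples. Returns (centroids, membership),
    where membership[i] is the cluster index of pixels[i]."""
    def d2(p, q):  # squared color distance
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Deterministic farthest-point seeding spreads the initial centroids out.
    centroids = [pixels[0]]
    while len(centroids) < k:
        centroids.append(max(pixels, key=lambda p: min(d2(p, c) for c in centroids)))

    membership = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: each pixel joins its nearest centroid.
        for i, p in enumerate(pixels):
            membership[i] = min(range(k), key=lambda c: d2(p, centroids[c]))
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for i, p in enumerate(pixels) if membership[i] == c]
            if members:
                centroids[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centroids, membership

# "Forest and river" example: two obvious color groups, so k = 2.
pixels = [(10, 120, 10), (12, 118, 14), (8, 125, 9),
          (20, 40, 200), (25, 45, 210), (18, 38, 205)]
centroids, membership = kmeans_pixels(pixels, k=2)
```

With well-separated colors like these, the greens and the blues end up in different clusters after a few iterations.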
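The frequency-mapping idea in (2) looks roughly like this in Python (the post suggests LINQ, i.e. C#; this is the same scheme sketched with Python dictionaries and set intersection, and the function names are mine):

```python
def build_channel_maps(pixels):
    """One dictionary per channel: color value (0-255) -> set of pixel indices."""
    maps = [{}, {}, {}]  # R, G, B
    for i, p in enumerate(pixels):
        for ch in range(3):
            maps[ch].setdefault(p[ch], set()).add(i)
    return maps

def nearest_by_frequency(maps, target, max_radius=255):
    """Widen the key range around each channel value of `target` (R+1, R-1,
    G+1, G-1, ...) until the three candidate sets share a pixel index."""
    for r in range(max_radius + 1):
        candidates = None
        for ch in range(3):
            hits = set()
            for v in range(target[ch] - r, target[ch] + r + 1):
                hits |= maps[ch].get(v, set())
            # Intersect the per-channel hit sets (the LINQ query in the post).
            candidates = hits if candidates is None else candidates & hits
        if candidates:
            return candidates
    return set()

maps = build_channel_maps([(40, 43, 56), (41, 44, 57), (100, 100, 100)])
exact = nearest_by_frequency(maps, (40, 43, 56))   # exact hit at radius 0
near = nearest_by_frequency(maps, (42, 45, 58))    # found at radius 1
```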
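The hue mapping in (3) can use Python's standard `colorsys` module. The 36-bucket hue granularity below is an arbitrary illustrative choice, not something from the post:

```python
import colorsys

def to_hsv(rgb):
    """Map an RGB pixel (0-255 per channel) to (hue, saturation, value),
    each in the range 0.0-1.0."""
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)

def hue_bucket(rgb, buckets=36):
    """Group pixels by hue; within one bucket, only saturation and
    intensity need comparing."""
    h, _, _ = to_hsv(rgb)
    return int(h * buckets) % buckets

red_bucket = hue_bucket((255, 0, 0))  # pure red has hue 0
```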
There are still a lot of other ways. The choice is yours.
One more thing to optimize my previous suggestion. If you advance one color value at a time in all 3 lists at once, then you can keep track of how far you've traveled in each list to know the best possible minimum color difference you could still achieve, and use that as an exit condition to stop early. So, if your reds are 5 away, greens are 2 away, and blues are 4 away, and you already have a color that is a total of 10 color values different from the original color, you can stop.
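That exit condition can be written as a one-liner (a sketch, assuming the color difference is the sum of absolute per-channel differences, which is what the "total of 10 color values" wording implies):

```python
def can_stop(channel_offsets, best_total_diff):
    """Early-exit test for the synchronized walk through the three sorted
    channel lists: a pixel not yet seen in ANY list differs from the target
    by at least the current offset in each channel, so once the offsets sum
    to the best total difference found, no unseen pixel can do better."""
    return sum(channel_offsets) >= best_total_diff

# Reds 5 away, greens 2 away, blues 4 away; best candidate so far differs
# by a total of 10: 5 + 2 + 4 = 11 >= 10, so the search can stop.
done = can_stop((5, 2, 4), 10)
```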
Just to add to my previous answer, I thought of another optimization. In each of the 3 lists, you could store sublists. Say you have 10 pixels that have a red value of 0. Then redList[0] would contain sublists of those pixels, ordered by their green values, and redList[0][1] would store a list of pixels sorted according to their blue values. So, redList[0][1][0] would contain a pixel with, say, a red value of 0, a green value of 20, and a blue value of 30. Then redList[0][2][0] would have to contain a pixel with a greater green value than the pixel stored in redList[0][1][0]. That would allow you to eliminate possibilities faster. This is beginning to look like a tree structure... you might want to think about that some more; a tree may well be another good data structure to help you solve your problem.
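The nested-sublist idea above might look like this in Python. I've used dictionaries keyed by red and green instead of positional sublists, which is a sketch of the same shape (red, then green, then blue-sorted leaves), not the exact structure described:

```python
def build_rgb_tree(pixels):
    """Nested structure: tree[r][g] is a list of (blue, pixel_index) pairs
    sorted by blue, so whole green/blue branches can be skipped during a
    search -- effectively a three-level tree keyed by R, then G, then B."""
    tree = {}
    for i, (r, g, b) in enumerate(pixels):
        tree.setdefault(r, {}).setdefault(g, []).append((b, i))
    for greens in tree.values():
        for blues in greens.values():
            blues.sort()  # leaves sorted by blue value
    return tree

pixels = [(0, 20, 30), (0, 20, 10), (0, 40, 5)]
tree = build_rgb_tree(pixels)
```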
Excellent question. I believe I have an answer for you.
Sort the pixels into 3 lists. Each list would be sorted by the intensity of one color component (red, green, or blue). Now, make a grid and have each point store where in the 3 lists that pixel is located. The lists also store where in the grid their corresponding pixel is. That is your basic data structure.
As far as the algorithm goes, I'd use one that finds the position of the pixel in the 3 lists, then moves outward in each direction, scanning for pixels that are similar in color. The first six pixels you encounter (one on each side, in each list) become candidates for the closest pixel by color. However, since you have 3 factors (red, green, and blue) that determine the difference in color, those 6 pixels may not be the closest in color (though each will be closest by one component of the color). You calculate exactly how big the color difference is for those 6 pixels and call the smallest the minimum so far. You then continue moving the 6 scan positions outward from the originating point in each list. If you find pixels that are closer in color to the original pixel, those become the new minimum. If you travel so far away from the original pixel that it would be impossible for a new pixel to have a closer color, you stop searching that color list. Once you have stopped searching all the lists, you have the pixel that is closest in color to the original pixel.
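Here is a sketch of that outward walk in Python. The distance metric (sum of absolute channel differences) and all names are my assumptions; note that a pixel's difference in one channel alone is a lower bound on its total difference, which is what justifies abandoning a list:

```python
def nearest_pixel(pixels, start):
    """Index of the pixel closest in color to pixels[start], walking
    outward from its position in three channel-sorted lists."""
    n = len(pixels)
    target = pixels[start]
    best_i, best_d = None, float("inf")

    def score(i):
        nonlocal best_i, best_d
        d = sum(abs(a - b) for a, b in zip(pixels[i], target))
        if d < best_d:
            best_i, best_d = i, d

    for ch in range(3):
        order = sorted(range(n), key=lambda i: pixels[i][ch])
        pos = order.index(start)
        lo, hi = pos - 1, pos + 1
        while lo >= 0 or hi < n:
            if lo >= 0:
                # Channel difference alone already rules out the rest below.
                if abs(pixels[order[lo]][ch] - target[ch]) >= best_d:
                    lo = -1
                else:
                    score(order[lo]); lo -= 1
            if hi < n:
                if abs(pixels[order[hi]][ch] - target[ch]) >= best_d:
                    hi = n
                else:
                    score(order[hi]); hi += 1
    return best_i

pixels = [(10, 10, 10), (12, 11, 10), (200, 0, 0), (8, 9, 9)]
match = nearest_pixel(pixels, 0)
```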
Using a radix sort (basically, a bucket sort), the sorting of the 3 lists can be done in O(n), where "n" is the number of pixels, and that only has to be done once, up front. Each "closest pixel" query would then take less than O(n) (though I'm not exactly sure what the precise big-O bound would be).
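Because each channel value fits in 0-255, the bucket sort is just 256 bins; a minimal sketch (names are mine):

```python
def bucket_sort_by_channel(pixels, ch):
    """O(n) stable bucket sort of pixel indices by one channel (0-255)."""
    buckets = [[] for _ in range(256)]
    for i, p in enumerate(pixels):
        buckets[p[ch]].append(i)  # each value lands in its own bin
    return [i for b in buckets for i in b]

pixels = [(200, 0, 0), (10, 0, 0), (10, 5, 0), (0, 0, 0)]
by_red = bucket_sort_by_channel(pixels, 0)  # indices in ascending red order
```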
Well, if you must do many searches, you can consider iterating through the array once, finding the best match for each pixel, and storing it.
This will certainly take some time (O(n²)), but once it's done, all queries will take no time at all. :)
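A sketch of that brute-force precomputation (again assuming the sum of absolute channel differences as the metric):

```python
def precompute_matches(pixels):
    """O(n^2) table: best[i] is the index of the pixel closest in color
    to pixels[i]. Pay the cost once; every later query is a lookup."""
    n = len(pixels)
    best = [None] * n
    for i in range(n):
        bd = float("inf")
        for j in range(n):
            if i == j:
                continue
            d = sum(abs(a - b) for a, b in zip(pixels[i], pixels[j]))
            if d < bd:
                bd, best[i] = d, j
    return best

pixels = [(10, 10, 10), (12, 11, 10), (200, 0, 0), (8, 9, 9)]
table = precompute_matches(pixels)  # every query is now table[i]
```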
This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)