You can use connected-component labeling.
This means that you detect sets of black pixels that touch each other along a side (4-connectivity) or, optionally, also at a corner (8-connectivity). The size of each set (its number of pixels) lets you tell characters from punctuation, since punctuation marks are typically much smaller.
You should find suitable connected-component labeling algorithms on the web (http://en.wikipedia.org/wiki/Connected-component_labeling). You can also search for a ready-made image processing library in C# that features it.
I can also recommend a less classical approach based on flood filling: scan the image top to bottom and left to right until you hit a black pixel; from this pixel, use a flood fill algorithm (http://en.wikipedia.org/wiki/Flood_fill) to paint the whole connected region white. While you are filling, count the pixels. Then resume the scan from the pixel where you stopped. In the end, you'll get a wholly white image, but you'll have visited every connected component exactly once.
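
The flood-fill scan described above can be sketched roughly as follows. This is only an illustration, written in Python for brevity rather than C#; the function name `component_sizes`, the `eight_connected` flag and the list-of-rows image format (1 = black, 0 = white) are assumptions made for the example, not part of any library. It uses an explicit queue instead of recursion so that large blobs do not overflow the call stack.

```python
from collections import deque


def component_sizes(image, eight_connected=False):
    """Return the pixel count of each connected component of black pixels.

    `image` is a list of rows (1 = black, 0 = white). It is modified in
    place: every black pixel ends up painted white.
    """
    height = len(image)
    width = len(image[0]) if height else 0

    # Neighbours sharing a side; optionally also those sharing a corner.
    neighbours = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if eight_connected:
        neighbours += [(-1, -1), (-1, 1), (1, -1), (1, 1)]

    sizes = []
    for y in range(height):          # scan top to bottom
        for x in range(width):       # and left to right
            if image[y][x] != 1:
                continue
            # First black pixel of a new component: flood fill from here,
            # painting pixels white as they are queued and counting them.
            image[y][x] = 0
            queue = deque([(y, x)])
            count = 0
            while queue:
                cy, cx = queue.popleft()
                count += 1
                for dy, dx in neighbours:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1):
                        image[ny][nx] = 0
                        queue.append((ny, nx))
            sizes.append(count)
    return sizes


page = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
print(component_sizes(page))  # prints [6, 1, 1]
```

The sizes come out in scan order, so comparing each one against a threshold chosen for your font and resolution would then separate the large components (characters) from the small ones (punctuation). Note that a character such as "i" consists of two components, so the dot would be counted separately.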