
OCR Line Detection

mehran ghainian hasaruye


10 Jul 2010

A simple algorithm for extracting lines in an image.


Introduction

One of the first steps in developing an OCR system is line detection. Farsi/Arabic text has some properties that make it difficult to recognize. For example, some Farsi characters, like "i" in English, consist of two separate parts but must be recognized as one character. The code below handles this problem.

Background

The reader is assumed to have basic GDI skills and knowledge of elementary concepts of image processing.

Using the code

First, note that this algorithm cannot separate lines of text that are spanned vertically by another line, as in the image below, because no completely white row then lies between them:

The algorithm is simple:

Threshold the image.
Treat the horizontal projection of each line of characters as a continuous vertical segment.
Scan the image from top to bottom and find the top and bottom of each vertical segment from the previous step.
Because characters like ? are identified as two lines, merge two adjacent lines when the gap between them is smaller than a fixed fraction of the lower line's height.
Save the lines to the output directory.

First, we threshold the image. I used a simple fixed-threshold filter, but a method such as Otsu's thresholding, which chooses the threshold from the image's gray-level histogram, usually gives a better result.

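public Bitmap Threshold(Bitmap bitmap, int thresholdValue)
{
    // Threshold filter (AForge.NET): pixels at or above thrByte become white,
    // all other pixels become black.
    byte thrByte = (byte)(thresholdValue);
    bitmap = ApplyFilter(new Threshold(thrByte), bitmap);
    // Helper from the article's source: converts the result to an indexed pixel format.
    bitmap = GetIndexedPixelFormat(bitmap);
    return bitmap;
}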
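The paragraph above mentions Otsu's method as a better alternative to a fixed threshold. The following C# sketch is not part of the article's source: it assumes an 8-bit grayscale image held in a byte[,] array (rows first) instead of a Bitmap, and the names OtsuSketch, OtsuThreshold and Binarize are hypothetical. It picks the threshold that maximizes the between-class variance of the gray-level histogram, then marks pixels at or below that threshold as black, assuming dark text on a light background.

```csharp
using System;

// Hypothetical sketch of Otsu's thresholding on a byte[,] grayscale image.
public static class OtsuSketch
{
    // Returns the threshold t that maximizes the between-class variance of
    // the two classes [0..t] and [t+1..255].
    public static int OtsuThreshold(byte[,] gray)
    {
        // Histogram of gray levels.
        var hist = new long[256];
        foreach (byte v in gray) hist[v]++;
        long total = gray.LongLength;

        // Sum of all intensities, used to get the mean of the upper class.
        double sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += i * (double)hist[i];

        double sumBelow = 0, bestVariance = -1;
        long countBelow = 0;
        int best = 0;
        for (int t = 0; t < 256; t++)
        {
            countBelow += hist[t];
            if (countBelow == 0) continue;          // lower class still empty
            long countAbove = total - countBelow;
            if (countAbove == 0) break;             // upper class empty from here on
            sumBelow += t * (double)hist[t];
            double meanBelow = sumBelow / countBelow;
            double meanAbove = (sumAll - sumBelow) / countAbove;
            // Between-class variance, up to the constant factor 1 / total^2.
            double diff = meanBelow - meanAbove;
            double variance = (double)countBelow * countAbove * diff * diff;
            if (variance > bestVariance) { bestVariance = variance; best = t; }
        }
        return best;
    }

    // Pixels at or below the threshold become black (true), the rest white (false).
    public static bool[,] Binarize(byte[,] gray, int threshold)
    {
        int h = gray.GetLength(0), w = gray.GetLength(1);
        var black = new bool[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                black[y, x] = gray[y, x] <= threshold;
        return black;
    }
}
```

For an image containing only the gray levels 10 and 200, every threshold from 10 to 199 separates the two classes equally well; because the comparison is strict, this version returns the first of them, 10.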

In the second step, we project all black pixels horizontally to obtain the horizontal projection of the image, that is, for each row we note whether it contains any black pixel. The result is a set of separate runs of black rows; we take the top and bottom of each run as the top and bottom of a text line:

public List<Belt> ExtractBeltsBasedonCoveredHeight(Bitmap mehrImage)
{
    List<int> line_top = new List<int>();
    List<int> line_bottom = new List<int>();
    List<Belt> lines = new List<Belt>();
    int y = 0;

    // Walk down the image: find the first row that contains a black pixel,
    // then the first row below it that contains none.
    while (y < mehrImage.Height)
    {
        int x = 0;
        y = FindNextLine(mehrImage, y, ref x);
        if (y == -1)
            break;                                   // no black pixels left
        line_top.Add(y);
        y = FindBottomOfLine(mehrImage, y) + 1;      // first white row after the line
        line_bottom.Add(y);
    }

    for (int line_number = 0; line_number < line_top.Count; line_number++)
    {
        // Crop the line with a one-pixel margin above and below, clamped so
        // that the rectangle stays inside the image.
        int top = Math.Max(line_top[line_number] - 1, 0);
        int bottom = Math.Min(line_bottom[line_number] + 1, mehrImage.Height - 1);
        Bitmap bmp = GetSpecificAreaOfImage(
            new Rectangle(0, top, mehrImage.Width, bottom - top + 1), mehrImage);
        Belt belt = new Belt(bmp);
        belt.RelativeTop = line_top[line_number];
        belt.RelativeBottom = line_bottom[line_number];
        lines.Add(belt);
    }
    lines = RemoveNoisyData(lines);
    return lines;
}
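The horizontal projection described above can also be sketched on a plain array. This is an illustrative C# sketch, not the article's code: it assumes the thresholded image is a bool[,] in which true marks a black pixel, and the names ProjectionSketch, HorizontalProjection and FindBands are hypothetical. Each maximal run of rows containing at least one black pixel becomes one band. Unlike the article, which stores the first white row after a line as its bottom, this sketch stores the last black row.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: horizontal projection of a binary image and its bands.
public static class ProjectionSketch
{
    // profile[y] = number of black pixels in row y.
    public static int[] HorizontalProjection(bool[,] black)
    {
        int h = black.GetLength(0), w = black.GetLength(1);
        var profile = new int[h];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (black[y, x]) profile[y]++;
        return profile;
    }

    // Consecutive rows with profile > 0 form one band [Top, Bottom],
    // where Bottom is the last row that still contains a black pixel.
    public static List<(int Top, int Bottom)> FindBands(int[] profile)
    {
        var bands = new List<(int Top, int Bottom)>();
        int y = 0;
        while (y < profile.Length)
        {
            if (profile[y] == 0) { y++; continue; }   // skip white rows
            int top = y;
            while (y < profile.Length && profile[y] > 0) y++;
            bands.Add((top, y - 1));
        }
        return bands;
    }
}
```

For a profile of [0, 1, 2, 0, 1, 0], FindBands returns the bands (1, 2) and (4, 4). Because the profile is computed in a single pass, the band search itself only walks an int[] array.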

To find the top and bottom of each line, I wrote two functions: FindNextLine, which finds the first black pixel of the next run in the horizontal projection, and FindBottomOfLine, which finds the first row below the top of the line that contains no black pixel.

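public int FindBottomOfLine(Bitmap bitmap, int topOfLine)
{
    // Move down one row at a time until we reach a row with no black pixel
    // (or the bottom of the image).
    bool no_black_pixel = false;
    while (no_black_pixel == false)
    {
        topOfLine++;
        no_black_pixel = true;
        for (int x = 0; x < bitmap.Width && topOfLine < bitmap.Height; x++)
        {
            if (Convert.ToString(bitmap.GetPixel(x, topOfLine)) == Shape.BlackPixel)
            {
                no_black_pixel = false;
                break;   // one black pixel is enough; skip the rest of the row
            }
        }
    }
    // topOfLine is now the first row without black pixels, so the line ends one row above.
    return topOfLine - 1;
}

public int FindNextLine(Bitmap bitmap, int y, ref int x)
{
    // Scan left to right, row by row, starting at (x, y) until a pixel that is
    // not white is found. Returns its row, or -1 if the rest of the image is white.
    if (y >= bitmap.Height)
        return -1;
    while (Convert.ToString(bitmap.GetPixel(x, y)) == Shape.WhitePixel)
    {
        x++;
        if (x == bitmap.Width)
        {
            x = 0;
            y++;
        }
        if (y >= bitmap.Height)
            break;
    }
    return y < bitmap.Height ? y : -1;
}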
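Both search functions decide whether a pixel is black or white by comparing Convert.ToString(color) with string constants from the article's Shape class. A cheaper and less fragile test compares ARGB values through Color.ToArgb(), as in this sketch (the PixelTest class is hypothetical, and it assumes thresholding has produced pure black pixels). Comparing Color structures with == is not a substitute: the .NET documentation notes that Color.Black and Color.FromArgb(255, 0, 0, 0) are not equal, because one is a named color, even though their ARGB values match.

```csharp
using System.Drawing;

// Hypothetical helper: black test based on the ARGB value only.
public static class PixelTest
{
    // ARGB value of opaque black (0xFF000000).
    static readonly int BlackArgb = Color.Black.ToArgb();

    // True when the pixel is opaque pure black, regardless of whether the
    // Color value is a named color or was built from ARGB components.
    public static bool IsBlack(Color c) => c.ToArgb() == BlackArgb;
}
```

GetPixel itself is also slow when called for every pixel; for large pages, reading the pixel data once through Bitmap.LockBits is typically much faster.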

Because characters like '?' are identified as two lines, we merge two adjacent lines when the gap between them is smaller than a constant fraction (Belt.UpAndDownWhiteSpaceRatio) of the lower line's height:

// Requires Emgu CV: using Emgu.CV; using Emgu.CV.Structure;
private static List<Belt> RemoveNoisyData(List<Belt> belts)
{
    for (int i = 1; i < belts.Count; i++)
    {
        // Gap between the baseline of the previous belt and the top of this one,
        // compared with a fraction of this belt's height.
        if (belts[i].RelativeTop - belts[i - 1].BaseHorizontalLine -
            belts[i - 1].RelativeTop <
            Belt.UpAndDownWhiteSpaceRatio * belts[i].Height)
        {
            // Stack the previous belt on top of the current one.
            Image<Gray, byte> lower = new Image<Gray, byte>(belts[i].Image);
            Image<Gray, byte> upper = new Image<Gray, byte>(belts[i - 1].Image);
            Image<Gray, byte> merged = upper.ConcateVertical(lower);

            // Convert directly to a Bitmap instead of round-tripping through a temp file.
            belts[i - 1].Image = merged.ToBitmap();
            belts[i - 1].RelativeBottom = belts[i].RelativeBottom;
            belts[i - 1].BaseHorizontalLine = -1;
            belts.RemoveAt(i);
            i--;   // re-examine the belt that has moved into position i
        }
    }
    return belts;
}
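The merge rule can be illustrated on (Top, Bottom) row bands such as those produced by a projection. In this hypothetical sketch (MergeSketch and MergeCloseBands are not from the article), the gap is measured from the previous band's last black row rather than from its baseline (the article's BaseHorizontalLine), and ratio stands in for Belt.UpAndDownWhiteSpaceRatio, whose value the article does not give. As in the article, the gap is compared with a fraction of the lower band's height.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of merging bands that are split by detached marks.
public static class MergeSketch
{
    // Merges a band into the one above it when the number of blank rows
    // between them is smaller than ratio * (height of the lower band).
    public static List<(int Top, int Bottom)> MergeCloseBands(
        List<(int Top, int Bottom)> bands, double ratio)
    {
        var result = new List<(int Top, int Bottom)>();
        foreach (var band in bands)
        {
            if (result.Count > 0)
            {
                var prev = result[result.Count - 1];
                int gap = band.Top - prev.Bottom - 1;       // blank rows in between
                int height = band.Bottom - band.Top + 1;    // height of the lower band
                if (gap < ratio * height)
                {
                    // Extend the previous band down so it covers this one.
                    result[result.Count - 1] = (prev.Top, band.Bottom);
                    continue;
                }
            }
            result.Add(band);
        }
        return result;
    }
}
```

With ratio = 0.5, the bands (10, 11), (14, 33) and (45, 65) become (10, 33) and (45, 65): the two-row band, such as a detached dot, lies only two blank rows above a 20-row band and is merged, while the 11-row gap before the last band is not smaller than 0.5 × 21 = 10.5. Unlike the article's loop, a merged band is compared again with the band that follows it.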

Finally, we save the image of each line to the output directory.
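The article does not say which image format the line images are saved in. The sketch below shows one possibility and is not the author's code: it writes each band of a bool[,] image (true = black) to a plain-text PBM (P1) file using only System.IO, so no imaging library is needed. The names SaveSketch and SaveBands and the line_<n>.pbm file pattern are hypothetical. In the P1 format, 1 means black and 0 means white, and lines should not exceed 70 characters, so the raster is wrapped every 70 pixels.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical sketch: save each detected band as a plain PBM (P1) file.
public static class SaveSketch
{
    public static List<string> SaveBands(bool[,] black,
        List<(int Top, int Bottom)> bands, string outputDir)
    {
        Directory.CreateDirectory(outputDir);   // no-op if it already exists
        int width = black.GetLength(1);
        var paths = new List<string>();
        for (int n = 0; n < bands.Count; n++)
        {
            var (top, bottom) = bands[n];
            var sb = new StringBuilder();
            // Header: magic number, then width and height in pixels.
            sb.Append("P1\n").Append(width).Append(' ')
              .Append(bottom - top + 1).Append('\n');
            // Raster: one digit per pixel, wrapped so no line exceeds 70 characters.
            int written = 0;
            for (int y = top; y <= bottom; y++)
            {
                for (int x = 0; x < width; x++)
                {
                    sb.Append(black[y, x] ? '1' : '0');
                    if (++written % 70 == 0) sb.Append('\n');
                }
            }
            sb.Append('\n');
            string path = Path.Combine(outputDir, $"line_{n}.pbm");
            File.WriteAllText(path, sb.ToString());
            paths.Add(path);
        }
        return paths;
    }
}
```

Plain PBM is chosen here only because it can be written without any dependency; saving a cropped Bitmap with Bitmap.Save in PNG or BMP format would fit the article's GDI-based code equally well.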

Experimental results

I tested this algorithm on different fonts and sizes, including Mitra, Times New Roman, Arial, and Zar. On samples without noise it works correctly in 96% of cases, but on noisy samples the results vary with the noise level and are not acceptable.

History

I have spent two years developing an open-source Farsi/Arabic OCR system, and now I want to share some of my experience here. If you are interested in developing Farsi/Arabic OCR, you can join the following group: farsi_arabic_OCR@groups.yahoo.com.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
