The Atalasoft OCR engine

Steve Hawley

3 Oct 2005

Convert images to plain text with the Atalasoft suite of OCR objects.

This article is in the Product Showcase section for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.

Introduction

At Atalasoft, we’re excited to unveil the newest addition to our product line, Atalasoft OCR. This suite of objects, now available, provides an interface to OCR engines that makes integration into your .NET application a snap.

In the classes provided, we offer the best of both worlds: a multilayered approach to exposing engine capabilities that gets you up and running quickly, yet also lets you get down to the nitty-gritty details that matter most to you.

When you use the Atalasoft OCR engine in its most basic way, most of the work lies in managing the user interface, not the OCR engine.

The following snippet of C# code demonstrates how to convert a set of image files into a single plain text file.

// Requires System.Windows.Forms plus the Atalasoft imaging and OCR
// assemblies; the Atalasoft using directives are omitted here.
[STAThread] // the common file dialogs need a single-threaded apartment
static void Main(string[] args)
{
    // create and initialize the engine
    ExperVisionEngine engine = new ExperVisionEngine(null, null);
    engine.Initialize();
    try
    {
        // select a file or set of files
        OpenImageFileDialog openDialog = new OpenImageFileDialog();
        openDialog.Multiselect = true;
        if (openDialog.ShowDialog() != DialogResult.OK)
            return;

        // choose where to write the recognized text
        SaveFileDialog saveFileDialog = new SaveFileDialog();
        saveFileDialog.Filter = "Text (*.txt)|*.txt";
        if (saveFileDialog.ShowDialog() != DialogResult.OK)
            return;

        // translate into a plain text file
        engine.Translate(
            new FileSystemImageSource(openDialog.FileNames, true),
            "text/plain", saveFileDialog.FileName);
    }
    catch (OcrException err)
    {
        System.Console.WriteLine("Error in OCR: " + err.Message);
    }
    finally
    {
        // always release the engine, even on cancel or error
        engine.ShutDown();
    }
}

As you can see, the interface is simple. The main entry point is the Translate method, which takes a set of images and writes the recognized text to a file (or stream), using the given MIME type to select the output format. Because output file types are described with standard MIME types, it is easy to ask the engine which output types it supports, as well as to augment or replace them!

The OcrEngine maintains a collection of objects that implement an interface called ITranslator. When you request that a set of images be translated to an output file format, the engine selects a translator that matches the MIME type.

If your task requires you to generate output in a particular format, it is short work to create your own object to translate the recognized text and images into the format that you need. You can add your new translator to the engine’s translator collection, or remove existing translators from it, as you see fit. You can even bypass the translator selection process entirely and simply supply the translator that you want to use.
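To make the idea of MIME-keyed translator selection concrete, here is a minimal, self-contained sketch. It does not use the Atalasoft API: ISketchTranslator, PlainTextSketchTranslator, CsvSketchTranslator, and SketchTranslatorCollection are hypothetical names invented for illustration, and the real ITranslator interface may look quite different. The sketch only shows the general pattern of registering translators, looking one up by MIME type, and adding a custom output format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for a translator; NOT the Atalasoft ITranslator interface.
public interface ISketchTranslator
{
    string MimeType { get; }
    void Write(IEnumerable<string> pageTexts, TextWriter output);
}

// Writes each recognized page as-is, with a form feed between pages.
public sealed class PlainTextSketchTranslator : ISketchTranslator
{
    public string MimeType { get { return "text/plain"; } }

    public void Write(IEnumerable<string> pageTexts, TextWriter output)
    {
        bool first = true;
        foreach (string page in pageTexts)
        {
            if (!first) output.Write('\f');
            output.Write(page);
            first = false;
        }
    }
}

// A custom format: one quoted CSV row per page (page number, text).
public sealed class CsvSketchTranslator : ISketchTranslator
{
    public string MimeType { get { return "text/csv"; } }

    public void Write(IEnumerable<string> pageTexts, TextWriter output)
    {
        int pageNumber = 1;
        foreach (string page in pageTexts)
        {
            string escaped = page.Replace("\"", "\"\""); // double any quotes
            output.Write(pageNumber + ",\"" + escaped + "\"\n");
            pageNumber++;
        }
    }
}

// Hypothetical collection that selects a translator by MIME type.
public sealed class SketchTranslatorCollection
{
    private readonly Dictionary<string, ISketchTranslator> byMimeType =
        new Dictionary<string, ISketchTranslator>(StringComparer.OrdinalIgnoreCase);

    // Registers a translator, replacing any existing one for the same type.
    public void Add(ISketchTranslator translator)
    {
        byMimeType[translator.MimeType] = translator;
    }

    public bool Remove(string mimeType) { return byMimeType.Remove(mimeType); }

    public ICollection<string> SupportedMimeTypes { get { return byMimeType.Keys; } }

    public string Translate(IEnumerable<string> pageTexts, string mimeType)
    {
        ISketchTranslator translator;
        if (!byMimeType.TryGetValue(mimeType, out translator))
            throw new NotSupportedException("No translator for " + mimeType);
        StringWriter writer = new StringWriter();
        translator.Write(pageTexts, writer);
        return writer.ToString();
    }
}

public static class TranslatorSketchProgram
{
    public static void Main()
    {
        SketchTranslatorCollection translators = new SketchTranslatorCollection();
        translators.Add(new PlainTextSketchTranslator());
        translators.Add(new CsvSketchTranslator());

        string[] pages = { "first page", "second \"quoted\" page" };
        Console.Write(translators.Translate(pages, "text/csv"));
    }
}
```

Keying the collection on the MIME string is what makes the set of output formats both discoverable (enumerate SupportedMimeTypes) and replaceable (Add overwrites an entry, Remove drops one) in this sketch.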

OCR Engine Events

Through the familiar .NET event mechanism, you can get hooked into every step of document processing, allowing you to finely control how your images are handled. For example, you can request notification during the stage when an image is preprocessed to make it more palatable for the OCR engine, letting you alter what the engine will use for recognition.

In the following C# code snippet, you can see how to hook in your own code to do image preprocessing:

static void Main(string[] args)
{
    // create and initialize the engine
    ExperVisionEngine engine = new ExperVisionEngine(null, null);
    engine.Initialize();
 
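    // run our handler before each page is recognized
    engine.PagePreprocessing +=
        new OcrPagePreprocessingEventHandler(engine_PagePreprocessing);

    // ... translate images as in the first example, then call engine.ShutDown()
}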
 
private static void engine_PagePreprocessing(
    object sender, OcrPagePreprocessingEventArgs e)
{
    // override all options
    e.OptionsOut = 0;
 
    AtalaImage imageBW;
    // convert to black and white, if needed
    if (e.ImageIn.PixelFormat != PixelFormat.Pixel1bppIndexed)
        imageBW = e.ImageIn.GetChangedPixelFormat(
                                 PixelFormat.Pixel1bppIndexed);
    else
        imageBW = e.ImageIn;
 
    // Deskew the image
    AutoDeskewCommand deskew = new AutoDeskewCommand();
    AtalaImage imageDeskewed = deskew.ApplyToImage(imageBW);
    if (imageBW != imageDeskewed && imageBW != e.ImageIn)
        imageBW.Dispose();
 
    // Hand back to the engine
    e.ImageOut = imageDeskewed;
}

As you can see, the amount of work to get hooked in is small, letting you concentrate on the task: processing the image in the way that you want.

The Atalasoft OCR objects let you hook into image processing, image segmentation, and output page construction. There are also events to let you track progress of the engine on a page as well as throughout an entire document. This lets you show your users what they need to know.
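The progress-tracking idea can be illustrated with the standard .NET event pattern. The sketch below is self-contained and does not use the Atalasoft API: SketchProgressEngine, SketchProgressEventArgs, the DocumentProgress event, and the Cancel flag are all hypothetical names chosen for illustration, and the actual Atalasoft event names and arguments may differ. It shows one way a document-level progress event might be raised once per page and consumed by a handler.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event data; NOT an Atalasoft type.
public sealed class SketchProgressEventArgs : EventArgs
{
    public SketchProgressEventArgs(int pageIndex, int pageCount)
    {
        PageIndex = pageIndex;
        PageCount = pageCount;
    }

    public int PageIndex { get; private set; }
    public int PageCount { get; private set; }

    // Whole-document progress as a percentage, counting this page as done.
    public int PercentComplete
    {
        get { return PageCount == 0 ? 100 : (PageIndex + 1) * 100 / PageCount; }
    }

    // A handler can set this to stop processing after the current page.
    public bool Cancel { get; set; }
}

// Hypothetical engine stand-in that walks the pages and reports progress.
public sealed class SketchProgressEngine
{
    public event EventHandler<SketchProgressEventArgs> DocumentProgress;

    // Returns the number of pages processed before completion or cancel.
    public int Process(IList<string> pages)
    {
        int processed = 0;
        for (int i = 0; i < pages.Count; i++)
        {
            processed++; // real recognition work would happen here

            EventHandler<SketchProgressEventArgs> handler = DocumentProgress;
            if (handler == null) continue;

            SketchProgressEventArgs e = new SketchProgressEventArgs(i, pages.Count);
            handler(this, e);
            if (e.Cancel) break;
        }
        return processed;
    }
}

public static class ProgressSketchProgram
{
    public static void Main()
    {
        SketchProgressEngine engine = new SketchProgressEngine();
        engine.DocumentProgress += delegate(object sender, SketchProgressEventArgs e)
        {
            Console.WriteLine("Page {0} of {1} ({2}%)",
                e.PageIndex + 1, e.PageCount, e.PercentComplete);
        };
        engine.Process(new string[] { "a.tif", "b.tif", "c.tif", "d.tif" });
    }
}
```

In a Windows Forms application the same handler could update a progress bar instead of writing to the console, and setting the hypothetical Cancel flag gives the user a way to stop a long document.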

Contact Atalasoft directly for more details, or download a 30-day trial of our OCR engine today.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
