UPV – UNIPEN online handwriting recognition database viewer control

19 Mar 2012

Introduction

Since my last article, “Neural Network for Recognition of Handwritten Digits in C#”, was posted on CodeProject, I have received several emails with contributions to it, which I really appreciate. To encourage people who want to study handwriting recognition techniques, I have written a small control named UPV (UNIPEN handwriting database viewer) for viewing online handwritten samples from UNIPEN, one of the largest handwriting databases in the world. From this control, handwritten pen streams can be exported to bitmaps for use in other programs. The control uses a DLL named UPUnipenLib.dll, a small part of a library I am developing for a sponsored handwritten character recognition project. The library is not open source at the moment, but this file is free to use for non-commercial purposes.

Background 

Requirements

UNIPEN database: download from http://unipen.nici.kun.nl/

(Rename the downloaded folder to “UnipenData” before use.)

UNIPEN library: UPUnipenLib.dll (included in the source code and demo downloads)

UNIPEN vs MNIST database

In my last article, I used MNIST for my neural network. It is a database of handwritten digits with a training set of 60,000 examples and a test set of 10,000 examples. It is a really good database for learning techniques and pattern recognition methods on real-world data. However, because of its limited scope, MNIST is not enough for a more complex recognition system.

For a handwriting recognition system, obtaining a large enough database is a challenge. In 1999, the International Unipen Foundation was established to safeguard the distribution of a large database of on-line handwritten samples collected by a consortium of 40 companies and institutes. It is a huge free database with over 5 million characters from more than 2,200 writers. The database has been used by researchers at many labs and universities all over the world. It has also been used as test data for many pattern recognition competitions, such as those at ICDAR.

In general, the UNIPEN data set is difficult to recognize, even though its data format is standardized. The UNIPEN format definition can be consulted for the details.

Although the UNIPEN project has some tools for viewing and editing the database on its website, they were written for very old computer systems some 15 years ago, so they were of little use to me in understanding how to view the data. Therefore, the method I used in my project is built entirely on my own understanding of the data format.
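
To give a feel for the format, here is a minimal sketch of how pen components could be read from UNIPEN text. It is not the code inside UPUnipenLib.dll. The names PenComponent, UnipenSketch, and ReadComponents are made up for illustration, and the sketch assumes that the first two columns of every data line are X and Y (in a real file the .COORD keyword defines the column order). Every keyword other than .PEN_DOWN and .PEN_UP is simply skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical type: one run of samples recorded with the pen down or up.
public sealed class PenComponent
{
    public bool PenDown;
    public List<(double X, double Y)> Points = new List<(double X, double Y)>();
}

public static class UnipenSketch
{
    private static readonly char[] Blank = { ' ', '\t' };

    // Reads .PEN_DOWN / .PEN_UP components in file order.
    // Assumption: the first two numeric columns of a data line are X and Y.
    public static List<PenComponent> ReadComponents(TextReader reader)
    {
        var components = new List<PenComponent>();
        PenComponent current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.Length == 0) continue;

            if (line[0] == '.')
            {
                // UNIPEN keywords start with a dot at the beginning of a line.
                string keyword = line.Split(Blank, 2)[0];
                if (keyword == ".PEN_DOWN" || keyword == ".PEN_UP")
                {
                    current = new PenComponent { PenDown = keyword == ".PEN_DOWN" };
                    components.Add(current);
                }
                else
                {
                    current = null; // any other keyword ends the current component
                }
                continue;
            }

            if (current == null) continue; // text belonging to a skipped keyword

            string[] fields = line.Split(Blank, StringSplitOptions.RemoveEmptyEntries);
            if (fields.Length >= 2
                && double.TryParse(fields[0], NumberStyles.Float, CultureInfo.InvariantCulture, out double x)
                && double.TryParse(fields[1], NumberStyles.Float, CultureInfo.InvariantCulture, out double y))
            {
                current.Points.Add((x, y));
            }
        }
        return components;
    }
}
```

Calling UnipenSketch.ReadComponents(new StreamReader(path)) on a data file would return the components in writing order. A real reader also has to handle included files, segment definitions, and the other header keywords.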

Using the code

How to use the UPUnipenLib.dll file and the UPV control

UPUnipenLib.dll is a library of functions for reading UNIPEN data files and exporting UNIPEN pen streams to bitmap objects. Its classes are modeled on the UNIPEN file definition. The diagram below gives a general view that should make the library easier to understand and use.

Using the library to get a pen stream and export it to a bitmap is simple, as follows:

private void btnOpen_Click(object sender, EventArgs e)
{
    OpenFileDialog openFileDialog = new OpenFileDialog();
    openFileDialog.InitialDirectory = Environment.CurrentDirectory;
    openFileDialog.Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*";
    openFileDialog.FilterIndex = 2;
    openFileDialog.RestoreDirectory = true;
    upDataset = new UPUnipenLib.UPDataSet();
    if (openFileDialog.ShowDialog(this) == DialogResult.OK)
    {
        try
        {
            // Set upDataset.InitialDirectory to the full path of the UNIPEN folder if it is not named "UnipenData".
            upDataset.InitialDirectory = "";
            upDataset.FillDataSetFromFile(openFileDialog.FileName);
            if (upDataset.Datalayouts.Count > 0)
            {
                // Populate the tree view with the loaded data layouts.
                FillUpTreeview();
                // Show the first sample of the first data layout that contains any.
                iDatalayout = 0;
                foreach (UPDataLayout dl in upDataset.Datalayouts)
                {
                    if (dl.UpUnipens.Count > 0)
                    {
                        unipen = dl.UpUnipens[0];
                        UpdateBitmap();
                        break;
                    }
                    iDatalayout++;
                }
            }
            else
            {
                MessageBox.Show("The selected file is invalid. Please select another file.");
            }
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.ToString());
        }
    }
    UpdatePanels(true);
}

Export to bitmap:

private void UpdateBitmap()
{
    UnipenBitmap ibox = new UnipenBitmap(unipen, (int)udPenwidth.Value);
    if (rbLine.Checked)
    {
        ibox.Drawtype = DrawType.Line;
    }
    else if (rbPie.Checked)
    {
        ibox.Drawtype = DrawType.Pie;
    }
    ibox.Multicolor = cbMultipenColor.Checked;
    ibox.PenWidth = (int)udPenwidth.Value;
    ibox.Uppen = cbUppen.Checked;
    // Render the pen stream to a bitmap and show it.
    this.Image = ibox.Image;
}
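
The UnipenBitmap class itself is not listed in this article. As a rough, library-independent illustration of what the Line draw type and the Multicolor and Uppen options mean visually, the following sketch draws pen components with System.Drawing. StrokeRenderSketch, Render, and the palette are hypothetical names and choices, the points are assumed to be already scaled to pixel coordinates, and the library's actual rendering may well differ.

```csharp
using System.Collections.Generic;
using System.Drawing;

public static class StrokeRenderSketch
{
    // Arbitrary palette: one color per pen-down stroke when multicolor is on.
    private static readonly Color[] Palette =
        { Color.Black, Color.Red, Color.Blue, Color.Green, Color.Purple };

    // components: pen-down and pen-up runs in writing order, in pixel coordinates.
    public static Bitmap Render(
        IList<(bool PenDown, Point[] Points)> components,
        int width, int height, int penWidth, bool multicolor, bool showPenUp)
    {
        var bitmap = new Bitmap(width, height);
        using (Graphics g = Graphics.FromImage(bitmap))
        {
            g.Clear(Color.White);
            int strokeIndex = 0;
            foreach (var component in components)
            {
                // Pen-up trajectories are drawn only on request.
                if (!component.PenDown && !showPenUp) continue;

                Color color;
                if (!component.PenDown)
                {
                    color = Color.LightGray;
                }
                else
                {
                    color = multicolor ? Palette[strokeIndex % Palette.Length] : Color.Black;
                    strokeIndex++;
                }

                Point[] points = component.Points;
                using (var pen = new Pen(color, penWidth))
                {
                    if (points.Length >= 2)
                        g.DrawLines(pen, points);   // connect consecutive samples
                    else if (points.Length == 1)
                        g.DrawEllipse(pen, points[0].X, points[0].Y, 1, 1); // isolated dot
                }
            }
        }
        return bitmap;
    }
}
```

The returned Bitmap can be written to disk with its Save method, for example as a PNG, which is one way to hand samples over to another program.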

Points of Interest

The UPV control is based on the “Image Viewer UserControl” project by Jordy "Kaiwa" Ruiter, adapted for presenting pen streams. With this control, you can not only view the pen stream but also see, through its colors, how the pen stream was drawn. You can view pen-down trajectories as well as pen-up ones.

Picture 1: UNIPEN sample with pen-up trajectories

Picture 2: UNIPEN sample without pen-up trajectories

The UPV control displays UNIPEN samples accurately in most cases. However, because the database is so large, I have not had time to fix every problem it may have. Your feedback would be highly appreciated.

History 

ver 1.01:

Fixed bugs in a special SEGMENT case, for example:

  .SEGMENT CHARACTER 1:40-3,5,6:0-6:12 OK "1" 

Fixed handling of multiple included files. Tested OK on almost all UNIPEN data files (1a, 1b, 1c, 1d, 2).
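
For readers unfamiliar with the delineation field in the SEGMENT entry above: as I read the UNIPEN format definition, it is a comma-separated list of component ranges, where each end of a range is a component number optionally followed by a colon and a point index. Under that reading, 1:40-3,5,6:0-6:12 covers component 1 from point 40 through the end of component 3, all of component 5, and points 0 to 12 of component 6. The parser below is a small sketch of that interpretation. ComponentRange and SegmentDelineation are hypothetical names, not types from UPUnipenLib.dll.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical type: one "from-to" part of a SEGMENT delineation.
public sealed class ComponentRange
{
    public int FromComponent;
    public int? FromPoint;   // null = start of the component
    public int ToComponent;
    public int? ToPoint;     // null = end of the component
}

public static class SegmentDelineation
{
    // Parses text such as "1:40-3,5,6:0-6:12" into a list of ranges.
    public static List<ComponentRange> Parse(string text)
    {
        var ranges = new List<ComponentRange>();
        foreach (string part in text.Split(','))
        {
            string[] ends = part.Trim().Split('-');
            if (ends.Length > 2)
                throw new FormatException("Too many '-' in range: " + part);

            ParseEndpoint(ends[0], out int fromComponent, out int? fromPoint);

            // A part without '-' names a single component (or a single point).
            int toComponent = fromComponent;
            int? toPoint = fromPoint;
            if (ends.Length == 2)
                ParseEndpoint(ends[1], out toComponent, out toPoint);

            ranges.Add(new ComponentRange
            {
                FromComponent = fromComponent,
                FromPoint = fromPoint,
                ToComponent = toComponent,
                ToPoint = toPoint
            });
        }
        return ranges;
    }

    // Parses "component" or "component:point".
    private static void ParseEndpoint(string text, out int component, out int? point)
    {
        string[] bits = text.Split(':');
        if (bits.Length > 2)
            throw new FormatException("Too many ':' in endpoint: " + text);

        component = int.Parse(bits[0], CultureInfo.InvariantCulture);
        point = bits.Length == 2
            ? int.Parse(bits[1], CultureInfo.InvariantCulture)
            : (int?)null;
    }
}
```

Mixed entries like this one, where a single character spans partial and whole components, are the kind of special case that is easy to get wrong when only whole-component ranges have been tested.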

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
