Introduction
Since my last article, “Neural Network for Recognition of Handwritten Digits in C#”, was posted on Codeproject.com, I have received several emails with contributions to it, which I really appreciate.
To encourage people who want to study handwriting recognition techniques, I have written a small control named UPV (UNIPEN handwriting database viewer) for
viewing online handwritten samples from UNIPEN, one of the largest handwriting databases in the world. With this control, handwritten pen streams can be exported to bitmaps
for use in other programs. The control uses a DLL named UPUnipenLib.dll, a small part of a library I am developing for a sponsored handwriting
character recognition project. The library is not open source at the moment, but this file is free for non-commercial use.
Background
Requirements
UNIPEN database, downloaded from: http://unipen.nici.kun.nl/
(The downloaded folder should be renamed to “UnipenData” before use)
UNIPEN library: UPUnipenLib.dll (attached in source code and demo)
UNIPEN vs MNIST database
In my last article, I used MNIST for my neural network. It is a database of handwritten digits with a training set of 60,000 examples and a test set of
10,000 examples. It is a really good database for learning techniques and pattern recognition methods on real-world data. However, due to its limitations, the MNIST
database is not sufficient for a more complex recognition system.
For a handwriting recognition system, obtaining a large enough database is a challenge. In 1999, the International Unipen Foundation was
established to safeguard the distribution of a large database of on-line handwritten samples, collected by a consortium of 40 companies and institutes.
It is a huge free database containing over 5 million characters from more than 2,200 writers. The database has been
used by researchers at many labs and universities all over the world. It has also been used as test data for many pattern recognition competitions, such as those at ICDAR.
In general, the UNIPEN data set is difficult to recognize, even though the data format is standardized. The UNIPEN format can be referenced here.
Although the UNIPEN project has some tools for viewing and editing the database on its website, they were written for very old computer systems about 15
years ago, so they were of almost no use to me in understanding how to view the data. The method I used in my project is therefore built entirely on
my own understanding of the data format.
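To make the format a little more concrete: a UNIPEN file is plain text in which keywords start with a dot, and the pen trajectory is stored as coordinate lines following .PEN_DOWN and .PEN_UP keywords. The listing below is only a minimal sketch of reading such strokes, not the code inside UPUnipenLib.dll. The names UnipenSketch, Stroke and ParseStrokes are hypothetical, and it assumes that X and Y are the first two values on every coordinate line, whereas a real file declares the column order in its .COORD line.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for illustration; not part of UPUnipenLib.dll.
public static class UnipenSketch
{
    // One stroke: the pen state plus the (x, y) samples recorded in that state.
    public sealed class Stroke
    {
        public bool PenDown;
        public List<(int X, int Y)> Points = new List<(int X, int Y)>();
    }

    // Collects the coordinate lines that follow each .PEN_DOWN / .PEN_UP keyword.
    // Assumption: X and Y are the first two whitespace-separated values on a line.
    public static List<Stroke> ParseStrokes(IEnumerable<string> lines)
    {
        var strokes = new List<Stroke>();
        Stroke current = null;

        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0) continue;

            bool down = line.StartsWith(".PEN_DOWN", StringComparison.Ordinal);
            bool up = line.StartsWith(".PEN_UP", StringComparison.Ordinal);
            if (down || up)
            {
                // A pen keyword starts a new stroke.
                current = new Stroke { PenDown = down };
                strokes.Add(current);
                continue;
            }

            if (line[0] == '.')
            {
                // Any other keyword (.SEGMENT, .COORD, ...) ends the current stroke.
                current = null;
                continue;
            }

            if (current == null) continue; // data outside a stroke is ignored

            string[] parts = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length >= 2 &&
                int.TryParse(parts[0], NumberStyles.Integer, CultureInfo.InvariantCulture, out int x) &&
                int.TryParse(parts[1], NumberStyles.Integer, CultureInfo.InvariantCulture, out int y))
            {
                current.Points.Add((x, y));
            }
        }
        return strokes;
    }
}
```

Calling UnipenSketch.ParseStrokes(File.ReadLines(path)) on a data file would give the strokes in file order. Real files also use keywords such as .INCLUDE to pull in other files, which this sketch deliberately ignores.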
Using the code
How to use the UPUnipenLib.dll file and the UPV control
UPUnipenLib.dll is a library of functions for reading UNIPEN data files and
exporting UNIPEN streams to bitmap objects. Its classes are modeled on the UNIPEN file definition. The diagram below gives a general view that
should make the library easier to understand and use.
Using the library to get a pen stream and export it to a bitmap is very simple:
// Lets the user pick a UNIPEN file, loads it, and shows its first sample.
private void btnOpen_Click(object sender, EventArgs e)
{
    OpenFileDialog openFileDialog = new OpenFileDialog();
    openFileDialog.InitialDirectory = Environment.CurrentDirectory;
    openFileDialog.Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*";
    openFileDialog.FilterIndex = 2; // preselect "All files"
    openFileDialog.RestoreDirectory = true;
    upDataset = new UPUnipenLib.UPDataSet();
    if (openFileDialog.ShowDialog(this) == DialogResult.OK)
    {
        try
        {
            upDataset.InitialDirectory = "";
            upDataset.FillDataSetFromFile(openFileDialog.FileName);
            if (upDataset.Datalayouts.Count > 0)
            {
                FillUpTreeview();
                // Display the first sample of the first data layout that has any.
                iDatalayout = 0;
                foreach (UPDataLayout dl in upDataset.Datalayouts)
                {
                    if (dl.UpUnipens.Count > 0)
                    {
                        unipen = dl.UpUnipens[0];
                        UpdateBitmap();
                        break;
                    }
                    iDatalayout++;
                }
            }
            else
            {
                MessageBox.Show("The opened file is invalid, please select another file");
            }
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.ToString());
        }
    }
    UpdatePanels(true);
}
Export to bitmap:
// Renders the current UNIPEN sample using the options selected in the UI.
private void UpdateBitmap()
{
    UnipenBitmap ibox = new UnipenBitmap(unipen, (int)udPenwidth.Value);
    if (rbLine.Checked)
    {
        ibox.Drawtype = DrawType.Line;
    }
    else if (rbPie.Checked)
    {
        ibox.Drawtype = DrawType.Pie;
    }
    ibox.Multicolor = cbMultipenColor.Checked;
    ibox.PenWidth = (int)udPenwidth.Value;
    ibox.Uppen = cbUppen.Checked; // taken from the pen-up check box
    this.Image = ibox.Image;
}
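Raw pen coordinates are recorded in tablet units, so they have to be mapped into pixel coordinates before a stroke can be drawn on a bitmap. The listing below is a minimal sketch of one way to do that mapping, scaling to fit while preserving the aspect ratio. It is not how UnipenBitmap works internally, which I do not show here; StrokeScaler and FitToBox are hypothetical names, and whether the Y axis needs flipping depends on the recording device, so the sketch leaves it alone.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of UPUnipenLib.dll.
public static class StrokeScaler
{
    // Maps raw points into a width x height pixel box with the given margin,
    // using one scale factor for both axes so the shape is not distorted.
    public static List<(int X, int Y)> FitToBox(
        IReadOnlyList<(int X, int Y)> points, int width, int height, int margin)
    {
        var result = new List<(int X, int Y)>(points.Count);
        if (points.Count == 0) return result;

        // Bounding box of the raw samples.
        int minX = int.MaxValue, minY = int.MaxValue;
        int maxX = int.MinValue, maxY = int.MinValue;
        foreach (var p in points)
        {
            minX = Math.Min(minX, p.X); maxX = Math.Max(maxX, p.X);
            minY = Math.Min(minY, p.Y); maxY = Math.Max(maxY, p.Y);
        }

        // Avoid division by zero for a single dot or a straight line.
        double spanX = Math.Max(1.0, (double)maxX - minX);
        double spanY = Math.Max(1.0, (double)maxY - minY);

        // Largest valid pixel index is width - 1 (or height - 1).
        double scale = Math.Min((width - 1 - 2 * margin) / spanX,
                                (height - 1 - 2 * margin) / spanY);

        foreach (var p in points)
        {
            int x = margin + (int)Math.Round(((double)p.X - minX) * scale);
            int y = margin + (int)Math.Round(((double)p.Y - minY) * scale);
            result.Add((x, y));
        }
        return result;
    }
}
```

The scaled points of each stroke could then be converted to System.Drawing.Point values and passed to Graphics.DrawLines on a Graphics object created from a Bitmap.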
Points of Interest
The UPV control is based on the “Image Viewer UserControl” project by Jordy
"Kaiwa" Ruiter, adapted for presenting pen streams. With this control, you can not only view the pen stream but
also see how it was drawn through its colors. You can view pen-down trajectories as well as pen-up ones.
Picture 2: Unipen sample with up pen trajectories
Picture 2: Unipen sample without up pen trajectories
The UPV control shows UNIPEN samples accurately in most cases. However, because the database is so large, I have not had time to
find and fix every problem it may have. Your feedback would be highly appreciated.
History
ver 1.01:
Fixed bugs in a special SEGMENT case (example):
.SEGMENT CHARACTER 1:40-3,5,6:0-6:12 OK "1"
Fixed handling of multiple included files. Tested OK on almost all UNIPEN data files (1a, 1b, 1c, 1d, 2)
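The delineation field in the SEGMENT example above (1:40-3,5,6:0-6:12) is what makes this case special. As I read the format, commas separate parts, a dash joins the two ends of a range, and a colon attaches a point index to a component number, so the example covers component 1 from point 40 through component 3, then component 5, then points 0 to 12 of component 6. The listing below is a minimal sketch of parsing such a field under that reading. DelineationSketch, Range and Parse are hypothetical names, and this is not the code used inside UPUnipenLib.dll.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for illustration; not part of UPUnipenLib.dll.
public static class DelineationSketch
{
    // One part of a delineation. A point index of -1 means "not specified".
    public struct Range
    {
        public int FromComponent, FromPoint, ToComponent, ToPoint;
    }

    // Splits a delineation such as "1:40-3,5,6:0-6:12" into ranges.
    public static List<Range> Parse(string delineation)
    {
        var ranges = new List<Range>();
        foreach (string part in delineation.Split(','))
        {
            string[] ends = part.Split('-');
            if (ends.Length > 2)
                throw new FormatException("Unexpected range: " + part);

            ParseEnd(ends[0], out int fromComponent, out int fromPoint);

            // A part without a dash starts and ends at the same place.
            int toComponent = fromComponent, toPoint = fromPoint;
            if (ends.Length == 2)
                ParseEnd(ends[1], out toComponent, out toPoint);

            ranges.Add(new Range
            {
                FromComponent = fromComponent, FromPoint = fromPoint,
                ToComponent = toComponent, ToPoint = toPoint
            });
        }
        return ranges;
    }

    // Parses "component" or "component:point".
    private static void ParseEnd(string text, out int component, out int point)
    {
        string[] bits = text.Trim().Split(':');
        component = int.Parse(bits[0], CultureInfo.InvariantCulture);
        point = bits.Length > 1 ? int.Parse(bits[1], CultureInfo.InvariantCulture) : -1;
    }
}
```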