Introduction
Nowadays, artificial neural networks are widely applied in many fields of
human life. However, creating an efficient network for a large classifier,
such as a handwriting recognition system, is still a big challenge. In my last
article, “Library for online handwriting recognition system using UNIPEN database”, I presented an
efficient library for a handwriting recognition system that makes it simple to
create and change a neural network. The demo program showed good recognition
results on the digit set (97%) and the alphabet sets (93%). In this article I
continue with a solution for large pattern classification in general and
handwriting recognition in particular.
The recognition rate increases significantly when an additional spell checker module is used
Neural network for a recognition system
In the traditional model of pattern recognition, a hand-designed feature
extractor gathers relevant information from the input and eliminates
irrelevant variability. A trainable classifier (normally a standard,
fully-connected multi-layer neural network) then categorizes the resulting
feature vectors into classes. However, this model has some problems that can
hurt the recognition results. The convolutional neural network (CNN) addresses
these shortcomings of the traditional model and achieves very good performance
on pattern recognition tasks.
A CNN is a special form of multi-layer neural network. Like other networks,
CNNs are trained with the backpropagation algorithm. The difference lies in
their architecture. A convolutional network combines three architectural ideas
to ensure some degree of shift, scale, and distortion invariance: local
receptive fields, shared weights (or weight replication), and spatial or
temporal sub-sampling. CNNs are designed to recognize patterns directly from
digital images with a minimum of pre-processing. The architecture of CNNs is
described in detail in the articles of Dr. Yann LeCun and Dr. Patrice Simard
(see my previous articles).
Figure 1: The architecture of LeNet-5
Figure 2: An input image followed by a feature map performing a 5 × 5 convolution and a 2 × 2 sub-sampling map
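As a rough illustration of the three ideas above, the following minimal C# sketch slides a single kernel over an image. It is not taken from the library: the class and method names are made up for this example, the kernel is assumed to be square, and bias terms and the activation function are left out.

```csharp
using System;

public static class ConvolutionSketch
{
    // Slides one shared kernel over the image. Each output value sees only a
    // kernel-sized window (local receptive field), every position reuses the
    // same kernel (shared weights), and a stride above 1 skips positions
    // (sub-sampling).
    public static double[,] Convolve(double[,] image, double[,] kernel, int stride)
    {
        int k = kernel.GetLength(0);
        if (kernel.GetLength(1) != k)
            throw new ArgumentException("The kernel must be square.");
        if (stride < 1 || k > image.GetLength(0) || k > image.GetLength(1))
            throw new ArgumentException("The kernel must fit in the image and the stride must be at least 1.");

        // Output size of a convolution without padding: (input - kernel) / stride + 1.
        int outRows = (image.GetLength(0) - k) / stride + 1;
        int outCols = (image.GetLength(1) - k) / stride + 1;
        var output = new double[outRows, outCols];

        for (int r = 0; r < outRows; r++)
        {
            for (int c = 0; c < outCols; c++)
            {
                double sum = 0.0;
                for (int i = 0; i < k; i++)
                    for (int j = 0; j < k; j++)
                        sum += image[r * stride + i, c * stride + j] * kernel[i, j];
                output[r, c] = sum;
            }
        }
        return output;
    }
}
```

With a 29 × 29 input, a 5 × 5 kernel and a stride of 2, the result is 13 × 13, and applying the same step again gives 5 × 5. These are the layer sizes used by the CreateNetwork code in this article, although the stride of 2 is an assumption here.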
The recognition results of the above networks are really high for small
pattern collections such as digits, capital letters or lower-case letters.
However, when we want to create a larger neural network that can recognize a
bigger collection, such as digits plus English letters (62 characters),
problems begin to appear. Finding an optimized and large enough network
becomes more difficult, and training the network on a large set of input
patterns takes much longer. The network converges more slowly and, above all,
the accuracy rate decreases significantly because there are more badly written
characters, more similar and easily confused characters, and so on.
Furthermore, even if we can create a network good enough to recognize English
characters accurately, it certainly cannot properly recognize a character
outside its output set (a Russian or Chinese character, for example), because
it has no capacity for expansion. Therefore, creating a single network for a
very large pattern classifier is very difficult and may be impossible.
The proposed solution to the above problems is to replace the single big
network with multiple smaller networks, each of which has a very high
recognition rate on its own output set. Besides the official output set
(digits, letters…), each network has an additional unknown output (unknown
character). This means that if the input pattern is not recognized as one of
the official output characters, it is treated as an unknown character. The
input pattern is then passed to the next network, and so on, until one of the
networks recognizes it.
Figure 3: Convolutional neural network with unknown output
Figure 4: Recognition system using multiple neural networks
This solution overcomes most of the limits of the traditional model. The new
system consists of several small networks, which are simple to optimize for
the best recognition results. Training these small networks takes less time
than training a huge network. Above all, the new model is really flexible and
expandable. Depending on the requirements, we can load one or more networks;
we can also add new networks to the system to recognize new patterns without
changing or rebuilding the model. All these small networks can be reused in another multi-network system.
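The chain of networks described above can be sketched in a few lines of C#. This is only an illustration under stated assumptions: each component network is represented by a hypothetical `Func<double[], char>` instead of the library's own network class, and '?' stands for the unknown output.

```csharp
using System;
using System.Collections.Generic;

public static class CascadeSketch
{
    // Character that a component network returns for its unknown output.
    public const char Unknown = '?';

    // Tries the networks in order and returns the first answer that is not
    // Unknown. If every network answers Unknown, the pattern stays unknown.
    public static char Recognize(IEnumerable<Func<double[], char>> networks, double[] pattern)
    {
        foreach (var network in networks)
        {
            char result = network(pattern);
            if (result != Unknown)
                return result;
        }
        return Unknown;
    }
}
```

Adding a new network to such a system means appending one more entry to the list; the existing networks do not have to be retrained.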
Experiment
The demo program is built to show all stages of a recognition system: creating
a component network, training a network, testing networks on the UNIPEN
dataset and testing networks on a mouse drawing control. It serves as a
tutorial that can help anyone understand a recognition system. All functions
are available from the program GUI, so you can create, train, and test your
network at runtime without changing any code or restarting the program.
Figure 5: Handwriting recognition system interface
Creating a new neural network
Figure 6: Interface for creating a new neural network
A new neural network can be created entirely from the GUI. The network is
defined by the input pattern size, the number of layers, the data set and so
on. On the output layer we can tick the unknown output checkbox to add an
unknown output to the network, or leave it unticked to create a normal network.
Of course, we can still create a network in code:
void CreateNetwork()
{
    // Six layers: input, two convolution/sub-sampling layers,
    // two fully connected layers and the output layer.
    network = new ConvolutionNetwork();
    network.Layers = new Layer[6];
    network.LayerCount = 6;

    // Input layer: 29 x 29 pattern.
    InputLayer inputlayer = new InputLayer("00-Layer Input", new Size(29, 29));
    network.InputDesignedPatternSize = new Size(29, 29);
    inputlayer.Initialize();
    network.Layers[0] = inputlayer;

    // First convolution/sub-sampling layer: 13 x 13 feature maps.
    ConvolutionLayer convlayer = new ConvolutionLayer("01-Layer ConvolutionalSubsampling", inputlayer, new Size(13, 13), 10, 5);
    convlayer.Initialize();
    network.Layers[1] = convlayer;

    // Second convolution/sub-sampling layer: 5 x 5 feature maps.
    convlayer = new ConvolutionLayer("02-Layer ConvolutionalSubsampling", convlayer, new Size(5, 5), 60, 5);
    convlayer.Initialize();
    network.Layers[2] = convlayer;

    // Two fully connected layers with 200 and 100 neurons.
    FullConnectedLayer fulllayer = new FullConnectedLayer("03-Layer FullConnected", convlayer, 200);
    fulllayer.Initialize();
    network.Layers[3] = fulllayer;
    fulllayer = new FullConnectedLayer("04-Layer FullConnected", fulllayer, 100);
    fulllayer.Initialize();
    network.Layers[4] = fulllayer;

    // Output layer sized from the Letters3 character set.
    OutputLayer outputlayer = new OutputLayer("05-Layer Output", fulllayer, Letters3.Count, true);
    outputlayer.Initialize();
    network.Layers[5] = outputlayer;
    network.TagetOutputs = Letters3;
    // Character used for the network's unknown output.
    network.UnknownOuput = '?';
}
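To make the role of the unknown output more concrete, here is a small hedged sketch of how the output neurons could be turned into a character. It assumes a winner-takes-all rule and that the unknown output is the last neuron; neither detail is confirmed by the library, and the names `OutputDecoder` and `Decode` are made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class OutputDecoder
{
    // Picks the output neuron with the highest activation. The first
    // targetOutputs.Count neurons map to characters; the extra last neuron
    // (an assumption of this sketch) maps to the unknown character.
    public static char Decode(double[] outputs, IList<char> targetOutputs, char unknown)
    {
        if (outputs.Length != targetOutputs.Count + 1)
            throw new ArgumentException("Expected one output per target character plus one unknown output.");

        int best = 0;
        for (int i = 1; i < outputs.Length; i++)
        {
            if (outputs[i] > outputs[best])
                best = i;
        }
        return best < targetOutputs.Count ? targetOutputs[best] : unknown;
    }
}
```

For example, with target outputs 'a' and 'b', the activations { 0.1, 0.9, 0.2 } decode to 'b', while { 0.1, 0.2, 0.9 } decode to the unknown character.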
Training a network
After a neural network has been created with the "Create network" function, it is trained on the UNIPEN database.
Figure 7: Training network interface
Depending on the network size, we can choose training set 1a, 1b or 1c in the
UNIPENdata folder. The statistics of the training process show useful
information such as the epoch number, MSE, training time per epoch and success
rate.
UNIPEN data browser and recognition testing
The UNIPEN data browser control in the demo program can show all the UNIPEN
data files. We can also test a trained neural network on these files by
loading its trained parameter files.
Figure 8: UNIPEN data browser and recognition interface
Mouse Drawing test
Figure 9: Mouse drawing recognition interface
The mouse drawing control is based on the excellent article “DrawTools”
by Alex Fr. I just changed some of the code to fit my requirements. The
cursive text in the image is divided into lines, words and isolated characters
by the same algorithm, as follows:
private void btRecognition_Click(object sender, EventArgs e)
{
    // Release the bitmap from the previous run before taking a new snapshot.
    if (bitmap != null)
    {
        bitmap.Dispose();
        bitmap = null;
    }
    bitmap = new Bitmap(drawArea.Width, drawArea.Height);
    drawArea.DrawToBitmap(bitmap, new Rectangle(0, 0, bitmap.Width, bitmap.Height));
    drawBitmap = (Bitmap)bitmap.Clone();
    lbRecognizedText.Items.Clear();
    // Start from an empty list so the character count below is always valid.
    characterList = new List<InputPattern>();

    // Step 1: split the whole drawing into text lines.
    InputPattern parentPt = new InputPattern(bitmap, 255, new Rectangle(0, 0, bitmap.Width, bitmap.Height));
    List<InputPattern> lineList = GetPatternsFromBitmap(parentPt, 500, 1, true, 10, 10);
    if (lineList != null)
    {
        foreach (var line in lineList)
        {
            String text = "";
            // Step 2: split each line into words.
            List<InputPattern> wordList = GetPatternsFromBitmap(line, 50, 10, false, 10, 10);
            if (wordList != null)
            {
                foreach (var word in wordList)
                {
                    // Step 3: split each word into isolated characters.
                    List<InputPattern> charList = GetPatternsFromBitmap(word, 5, 5, false, 10, 10);
                    if (charList != null && charList.Count > 0)
                    {
                        panelNavigation.Visible = true;
                        foreach (var c in charList)
                        {
                            characterList.Add(c);
                            c.GetPatternBoundaries(5, 5, false, 10, 10);
                            // Step 4: recognize the character; a '\0' result is skipped.
                            char accChar;
                            PatternRecognition(c.OriginalBmp, out accChar);
                            if (accChar != '\0')
                            {
                                text += accChar;
                                drawBitmap = c.DrawChildPatternBoundaries(drawBitmap);
                            }
                        }
                    }
                    // Separate words with a space.
                    text += " ";
                }
            }
            lbRecognizedText.Items.Add(text);
        }
    }
    pbPreview.Image = drawBitmap;
    lblNavigation.Text = characterList.Count.ToString();
    index = 0;
}
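GetPatternsFromBitmap itself is not listed in this article. As a hedged illustration only, one common way to split text into lines, words or characters is a projection profile: count the ink pixels in each row (for lines) or each column (for words and characters) and cut wherever enough consecutive positions are empty. The sketch below shows that idea; the class name, the method name and the meaning of minGap are assumptions and may not match what the library actually does.

```csharp
using System;
using System.Collections.Generic;

public static class SegmentationSketch
{
    // Splits a projection profile (ink pixel count per row or column) into
    // [start, end] index ranges, cutting wherever at least minGap empty
    // positions separate two inked regions.
    public static List<Tuple<int, int>> Split(int[] profile, int minGap)
    {
        var segments = new List<Tuple<int, int>>();
        int start = -1;   // first inked index of the current segment, -1 if none
        int lastInk = -1; // most recent inked index seen
        for (int i = 0; i < profile.Length; i++)
        {
            if (profile[i] == 0)
                continue;
            // Close the current segment when the empty run before i is wide enough.
            if (start >= 0 && i - lastInk - 1 >= minGap)
            {
                segments.Add(Tuple.Create(start, lastInk));
                start = -1;
            }
            if (start < 0)
                start = i;
            lastInk = i;
        }
        if (start >= 0)
            segments.Add(Tuple.Create(start, lastInk));
        return segments;
    }
}
```

The same routine can serve all three levels by changing only the gap: a wide gap on the row profile separates lines, a medium gap on the column profile separates words, and a narrow gap separates characters.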
Figure 10: Loading trained network parameter files
In order to activate the recognition function, I simply load the trained
network parameter files. Depending on my recognition requirements, I can load
one, two or all of the files. The recognition results are really good (higher
than 90%) if I load only one network to recognize its own output characters.
However, when I load multiple networks, the system’s accuracy rate becomes
lower. The main reasons are that cursive text contains many confusable
characters, the training sets are not large enough, and so on.
For a large pattern collection like handwritten characters, there are many
similar characters that can confuse not only a machine but also a human in
some cases, such as O, 0 and o, or 9, 4, g and q. These characters can make
the networks misrecognize. Hence the solution has been upgraded with an
additional spell checker/voting module at the output of the system, which
significantly increases the recognition rate. The input pattern is recognized
by all component networks. Their outputs (except unknown outputs) then become
the inputs of the spell checker/voting module. The module relies on the
previously recognized characters, an internal dictionary and other factors to
decide which candidate is the most likely recognized character.
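The spell checker/voting module has not been published yet, so the following C# sketch is only a guess at the simplest possible form of the idea: among the candidates proposed by the component networks, prefer the first one that keeps the word recognized so far consistent with a dictionary. The class name, the method name, the prefix rule and the fallback to the first candidate are all assumptions of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VotingSketch
{
    // previous:   characters already recognized in the current word
    // candidates: answers of the component networks, unknown outputs removed
    // dictionary: the internal word list
    public static char Pick(string previous, IList<char> candidates, IEnumerable<string> dictionary)
    {
        if (candidates.Count == 0)
            return '?'; // no network recognized the pattern

        foreach (char candidate in candidates)
        {
            // Accept the candidate if some dictionary word starts with the
            // text recognized so far plus this candidate.
            string attempt = previous + candidate;
            if (dictionary.Any(word => word.StartsWith(attempt, StringComparison.Ordinal)))
                return candidate;
        }
        // No candidate fits the dictionary: fall back to the first one.
        return candidates[0];
    }
}
```

For example, after the letter "d", the candidates '0', 'o' and 'O' are resolved to 'o' when the dictionary contains "dog". A real module would probably also weigh the network output values and handle upper and lower case.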
Figure 11: The new recognition system using the spell checker/voting module (internal dictionary)
The spell checker module makes the system recognize much better
Conclusion
The proposed recognition model solves most of the problems of a large recognition system: the capacity to recognize a large pattern collection, flexible design and deployment, and expandable and reusable capacity. The accuracy rate of the system can also be raised more easily, by increasing the recognition rate of the component networks, using the spell checker/voting module and so on. The demo program also demonstrates the capacity of the library, which could be used in many other applications such as prediction or face recognition.
Future work and upgrades
Some features will be updated in the library:
- Convolution and sampling layers of the LeNet model
- Spell checker/voting module
- Character segmentation
At the moment, the project takes too much of my free time. It may slow down or stop temporarily until I can rearrange everything and/or find a good new sponsor. However, the votes and comments on this article will decide whether the project continues. I would really appreciate comments and suggestions on the article, especially on the model, the spell checker module and the character segmentation algorithm.
History
Version 1.0: initial code
Version 1.1: the spell checker/voting module has been added to the system, which significantly increases the recognition rate. It made me really surprised and happy. I will publish it when I finish rearranging the code.