Introduction
This project is about digit recognition with a multilayer perceptron (MLP), and it introduces some new ways to extract features from pictures.
The project takes some data (here, pictures of digits) and trains an MLP neural network on it. After training, we test the network on other digits and it says which digit each one is.
Background
- neural network
- artificial intelligence
- matlab
Using the Code
In this project, we have 10 different fonts of digits for learning, and we extract 26 features from each picture of a digit.
The features are:
- Density of black pixels in each of the 4 parts of the picture
- Ratio of black to white pixels in each of the 4 parts of the picture
- Presence of horizontal and vertical lines
- Presence of a hole in the picture
- Number of up, down, left and right moves made when tracing the digit with a chain code
- Conversion of the digit to a 9-segment number
- Tangent of the total angle computed from the number of black pixels in each row and column
These are the features that we want to teach to our neural network.
This project includes the Main script and the "pref", "Extract Number", "Extract Features" and "chain code" functions, each of which does a specific job that we explain below. We also use the data from "datas" and the targets from "target", which we extracted with the "input data" function.
Input datas()
This function takes the number 1 as its argument to run. It passes the address of each picture to "Extract Features" and saves the returned features. Finally, it collects all the features in one matrix and returns that matrix as its output.
Datas
This data includes all the feature vectors that we saved from "input data", so we do not need to extract them again on every run, which saves time.
We have two different types of data in it:
- num: the feature vectors of 10 different fonts of digits (0 to 9), 100 digits in total
- Num_half: half of the "num" feature vectors
Target
Our targets come in four variants, in "binary" type and "one by one" type:
- Target100: 10 different fonts, "one by one" type
- target100: 10 different fonts, binary type with 4 bits
- Target50: 5 different fonts, "one by one" type
- target50: 5 different fonts, binary type with 4 bits
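As a rough illustration of the two target types, the sketch below builds a "one by one" matrix and a 4-bit binary matrix for 100 samples. The sample order (10 fonts, each with the digits 0 to 9) and the bit order (most significant bit first) are assumptions made for this example, and the names digits, oneByOne and binary4 are hypothetical. Mapping the digit 0 to class 10 follows the check list in the Main script.

```matlab
% Assumed layout: 100 samples ordered as 10 fonts x digits 0..9.
digits = repmat(0:9, 1, 10);             % 1x100 digit label per sample

% "One by one" targets: one row per class, a single 1 in each column.
% Digit 0 is mapped to class 10.
classIdx = digits;
classIdx(classIdx == 0) = 10;
oneByOne = zeros(10, numel(digits));
oneByOne(sub2ind(size(oneByOne), classIdx, 1:numel(digits))) = 1;

% 4-bit binary targets: each column holds the digit in binary,
% most significant bit first (assumed bit order).
weights = [8; 4; 2; 1];
binary4 = zeros(4, numel(digits));
rest = digits;
for b = 1:4
    binary4(b, :) = floor(rest / weights(b));
    rest = mod(rest, weights(b));
end
```

With this layout, a network trained on oneByOne has 10 outputs and a network trained on binary4 has 4 outputs.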
Tip: Pay attention to upper and lower case in the target names.
If we want to train the neural network on 10 fonts, we have to use "Target100" or "target100" as the target argument, and if we want to train it on 5 fonts (half), we have to use "Target50" or "target50".
Chain Code
This function takes a binary picture and simulates handwriting in one of two modes, "8 ways" or "4 ways". In other words, if we use 4 ways, the chain code outputs numbers from 1 to 4, which mean:
- 1 is move up
- 2 is move right
- 3 is move down
- 4 is move left
With this function, we simulate handwriting and count the number of up, down, left and right moves made while drawing the digit on paper.
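The idea of counting moves can be sketched without the project's chain code function. The example below is an illustration only: it assumes the boundary is already available as a list of (row, column) points, and the variable names are hypothetical.

```matlab
% Hypothetical boundary: (row, col) points traced around a small rectangle.
boundary = [1 1; 1 2; 1 3; 2 3; 2 2; 2 1; 1 1];

steps = diff(boundary, 1, 1);    % row and column change of each move

% Image rows grow downward, so a negative row step is a move up.
up    = sum(steps(:,1) == -1 & steps(:,2) ==  0);
down  = sum(steps(:,1) ==  1 & steps(:,2) ==  0);
right = sum(steps(:,1) ==  0 & steps(:,2) ==  1);
left  = sum(steps(:,1) ==  0 & steps(:,2) == -1);
```

For a closed boundary, the number of moves up equals the number of moves down, and likewise for left and right.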
Extract Features
This function takes the address of a picture as its input argument and returns all the features described above as its output. Below, we explain each part of the code.
First, in lines 2 to 9, we convert the picture to grayscale and then to binary, then we reshape it into a column vector and shrink it. The reshape to 10000 elements assumes a 100x100 picture.
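I = imread(address);
%% change to gray
Igray = rgb2gray(I);
%% change to black and white (binary)
Ibw = im2bw(Igray,graythresh(Igray));
%% flatten to a 10000x1 column (assumes a 100x100 picture)
T=reshape(Ibw,10000,1);
%% shrink to a quarter of the size
Ibw_comp=resizem(Ibw,0.25);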
Then, in lines 13 to 56, we divide the picture into 4 parts and find the density of black pixels and the ratio of black to white pixels in each part. (This creates the first 8 features.)
k=0;
j=0;
for i=1:2500
if(T(i,1)== 1)
k=k+1;%white
else
j=j+1;%black
end
end
ch1=j/2500;
w_b1=j/k;
j=0;
k=0;
for i=2501:5000 % start after the first part so element 2500 is not counted twice
if(T(i,1)== 1)
k=k+1;%white
else
j=j+1;%black
end
end
ch2=j/2500;
w_b2=j/k;
j=0;
k=0;
for i=5001:7500 % start after the second part so element 5000 is not counted twice
if(T(i,1)== 1)
k=k+1;%white
else
j=j+1;%black
end
end
ch3=j/2500;
w_b3=j/k;
j=0;
k=0;
for i=7501:10000 % start after the third part so element 7500 is not counted twice
if(T(i,1)== 1)
k=k+1;%white
else
j=j+1;%black
end
end
ch4=j/2500;
w_b4=j/k;
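The four loops above can also be written without loops. The following is a minimal sketch on a synthetic image, not the project's code: the test image and the names parts, black, white, density and ratio are assumptions for illustration. Because reshape works column by column, each block of 2500 elements corresponds to a vertical strip of 25 columns of a 100x100 picture.

```matlab
% Synthetic stand-in for Ibw: 100x100 logical image, 1 = white, 0 = black.
Ibw = true(100, 100);
Ibw(20:80, 45:55) = false;          % a thick vertical black stroke

T = reshape(Ibw, 10000, 1);         % column-major flattening
parts = reshape(T, 2500, 4);        % one column per part, no overlap

black = sum(~parts, 1);             % black pixels in each part
white = sum(parts, 1);              % white pixels in each part

density = black / 2500;             % corresponds to ch1..ch4
ratio   = black ./ white;           % corresponds to w_b1..w_b4
                                    % (Inf if a part has no white pixels)
```

Splitting the vector with a second reshape guarantees that every pixel belongs to exactly one part.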
In lines 59 to 92, we count the black pixels in each row and each column of the shrunken picture, then we take the arc-tangent of the ratio of the column count to the row count. After that, we add all the angles and finally take the tangent of the total to find a line slope. (This creates 2 more features.)
k=0;
j=0;
for i=1:25
for j=1:25
if (Ibw_comp(i,j)==0)
k=k+1;
end
end
teta(i,1)=k;
k=0;
end
k=0;
for i=1:25
for j=1:25
if (Ibw_comp(j,i)==0)
k=k+1;
end
end
teta(i,2)=k;
k=0;
end
for i=1:25
teta(i,3)=(teta(i,2)/teta(i,1));
teta(i,4)=atan(teta(i,3)); % semicolon added so the matrix is not printed on every pass
end
for i=1:25
for j=1:4
if (isnan(teta(i,j))==1)
teta(i,j)=0;
end
end
end
teta(26,4)=sum(teta(:,4));
teta(27,4)=tan(teta(26,4)); % semicolon added so the matrix is not printed
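The same computation can be sketched in vectorized form. This is an illustration on a synthetic 25x25 image, and the names rowBlack, colBlack, theta, totalAngle and slope are hypothetical, not taken from the project.

```matlab
% Synthetic stand-in for Ibw_comp: 25x25 logical image, 0 = black.
Ibw_comp = true(25, 25);
Ibw_comp(5:20, 12) = false;          % a vertical black stroke

rowBlack = sum(~Ibw_comp, 2);        % black pixels in each row (25x1)
colBlack = sum(~Ibw_comp, 1)';       % black pixels in each column (25x1)

ratio = colBlack ./ rowBlack;        % 0/0 gives NaN, x/0 gives Inf
theta = atan(ratio);
theta(isnan(theta)) = 0;             % same NaN clean-up as the loop version

totalAngle = sum(theta);             % corresponds to teta(26,4)
slope      = tan(totalAngle);        % corresponds to teta(27,4)
```

Note that tan grows without bound near odd multiples of pi/2, so the slope feature can become very large for some pictures.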
From line 96 to 114, we count the black pixels in each column, and if the largest count is above our threshold (79 here), we conclude that the picture contains a vertical line and record it by setting the feature to "1".
Tip: '0' means there is no vertical line and '1' means there is a vertical line in the picture.
n=0;
j=1;
for i=1:100
for j=1:100
if(Ibw(j,i)==0)
n=n+1;
t(i)=n;
end
end
n=0;
end
n=max(t);
if(n>79)
ver=1;
else
ver=0;
end
From line 118 to 137, we again count the black pixels, this time in each row, and if the largest count is above our threshold (40 here), we conclude that the picture contains a horizontal line and record it by setting the feature to "1".
Tip: As in the previous tip, '0' means there is no horizontal line and '1' means there is a horizontal line in the picture.
For example, the number 1 has a vertical line and the number 4 has a horizontal line.
(This creates 2 more features.)
t=0;
n=0;
j=1;
for i=1:100
for j=1:100
if(Ibw(i,j)==0)
n=n+1;
t(i)=n;
end
end
n=0;
end
n=max(t);
if(n>40)
har=1;
else
har=0;
end
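Both line checks reduce to counting black pixels along one dimension and comparing the largest count with a threshold. A minimal vectorized sketch, using a synthetic image and the thresholds 79 and 40 from the code above, might look like this (colBlack and rowBlack are hypothetical names):

```matlab
% Synthetic stand-in for Ibw: 100x100 logical image, 0 = black.
Ibw = true(100, 100);
Ibw(10:95, 50) = false;              % long vertical stroke (86 pixels)
Ibw(60, 30:60) = false;              % short horizontal stroke (31 pixels)

colBlack = sum(~Ibw, 1);             % black pixels in each column
rowBlack = sum(~Ibw, 2);             % black pixels in each row

ver = double(max(colBlack) > 79);    % 1 if some column is mostly black
har = double(max(rowBlack) > 40);    % 1 if some row has a long black run
```

In this example the vertical stroke is long enough to set ver, while the horizontal stroke is too short to set har.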
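From line 140 to 167, we simulate handwriting with the chain code and output the total number of moves in each direction as features. (This creates 4 more features: up, down, left and right.) Note that the code below counts the chain codes 0, 2, 4 and 6 as moves right, up, left and down.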
B = bwboundaries(Ibw,4); % find the boundaries of all objects
CC = cell(1, length(B)); % pre-allocate
for k = 1:length(B)
CC{k} = chaincode(B{k},1); % chain code for the k'th object
end
up=0;
down=0;
left=0;
right=0;
for i=1:length(B)
    t=max(size(CC{1,i}.code));
    for j=1:t
        if(CC{1,i}.code(j,1)==0)
            right=right+1;
        elseif(CC{1,i}.code(j,1)==2)
            up=up+1;
        elseif(CC{1,i}.code(j,1)==4)
            left=left+1;
        elseif(CC{1,i}.code(j,1)==6)
            down=down+1;
        end
    end
end
From line 171 to 292, we convert the digit to our 9-segment number.
We divide the picture into 9 parts; in each part, if the number of black pixels is more than the threshold (150 here), the segment is black and is set to 1.
We do this for each of the 9 segments, so the digit looks similar to a seven-segment display digit, and 9 more features are created.
alfa=150;
k=0;
for i=1:33
for j=1:33
if(Ibw(i,j)==0)
k=k+1;
end
end
end
if(k>alfa)
seg1=1;
else
seg1=0;
end
k=0;
%%------------------------
for i=1:33
for j=33:66
if(Ibw(i,j)==0)
k=k+1;
end
end
end
if(k>alfa)
seg2=1;
else
seg2=0;
end
k=0;
for i=1:33
for j=66:99
if(Ibw(i,j)==0)
k=k+1;
end
end
end
if(k>alfa)
seg3=1;
else
seg3=0;
end
k=0;
for i=33:66
for j=1:33
if(Ibw(i,j)==0)
k=k+1;
end
end
end
if(k>alfa)
seg4=1;
else
seg4=0;
end
k=0;
for i=33:66
for j=33:66
if(Ibw(i,j)==0)
k=k+1;
end
end
end
if(k>alfa)
seg5=1;
else
seg5=0;
end
k=0;
for i=33:66
for j=66:99
if(Ibw(i,j)==0)
k=k+1;
end
end
end
if(k>alfa)
seg6=1;
else
seg6=0;
end
k=0;
for i=66:99
for j=1:33
if(Ibw(i,j)==0)
k=k+1;
end
end
end
if(k>alfa)
seg7=1;
else
seg7=0;
end
k=0;
for i=66:99
for j=33:66
if(Ibw(i,j)==0)
k=k+1;
end
end
end
if(k>alfa)
seg8=1;
else
seg8=0;
end
k=0;
for i=66:99
for j=66:99
if(Ibw(i,j)==0)
k=k+1;
end
end
end
if(k>alfa)
seg9=1;
else
seg9=0;
end
seg10=[seg1 seg2 seg3
seg4 seg5 seg6
seg7 seg8 seg9];
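The nine blocks above all follow one pattern, so they can be sketched as a double loop over a 3x3 grid. This is an illustration on a synthetic image; it reuses the row and column ranges 1:33, 33:66 and 66:99 from the code above, and the names edges, block and seg are hypothetical.

```matlab
% Synthetic stand-in for Ibw: 100x100 logical image, 0 = black.
Ibw = true(100, 100);
Ibw(5:95, 45:55) = false;            % vertical stroke through the middle

alfa  = 150;                         % threshold on the black pixel count
edges = [1 33; 33 66; 66 99];        % ranges used by the original loops

seg = zeros(3, 3);                   % plays the role of seg10
for r = 1:3
    for c = 1:3
        block = Ibw(edges(r,1):edges(r,2), edges(c,1):edges(c,2));
        seg(r, c) = double(sum(~block(:)) > alfa);
    end
end
```

For this stroke, only the middle column of segments is set, which is the pattern expected for a digit such as 1.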
In line 295, we check whether the picture has a hole and record the result as a feature. (This creates 1 more feature.)
Tip: We read 1 as a hole in the picture and 0 as no hole. (For example, the number 8 has a hole but the number 1 does not.) Strictly speaking, bweuler returns the Euler number of the binary image, that is, the number of objects minus the number of holes, so other values are possible as well.
hole=bweuler(Ibw);
Finally, in line 298, we collect all the features in one matrix and return it as the output.
chall=[ch1 ch2 ch3 ch4 w_b1 w_b2 w_b3 w_b4 ver har hole up down left right ...
seg1 seg2 seg3 seg4 seg5 seg6 seg7 seg8 seg9 teta(26,4) teta(27,4)]; % "..." keeps all 26 features in one row
Main:
This is the main part of our code, and it is where we run the project.
In lines 6 and 8, we load the inputs and targets.
% load data
load('datas.mat');
% load targets
load ('targets.mat');
input=num;
target=Target100;
As I mentioned, we have 4 types of target, and we choose how the network output is decoded with the "type" parameter.
Tip: In the code below, type=1 selects the "one by one" decoding and type=0 selects the "binary" decoding, so the value must match the type of the chosen target.
type=1;
Next, we create our neural network in a variable named "newff" and train it with our data and targets.
newff=feedforwardnet([10 10],'trainlm');
newff=train(newff,input,target);
In line 29, we test the number 1. If our network says the number is 1, it works well.
r= newff(num1); %<==Number for test
So we test it, but the network outputs values between 0 and 1, which we have to convert into an answer (for the binary type, with a threshold of 0.5).
Lines 31 to 39 show how we convert the output if our type is "one by one" (the output with the maximum value is the answer).
if(type==1)
    % store the result in maxVal so the built-in max function is not overwritten
    maxVal=max(r(:,1));
    for i=1:10
        if(maxVal==(r(i,1)))
            disp('your number is:')
            disp(i)
        end
    end
end
%%check the answer:
% i=10 =>number is 0 % i=5 =>number is 5
% i=1 =>number is 1 % i=6 =>number is 6
% i=2 =>number is 2 % i=7 =>number is 7
% i=3 =>number is 3 % i=8 =>number is 8
% i=4 =>number is 4 % i=9 =>number is 9
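The mapping in the comment list can also be applied directly in code. The sketch below is an illustration with a made-up output vector; it assumes that r is a 10x1 column and that index 10 stands for the digit 0, as listed above. The names idx and digit are hypothetical.

```matlab
% Made-up network output for illustration: the 10th entry is the largest.
r = [0.05; 0.10; 0.02; 0.01; 0.03; 0.04; 0.02; 0.01; 0.02; 0.91];

[~, idx] = max(r);       % index of the most similar class
digit = mod(idx, 10);    % indices 1..9 stay as they are, 10 becomes 0
```

Using the second output of max avoids both the loop and the temporary variable.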
Lines 45 to 56 show how we convert the output if our type is "binary" (if a value is more than the 0.5 threshold, change it to 1, otherwise change it to 0). After that, we send the result to the "FindNumber" function, which reports the number.
if(type==0)
for i=1:4
if(r(i,1)>0.5)
r(i,1)=1;
else
r(i,1)=0;
end
end
FindNumber(r)
end
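What happens after thresholding can be sketched as a conversion from 4 bits to a number. This is not the project's FindNumber function, whose internals are not shown here; the sketch assumes that the first output is the most significant bit, and the names bits and digit are hypothetical.

```matlab
% Made-up 4-output network response for illustration.
r = [0.93; 0.08; 0.12; 0.81];

bits  = double(r > 0.5);     % threshold at 0.5, giving [1; 0; 0; 1]
digit = [8 4 2 1] * bits;    % assumed bit order: most significant bit first
```

With 4 bits, values from 10 to 15 are also possible, so a real decoder would need to decide how to report them.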
In lines 60 to 111, we measure the performance on one font (the 10 digits from 0 to 9).
disp('performance by test num from 0 to 9 (10 number)')
per=0;
r= newff(num0);
r=ExtractNumber(1,r);
if(r==10)
per=per+1;
end
r= newff(num1);
r=ExtractNumber(1,r);
if(r==1)
per=per+1;
end
r= newff(num2);
r=ExtractNumber(1,r);
if(r==2)
per=per+1;
end
r= newff(num3);
r=ExtractNumber(1,r);
if(r==3)
per=per+1;
end
r= newff(num4);
r=ExtractNumber(1,r);
if(r==4)
per=per+1;
end
r= newff(num5);
r=ExtractNumber(1,r);
if(r==5)
per=per+1;
end
r= newff(num6);
r=ExtractNumber(1,r);
if(r==6)
per=per+1;
end
r= newff(num7);
r=ExtractNumber(1,r);
if(r==7)
per=per+1;
end
r= newff(num8);
r=ExtractNumber(1,r);
if(r==8)
per=per+1;
end
r= newff(num9);
r=ExtractNumber(1,r);
if(r==9)
per=per+1;
end
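The ten repeated blocks above can be collapsed into one loop. The sketch below shows the pattern with toy stand-ins so that it runs without the trained network: samples, expected and classify are hypothetical, with classify playing the role of the network followed by ExtractNumber.

```matlab
% Toy stand-ins (assumptions for illustration only).
expected = [10 1 2 3 4 5 6 7 8 9];         % class index for digits 0..9
I10      = eye(10);
samples  = I10(:, expected);               % one toy feature column per digit
samples(:, 3) = I10(:, 7);                 % make one sample deliberately wrong
classify = @(x) find(x == max(x), 1);      % toy classifier: largest entry

per = 0;
for k = 1:10
    if classify(samples(:, k)) == expected(k)
        per = per + 1;                     % count a correct answer
    end
end
accuracy = 100 * per / 10;                 % percentage of correct answers
```

Keeping the expected classes in one vector makes it easy to extend the test to more fonts.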
To find the performance of our code, we should test all of our data, so we created the function "ExtractNumber" to use on all the data and the function "pref" to find the performance.
Pref tests all the input data, counts the right and wrong answers, and reports the performance of our code.
You can test your own number in this project. Just write the address of the picture of your number in line 29 of the main script, and you will see the network's answer.
In our test, we got a performance of more than 70%, and we expect better performance if we increase the number of fonts used for learning to 200 or more.
There is a document in the attached source code.
History
- 23rd May, 2017: Initial version