ImageTagger is an application that lets you search images by keywords. It determines the contents of images using CeNiN.dll, a pure C# implementation of deep convolutional neural networks. It is now more than 10 times faster when the Intel MKL libraries are available.
We will write an application that allows us to search images by keywords. I hate library dependencies and "black boxes", so we will not use any third-party API or library. Everything will be pure C# and simple. (With CeNiN v0.2, it is now more than 10 times faster when Intel MKL support is available.)
Introduction
Deep convolutional neural networks are one of the hot topics in the image processing community. There are implementations in various languages, but if you are trying to understand the logic behind the ideas, large implementations are not always helpful. So I implemented the feed-forward phase of a convolutional neural network in its minimal form as a .NET library: CeNiN.dll.
We will use CeNiN to classify images and tag them with keywords so that we can search for an object or scene in a set of images. We will be able to, for instance, find images that contain cats, cars or whatever we want, in a folder that we choose.
CeNiN does not contain an implementation of back-propagation, which is required to train a neural network model, so we will use a pretrained model. The original model (imagenet-matconvnet-vgg-f) and the same model converted to a CeNiN-compatible format can be found here and here respectively. The model contains 19+2 layers (input and output), holds 60,824,256 weights and has been trained on 1000 classes of images.
Preparing the Model
First, we load the model using the constructor. Since it may take a while to load millions of parameters from the model file, we call the constructor in a separate thread so as not to block the UI:
Thread t = new Thread(() =>
{
    try
    {
        cnn = new CNN("imagenet-matconvnet-vgg-f.cenin");
        ddLabel.Invoke((MethodInvoker)delegate ()
        {
            cbClasses.Items.AddRange(cnn.outputLayer.classes);
            dropToStart();
        });
    }
    catch (Exception exp)
    {
        ddLabel.Invoke((MethodInvoker)delegate ()
        {
            ddLabel.Text = "Missing model file!";
            if (MessageBox.Show(this, "Couldn't find model file. " +
                "Do you want to be redirected to download page?", "Missing Model File",
                MessageBoxButtons.YesNo, MessageBoxIcon.Error) == DialogResult.Yes)
                Process.Start("http://huseyinatasoy.com/y.php?bid=71");
        });
    }
});
t.Start();
Classifying Images
We need a structure to keep the results:
private struct Match
{
    public int ImageIndex { set; get; }
    public string Keywords { set; get; }
    public float Probability { set; get; }
    public string ImageName { set; get; }

    public Match(int imageIndex, string keywords, float probability, string imageName)
    {
        ImageIndex = imageIndex;
        Keywords = keywords;
        Probability = probability;
        ImageName = imageName;
    }
}
CeNiN loads the layers into memory as a layer chain. The chain is a linked list whose first and last nodes are the Input and Output layers. To classify an image, the image is set as the input and the layers are iterated, calling the feedNext() function at each step to feed the next layer. When the data arrives at the Output layer, it is in the form of a probability vector. Calling getDecision() sorts the probabilities from highest to lowest, and then we can consider each probability a Match. Again, it is important to make these calls inside a thread so as not to block the UI. Also, since a worker thread cannot modify UI elements, code that modifies them (adding new rows to lv_KeywordList, updating ddLabel.Text) must be invoked on the GUI thread.
Thread t = new Thread(() =>
{
    int imCount = imageFullPaths.Length;
    for (int j = 0; j < imCount; j++)
    {
        Bitmap b = (Bitmap)Image.FromFile(imageFullPaths[j]);
        ddLabel.Invoke((Action<int, int>)delegate (int y, int n)
        {
            ddLabel.Text = "Processing [" + (y + 1) + "/" + n + "]...\n\n" +
                getImageName(imageFullPaths[y]);
        }, j, imCount);

        cnn.inputLayer.setInput(b, Input.ResizingMethod.ZeroPad);
        b.Dispose();

        // Iterate the layer chain, feeding each layer forward
        // until we reach the output layer
        Layer currentLayer = cnn.inputLayer;
        while (currentLayer.nextLayer != null)
        {
            currentLayer.feedNext();
            currentLayer = currentLayer.nextLayer;
        }
        Output outputLayer = (Output)currentLayer;
        outputLayer.getDecision();

        lv_KeywordList.Invoke((MethodInvoker)delegate ()
        {
            // Keep only the classes with probability greater than 5%
            int k = 0;
            while (outputLayer.probabilities[k] > 0.05)
            {
                Match m = new Match(
                    j,
                    outputLayer.sortedClasses[k],
                    (float)Math.Round(outputLayer.probabilities[k], 3),
                    getImageName(imageFullPaths[j])
                );
                matches.Add(m);
                k++;
            }
        });
    }
    lv_KeywordList.Invoke((MethodInvoker)delegate ()
    {
        groupBox2.Enabled = true;
        btnFilter.PerformClick();

        // Auto-size the columns
        int k;
        for (k = 0; k < lv_KeywordList.Columns.Count - 1; k++)
            if (k != 1)
                lv_KeywordList.Columns[k].Width = -2;
        lv_KeywordList.Columns[k].Width = -1;
        dropToStart();
    });
});
t.Start();
Now all the images are tagged with keywords, which are actually the class descriptions of the model we are using. Finally, we iterate over the Matches to find each one whose keywords contain the text entered by the user.
float probThresh = (float)numericUpDown1.Value;
string str = cbClasses.Text.ToLower();
lv_KeywordList.Items.Clear();
pictureBox1.Image = null;
List<int> imagesToShow = new List<int>();
int j = 0;
bool stringFilter = (str != "");
for (int i = 0; i < matches.Count; i++)
{
    bool cond = (matches[i].Probability >= probThresh);
    if (stringFilter)
        cond = cond && matches[i].Keywords.Contains(str);
    if (cond)
    {
        addMatchToList(j, matches[i]);
        int ind = matches[i].ImageIndex;
        if (!imagesToShow.Contains(ind))
            imagesToShow.Add(ind);
        j++;
    }
}
if (lv_KeywordList.Items.Count > 0)
    lv_KeywordList.Items[0].Selected = true;
It is that simple!
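The helpers getImageName() and addMatchToList() used above are simple UI plumbing and their implementations are not shown in this article. A minimal sketch of what they might look like, assuming a four-column ListView (row number, image name, keywords, probability), is given below; the actual implementations may differ:

// Hypothetical helper implementations, for illustration only
private string getImageName(string fullPath)
{
    // Strip the directory part; keep only the file name
    return System.IO.Path.GetFileName(fullPath);
}

private void addMatchToList(int rowIndex, Match m)
{
    ListViewItem item = new ListViewItem((rowIndex + 1).ToString());
    item.SubItems.Add(m.ImageName);
    item.SubItems.Add(m.Keywords);
    item.SubItems.Add(m.Probability.ToString());
    item.Tag = m.ImageIndex; // remember which image this row belongs to
    lv_KeywordList.Items.Add(item);
}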
Training Your Own Models for ImageTagger
You can train your own neural network using a tool like MatConvNet and convert it to the CeNiN format to use it in ImageTagger. Here is a MATLAB script that converts VGG nets to a CeNiN-compatible format:
function vgg2cenin(vggMatFile) % vgg2cenin('imagenet-matconvnet-vgg-f.mat')
    fprintf('Loading mat file...\n');
    net = load(vggMatFile);
    lc = size(net.layers, 2);
    vggMatFile(find(vggMatFile=='.', 1, 'last'):end) = []; % Remove extension
    f = fopen(strcat(vggMatFile, '.cenin'), 'w'); % Open an empty file with the same name
    fprintf(f, 'CeNiN NEURAL NETWORK FILE'); % Header
    fwrite(f, lc, 'int'); % Layer count
    if isfield(net.meta, 'inputSize')
        s = net.meta.inputSize;
    else
        s = net.meta.inputs.size(1:3);
    end
    for i = 1:length(s)
        fwrite(f, s(i), 'int'); % Input dimensions (height, width and number of channels (depth))
    end
    for i = 1:3
        fwrite(f, net.meta.normalization.averageImage(i), 'single');
    end
    for i = 1:lc % For each layer
        l = net.layers{i};
        t = l.type;
        s = length(t);
        fwrite(f, s, 'int8'); % String length
        fprintf(f, t); % Layer type (string)
        fprintf('Writing layer %d (%s)...\n', i, l.type);
        if strcmp(t, 'conv') % Convolution layers
            st = l.stride;
            p = l.pad;
            % We need 4 padding values for CeNiN (top, bottom, left, right).
            % In the vgg format, if there is one value, all padding values
            % are the same; if there are two values, they are the top-bottom
            % and left-right paddings.
            if size(st,2)<2, st(2)=st(1); end
            if size(p,2)<2, p(2)=p(1); end
            if size(p,2)<3, p(3:4)=[p(1) p(2)]; end
            % Four padding values
            fwrite(f, p(1), 'int8');
            fwrite(f, p(2), 'int8');
            fwrite(f, p(3), 'int8');
            fwrite(f, p(4), 'int8');
            % Dimensions (height, width, number of channels (depth), number of filters)
            s = size(l.weights{1});
            for j = 1:length(s)
                fwrite(f, s(j), 'int');
            end
            % Vertical and horizontal stride values (StrideY and StrideX)
            fwrite(f, st(1), 'int8');
            fwrite(f, st(2), 'int8');
            % Weight values
            % Writing each value one by one takes a long time because there
            % are many of them:
            %   for j=1:numel(l.weights{1})
            %       fwrite(f,l.weights{1}(j),'single');
            %   end
            % This is faster:
            fwrite(f, l.weights{1}(:), 'single');
            % And the biases, written the same way
            fwrite(f, l.weights{2}(:), 'single');
        elseif strcmp(t, 'relu') % ReLu layers
            % The layer type ('relu') has been written above. There are no
            % extra parameters to be written for this layer.
        elseif strcmp(t, 'pool') % Pooling layers
            st = l.stride;
            p = l.pad;
            po = l.pool;
            if size(st,2)<2, st(2)=st(1); end
            if size(p,2)<2, p(2)=p(1); end
            if size(p,2)<3, p(3:4)=[p(1) p(2)]; end
            if size(po,2)<2, po(2)=po(1); end
            % Four padding values (top, bottom, left, right)
            fwrite(f, p(1), 'int8');
            fwrite(f, p(2), 'int8');
            fwrite(f, p(3), 'int8');
            fwrite(f, p(4), 'int8');
            % Vertical and horizontal pooling values (PoolY and PoolX)
            fwrite(f, po(1), 'int8');
            fwrite(f, po(2), 'int8');
            % Vertical and horizontal stride values (StrideY and StrideX)
            fwrite(f, st(1), 'int8');
            fwrite(f, st(2), 'int8');
        elseif strcmp(t, 'softmax') % SoftMax layer (this is the last layer)
            s = size(net.meta.classes.description, 2);
            fwrite(f, s, 'int'); % Number of classes
            for j = 1:size(net.meta.classes.description, 2) % For each class description
                s = size(net.meta.classes.description{j}, 2);
                fwrite(f, s, 'int8'); % String length
                fprintf(f, '%s', net.meta.classes.description{j}); % Class description (string)
            end
        end
    end
    fwrite(f, 3, 'int8'); % Length of "EOF" as if it were a layer type
    fprintf(f, 'EOF'); % And the "EOF" string itself...
    fclose(f);
end
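For reference, the beginning of a .cenin file written by this script (the header string, the layer count, the input dimensions and the average-image values) can be read back from C# with a plain BinaryReader, since MATLAB's fwrite produces little-endian binary data. The following is a minimal sketch for inspecting a converted file; it is not part of CeNiN's API, and the file name is just an example:

using System;
using System.IO;
using System.Text;

class CeninHeaderReader
{
    // Illustrative sketch: reads the header fields written by vgg2cenin.
    // CeNiN's own loader does the real parsing of the layers that follow.
    static void Main()
    {
        const string magic = "CeNiN NEURAL NETWORK FILE";
        using (BinaryReader r = new BinaryReader(
            File.OpenRead("imagenet-matconvnet-vgg-f.cenin")))
        {
            string header = Encoding.ASCII.GetString(r.ReadBytes(magic.Length));
            if (header != magic)
                throw new InvalidDataException("Not a CeNiN file.");

            int layerCount = r.ReadInt32(); // written as 'int'
            int h = r.ReadInt32();          // input height
            int w = r.ReadInt32();          // input width
            int d = r.ReadInt32();          // number of channels
            float avg1 = r.ReadSingle();    // average image values,
            float avg2 = r.ReadSingle();    // written as 'single'
            float avg3 = r.ReadSingle();

            Console.WriteLine("{0} layers, input {1}x{2}x{3}", layerCount, h, w, d);
        }
    }
}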
History
- 3rd April, 2019: Initial version