
Automatic Linguistic Indexing of Pictures (ALIP) By Artificial Neural Network Approach

10 Sep 2009 · GPL3 · 14 min read
A general idea of how a computer may be used to describe an image by analyzing its pixel content, a technique known as ALIP
The article describes how computers may be used for intelligent annotation of audio, video or image media content, together with a peculiar phenomenon arising from this novel field that might be coined 'AI-xenophobia', 'cyber-xenophobia' or 'cyborg-xenophobia'.

kesha


Introduction

While I was coding an AI application, I heard the mellow strains of a childish songstress coming from the neighbours upstairs, who played the song repeatedly. The verses were sometimes hardly audible, but I managed to distinguish several characteristic phrases to look up with a great web search engine (I like it, since it puts some of my CodeProject articles on the first 1-2 pages of the search results). The only significant phrase from the song that I submitted to the engine was (to prevent undue advertisement), say, "фиолетовая паста" (violet paste). I expected it would return scores of make-up advertisements, but contrariwise, among the cosmetic industry spam on the first page of the results, just one link pointed to a music web forum containing exactly that phrase from the lyrics. The next mouse click and a second search gave me the group, the verses of the song and guitar tabs, and put me on YouTube, so soon I was listening to that marvellous music clip.

It is astounding how a person with permanent internet access can, a few seconds after having heard some music, be presented with the verses, information about the group and a video clip to watch. The process is described as searching on media data content. Current web searches use textual information to return results; now imagine being able to submit an audio, video or image sample as a search query the same way you submit textual requests. Just as if the computer had been listening to the music with you, it would present you the same information.

The concept known as Connected Visual Computing (CVC) is actively pursued by Intel. CVC concerns media data processing: for example, when some object (say, an ant) emerges in the field of view of your mobile phone camera, you see its identification on the screen, obtained by the mobile analyzing its image, telling you it is, say, Camponotus herculeanus. Or when you see a street sign in an unknown language, you may view it through your mobile camera and it will display, at the same location in the scene, the same caption in your native language (augmented reality (AR), 2D/3D overlays). The search by audio content described above is another example. The market promises immense propagation; such applications will keep the audience consuming modern hardware and software for a very long time.

Here, I'd like to present the general idea of how a computer may be used to describe an image by analyzing its pixel content, known as Automatic Linguistic Indexing of Pictures (ALIP). The approach is general: extract some descriptive features from the data and use some rules to attribute the content to a category.

If you're interested in immediate applications, you may contact System7, the firm supporting the content-based image recognition (CBIR) part of the project.

Background

A basic understanding of AI approaches, e.g., neural networks, support vector machines and nearest-neighbour classifiers, and of image description and transform methods such as wavelets, edge extraction, image statistics and histograms. C++/C# experience is also useful, as in this article you will find how to invoke C++ DLL methods from within a C# application.

Using the Application

In my ALIP experiment, I decided to annotate simple natural image categories. There are 5 ANN classifiers in the project, corresponding to:

  • Pictures that might contain animals
  • Pictures that might contain flowers
  • Pictures that might contain landscapes
  • Pictures that might contain sunsets
  • Other pictures that do not contain the above categories, or simply an unknown image type

You need to use the unknown category along with the others you'd like to classify into. Otherwise, the AI classifier would identify every image you give it as, e.g., animals, flowers, landscapes or sunsets. But in the real world there are other types of images that do not fall into any of the above categories, so without it you would need to meddle with AI classification thresholds, which is rather cumbersome and awkward. With an additional unknown-category AI classifier, the result of image identification will be either one of the known image categories or simply an unknown image type that the computer cannot identify using its petty knowledge.
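To make the role of the unknown category concrete, here is a minimal hypothetical sketch (the function name and category list are my own, not from the article's sources): with a dedicated unknown classifier in the pool, annotation reduces to picking the highest-scoring category, with no per-class rejection thresholds to tune.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical category list matching the five classifiers described above.
static const char* kCategories[] = {
    "animals", "flowers", "landscapes", "sunsets", "unknown"
};

// Returns the name of the category whose classifier fired strongest.
// Because "unknown" competes on equal footing, no threshold is needed.
std::string annotate(const std::vector<double>& annOutputs)
{
    std::size_t best = 0;
    for (std::size_t i = 1; i < annOutputs.size(); ++i)
        if (annOutputs[i] > annOutputs[best])
            best = i;
    return kCategories[best];
}
```

If none of the four known-category networks responds strongly, the unknown classifier simply wins the vote.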

I adore image databases; they contain shots from all over the world that are really nice to observe. I've got about 20,000 images for designers, bought from a DVD shop. I've taken image samples from the animals, flowers, landscapes and sunsets image types, and added all other image categories that do not come from those 4 to form the unknown image type.

The usage of the program is simple enough. Just run alip.exe and it will load all the necessary AI classifier files (in case of error, you will get a message box and will not be able to use it). Then click the [...] button and select a directory that presumably contains some *.jpg files. You may use the ones supplied with this demo under the pics directory. All the found files will be added to the list box; just click them to view in the right panel and see the proposed category in the top left panel. In theory, it should be able to comment on the image as presented below:

My cat is standing on the floor

Methodology

Due to competing interests with the former organizations and the current one I work for, I will not be able to describe the methodology and feature extraction methods in minute detail. I would rather present the general trend and the categories of features used for describing images; searching the internet for the corresponding feature computations will reveal all the necessary papers with the particular formulae.

There are some demos available online, e.g., ALIPr. They use hidden Markov models (HMMs) and wavelet features extracted from the images. You may try the pictures from this article with their methods, or vice versa, my application with their pictures, and compare the annotation results.

As the AI approach is general and assumes some reduction of the original data dimensionality, using either feature extraction or a PCA transform or both, all that is needed is to collect some data, extract the features and train the AI classifiers. If you understand my face detection articles, you will be able to repeat the experiment:

After you have converted your raw image data to features, just train some AI classifiers to discriminate the desired positive category from the negative ones.

ALIP Features

Generally, they are divided into:

  • Color features
  • Texture features
  • Shape features

The color features are simply the original raw image data, histograms of the image channels and image profiles. Texture features are the known edge extraction methods, wavelet transforms and image statistics (e.g., 1st order: mean, std, skew; 2nd order: contrast, correlation, entropy...). Shape features try to estimate the shapes of the objects found in the images. Just have a look at the wiki article on CBIR.
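As an illustration of the first-order statistics mentioned above, here is a small sketch (my own helper, not the article's code) computing mean, standard deviation and skewness over one image channel:

```cpp
#include <cmath>
#include <vector>

// First-order statistics of a single image channel (pixel values in [0,255]).
struct ChannelStats { double mean, stddev, skew; };

ChannelStats firstOrderStats(const std::vector<double>& px)
{
    const double n = static_cast<double>(px.size());
    double mean = 0.0;
    for (double v : px) mean += v;
    mean /= n;

    double m2 = 0.0, m3 = 0.0;  // 2nd and 3rd central moments
    for (double v : px) {
        const double d = v - mean;
        m2 += d * d;
        m3 += d * d * d;
    }
    m2 /= n;
    m3 /= n;

    const double sd = std::sqrt(m2);
    ChannelStats s;
    s.mean   = mean;
    s.stddev = sd;
    s.skew   = (sd > 0.0) ? m3 / (sd * sd * sd) : 0.0;  // standardized skewness
    return s;
}
```

Concatenating such statistics per channel already gives a compact texture descriptor to feed a classifier.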

Typically, the original RGB image color space is transformed to alternative spaces such as YCbCr, HSV, HSI, CIEXYZ, etc., as alternative spaces might give better discrimination of the data; you need to experiment with them anyway.
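For reference, the RGB to YCbCr transform can be sketched as below. These are the full-range BT.601 constants commonly used for JPEG; other YCbCr definitions differ slightly, so treat them as one reasonable choice rather than necessarily the ones used in this project.

```cpp
#include <cmath>

struct YCbCr { double y, cb, cr; };

// Full-range BT.601 RGB -> YCbCr (inputs in [0,255]).
YCbCr rgbToYCbCr(double r, double g, double b)
{
    YCbCr c;
    c.y  =         0.299    * r + 0.587    * g + 0.114    * b;  // luma
    c.cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5      * b;  // blue-difference chroma
    c.cr = 128.0 + 0.5      * r - 0.418688 * g - 0.081312 * b;  // red-difference chroma
    return c;
}
```

Note that for any grey pixel (r = g = b) both chroma channels land at the neutral value 128.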

Source Code Tips

The point worth mentioning here is the interaction of the C# application with C/C++ code in a DLL, as it leads to an efficient way of coding a great GUI while retaining the advantages of native C/C++ code.

Just create a simple C++ DLL with an exported function:

C++
// Global classifier instance living inside the DLL
Alip alip;

// Exported entry point callable from the C# side
ALIP_API int alipClassify(const double* data, double* results, unsigned int* indices)
{
        return alip.classify(data, results, indices);
}

In the C# application, declare the DLL functions in the class from which you will be calling them:

[DllImport("alip")]
static extern unsafe int alipClassify(double* data, double* results, uint* indices);

Enable the /unsafe compiler switch in the application settings. Then, using the C# fixed statement, you may create pointers to C# variables and pass them to the C++ DLL:

double[] results = new double[this.aiClassifiers.Count];
uint[] indices = new uint[this.aiClassifiers.Count];

fixed (double* pdata = cbir.CbirEntries[0].features.Features)
fixed (double* presults = results)
fixed (uint* pindices = indices)
{
        int res = alipClassify(pdata, presults, pindices);
        if (res != 0)
                throw new Exception(String.Format("alipClassify() returned {0}", res));
}

ALIP Results

I deliberately selected the simplest image features, which hardly look like features at all, due to competing interests with the former funding organization System7. I used just the image itself, downscaled to 16x16 and converted to the YCbCr colorspace. Obviously, that is not the proper feature to start with, as others would significantly outperform it in discrimination ability. However, though I anticipated the classification would be completely incorrect, to my great surprise it performed pretty well, producing quite precise results. Now consider the annotation quality had I used a combination of color and texture features (e.g., histograms, statistics, entropy, etc.).
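The raw 768-D feature described above can be sketched as follows. This is my own illustration, assuming a box-average downscale and images of at least 16x16 pixels; the YCbCr conversion step is omitted here for brevity.

```cpp
#include <vector>

// Box-average an RGB image (h*w*3 doubles, row-major, interleaved channels)
// down to 16x16 and flatten it into a 16*16*3 = 768-D feature vector.
// Assumes w >= 16 and h >= 16 so every destination cell receives pixels.
std::vector<double> rawFeature(const std::vector<double>& rgb, int w, int h)
{
    const int N = 16;
    std::vector<double> f(N * N * 3, 0.0);
    std::vector<int> cnt(N * N, 0);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            const int by = y * N / h, bx = x * N / w;  // destination cell
            const int cell = by * N + bx;
            for (int c = 0; c < 3; ++c)
                f[cell * 3 + c] += rgb[(y * w + x) * 3 + c];
            ++cnt[cell];
        }
    }
    for (int cell = 0; cell < N * N; ++cell)
        for (int c = 0; c < 3; ++c)
            f[cell * 3 + c] /= cnt[cell];  // average over contributing pixels
    return f;
}
```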

You may estimate the quality of the other feature types with the cbir.system7.com demo. It just returns images that are close to the query one using some linear or non-linear distance metric. So it acts as a kind of kNN classifier: you annotate the image type based on the majority of the first several best matches returned, or combine the annotations in some other way.
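The kNN-style annotation just described amounts to a majority vote over the labels of the k best matches. A minimal sketch (my own helper, assuming the matches are already sorted by distance):

```cpp
#include <map>
#include <string>
#include <vector>

// Given the category labels of the k best CBIR matches, pick the majority.
// Ties resolve to the label that reached the winning count first, i.e. the
// one whose matches ranked higher in the sorted result list.
std::string majorityLabel(const std::vector<std::string>& topMatches)
{
    std::map<std::string, int> votes;
    std::string best;
    int bestCount = 0;
    for (const std::string& label : topMatches) {
        const int c = ++votes[label];
        if (c > bestCount) { bestCount = c; best = label; }
    }
    return best;
}
```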

For annotation, I selected the five image categories:

  • animals - 900 pictures
  • flowers - 1100 pictures
  • landscapes - 1200 pictures
  • sunsets - 700 pictures
  • unknown - 1600 pictures of types other than the above four

By all means, there is interconnection between the categories: flowers or animals may be shot in landscape-like surroundings, sunsets may also be shots of landscapes, and some unknown pictures may contain one of the above four categories.

A single image feature vector is quite high-dimensional: 16x16x3 = 768D. So I performed PCA dimensionality reduction to a 70D space; the 70 eigenpictures retain 90% of the variance. The eigenimages are stored in the pca.nn file, and the first 60 eigenvectors for the separate colorspace channels are presented below:
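The PCA step can be sketched in two parts (my own helpers, assuming the eigendecomposition of the covariance matrix has already been computed elsewhere): choosing how many components retain a given variance fraction, and projecting a mean-centred feature onto the leading eigenvectors.

```cpp
#include <cstddef>
#include <vector>

// How many leading components (eigenvalues sorted descending) are needed
// to retain at least `fraction` of the total variance.
std::size_t componentsForVariance(const std::vector<double>& eigenvalues,
                                  double fraction)
{
    double total = 0.0;
    for (double e : eigenvalues) total += e;
    double acc = 0.0;
    std::size_t k = 0;
    while (k < eigenvalues.size() && acc < fraction * total)
        acc += eigenvalues[k++];
    return k;
}

// Project a mean-centred feature x (dim D) onto k eigenvectors,
// each stored as a row of length D; the result is the k-D reduced feature.
std::vector<double> pcaProject(const std::vector<std::vector<double> >& eigvecs,
                               const std::vector<double>& x)
{
    std::vector<double> y(eigvecs.size(), 0.0);
    for (std::size_t i = 0; i < eigvecs.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            y[i] += eigvecs[i][j] * x[j];
    return y;
}
```

In this project the same recipe maps the 768-D raw feature down to the 70-D vectors fed to the ANNs.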

Y

Cr

Cb

They look pretty similar to the ones from my PCA-based Face Detection article, which is attributed to the analysis of natural image scenes.

Then, having the 70D data, I used the first half of each image category for training the AI classifiers and the remaining halves for estimating classification accuracy. I opted for ANN classifiers with a 70-20-1 structure, so there are 5 trained ANNs in all; every one is trained to separate its image category from all the others. The small number of hidden neurons and just 1 hidden layer keep the ANN from overfitting the data.
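The forward pass of one such 70-20-1 network can be sketched as below. This is a generic sigmoid feedforward pass of my own; the weights and biases here are placeholders, not the article's trained values.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One fully connected layer: weight matrix [out][in] plus a bias per output.
struct Layer {
    std::vector<std::vector<double> > w;
    std::vector<double> b;
};

static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Apply one layer with sigmoid activations.
std::vector<double> forward(const Layer& layer, const std::vector<double>& in)
{
    std::vector<double> out(layer.w.size());
    for (std::size_t i = 0; i < layer.w.size(); ++i) {
        double s = layer.b[i];
        for (std::size_t j = 0; j < in.size(); ++j)
            s += layer.w[i][j] * in[j];
        out[i] = sigmoid(s);
    }
    return out;
}

// A 70-20-1 classifier is then just two layers applied in sequence;
// the single output scores how strongly the feature matches the category.
double classifyOne(const Layer& hidden, const Layer& output,
                   const std::vector<double>& features)
{
    return forward(output, forward(hidden, features))[0];
}
```

Running all five such networks on the same 70-D feature yields the per-category scores that the annotation step compares.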

The train part showed an 8% error for classifying an unknown image into one of the 4 known image categories (false positive rate), and a 4% error for classifying one of the 4 known image categories as unknown (false negative rate). The test part showed worse results: a 45% false positive rate and a 20% false negative rate.

These seem quite inaccurate on the test part; however, this might be caused by noise, as the unknown category might contain some images from the known categories and vice versa. I never trusted image database composers, and when looking through 1000 images to deselect the wrong ones, after 5 minutes of work you may forget which image category you're working on. The better way, of course, is the cbir.system7.com application: you just give it a sample image of the desired category, e.g., with flowers, and it will return the closest images from, say, a 1,000,000+ image database. Have a bash at doing that manually.

The simplicity of the image features, by all means, also accounts for the worse error rate on the test images.

Below, I present the annotation results from the test part only, to be fair. As there might be several ANNs with high outputs, some shots carry annotations of more than one image type, e.g., animals in landscape surroundings.

Animals Category

animals

animals

animals

This one was actually annotated as landscape, but at 16x16 resolution it looks like that category. Remember the worse error rates and the 'noise' in the image categories.

animals

animals

animals

That one is better: animals in landscape-like surroundings.

animals

animals

animals

animals

animals

animals

animals

animals

animals

animals

animals

Flowers Category

The flowers annotations are also quite good. The classifier reveals landscape annotations in addition to flowers, as some images are quite similar to landscapes. A spurious animals label is also sometimes added.

flowers

flowers

flowers

flowers

flowers

flowers

flowers

flowers

flowers

flowers

flowers

flowers

Landscapes Category

Here are a few shots of landscapes annotated as the unknown category due to the high false negative rate. Otherwise the annotation is reasonable, also revealing sunsets as an additional category added to landscape views in the evening.

landscapes

landscapes

landscapes

A landscape at sunset. Adroit AI annotation.

landscapes

landscapes

landscapes

Sunsets Category

Obviously, sunsets are the simplest picture type. Besides several unknown annotations, there are landscapes and some flowers-during-sunset annotations. Well, the AI has never been taught to identify trees or palms, so it generalizes them to flowers. Otherwise, very good results.

sunsets

Landscape with a sunset.

sunsets

'Flowers' in the sunset:

sunsets

Landscape like picture, sunset behind mountain ridge, very romantic.

sunsets

The 'flowers' in the sunset.

sunsets

sunsets

sunsets

sunsets

Unknown sunset pictures.

sunsets

sunsets

sunsets

Another bunch of 'flowers' in the sunset.

sunsets

The next two: are these a landscape in the sunset or a sunset in the landscape?

sunsets

sunsets

'Flowers' again in the sunset of a landscape.

sunsets

Very thin 'flowers' in the sunset.

sunsets

London?

sunsets

Unknown Category

The unknown category showed about a 43% error on the test set, but there are two possible explanations for that percentage. Either the ANN failed to generalize well, showing much better performance on the train set, or it might be due to noise in the data set, e.g., images attributed to the unknown category that actually belong to the others, e.g., sunsets or landscapes.

The test results rather argue for the benefit of the AI than for the accuracy of human image categorization. Of the few dozen unknown pictures from the test set presented below, only a few might be attributed to a pure unknown category. The others contain scenes from the landscapes, sunsets and animals categories, which were correctly identified by the AI.

That one is fleshy and succulent.

Image 58

Image 59

Image 60

Image 61

Image 62

A sunset from the unknown category.

sunsets

The landscape generalization.

landscape

Image 65

The sunset in the unknown category. The couple is about to embrace.

sunsets

landscapes

The sunset again. The couple is embracing.

sunsets

sunsets

landscapes

landscapes

landscapes

The animals.

animals

animals

animals

Landscapes.

landscapes

landscapes

Flowers like image?

flowers

Looks like a sunset with flowers.

flowers, sunsets

flowers, sunsets

landscapes

unknown category

Here, one may agree with AI.

unknown category

Image 84

Live flowers, as in 'Alice in Wonderland'. Better generalization.

flowers

AI-xenophobia?

The rest of the unknown samples that the AI annotated as belonging to another category are rather controversial and defiant, as it tends to annotate the humans in the pictures as animals. What impertinence! The results can be attributed to:

  • AI generalization of the learned objects (e.g., trees identified as flowers)
  • The AI proclaiming its superior intelligence over the ordinary human being, whom it considers an animal species
  • AI gross error on the test set

The first scenario is quite likely, as the AI has already shown its capacity to generalize similar objects to the only categories known to it, as when it annotated trees as flowers. The last is less probable, as the scenes are not that different from the learned categories, so the greater false positive error rather attests to the AI's generalization acumen.

Well, the second case might also be possible. It seems even more dramatic, to the benefit of science fiction writers, who forebode that once computers gain control, they will either exterminate the humans or confine them to a zoo, as we have done with the 'real' animals (e.g., I, Robot; Terminator 3), since from the AI's point of view only an AI revolution might save human beings from self-extermination.

I also presume that the second scenario might be a telling example in favour of Darwin's theory that humans evolved from animals, as even the few dozen neurons of a simple AI understood that, while some persistent human beings try to disprove the obvious facts.

I looked over Google for a term that might be applicable to such a newly revealed phenomenon. 'AI-xenophobia' showed only about 5 links to some blog; 'cyber-xenophobia' has already been coined as a phenomenon widely discussed in Japan; and 'cyborg-xenophobia' does not reveal any links, but is rather restricted to robo-beings and not to general AI intelligence. Without discussing the already used terms in more detail, all of them describe the actions of humans in cyberspace, not of an AI against humans.

Who knows, this might be the first manifestation of presumptuous AI action against humans, by taunting at first. Beware.

Anyway, the results are shown below. I'm just presenting the AI's understanding of the image content. Please forbear from taking its incentives too seriously, and do not cane me.

Image 86

Image 87

Image 88

Might it be proclaiming: beware, the AI is callous?

Image 89

One may agree with the below examples of AI understanding.

Image 90

Image 91

Image 92

Here the AI is right on one point at least: landscape!

Image 93

Image 94

As final words, try different features and combinations yourself; you might then be able to teach the AI to respect humans, or simply add another category for images with humans.

At least the AI shows some reverence to its creator, by not putting me among the animals.

Image 95

Try it on images of yours.

History

  • 18th December, 2008: Initial version

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)