|
Try drawKeypoints with an initialized output image:
frame.copyTo(outImage);
cv::drawKeypoints(frame, keypoints, outImage);
|
|
|
|
|
I have used your code to create a bag-of-words on ORB features. After the first part of the code I store the vocabulary to dictionary.yml. Then I load the vocabulary back from the file and compute the BOW descriptor of an image against that dictionary. But I get the following entries in the image descriptor file.
%YAML:1.0
img1: !!opencv-matrix
rows: 1
cols: 300
dt: f
data: [ 0., 0., 7.38007389e-003, 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 7.38007389e-003, 0., 3.69003695e-003, 0., 0., 0., 0.,
3.69003695e-003, 3.69003695e-003, 0., 1.47601478e-002, 0., 0., 0.,
1.47601478e-002, 3.69003695e-003, 0., 0., 0., 4.05904055e-002, 0.,
0., 0., 0., 0., 0., 3.69003695e-003, 0., 0., 3.69003695e-003,
1.47601478e-002, 0., 0., 0., 0., 0., 3.69003695e-003, 0., 0., 0.,
0., 0., 0., 0., 0., 0., 1.10701108e-002, 0., 0., 0.,
1.84501857e-002, 3.69003695e-003, 0., 0., 0., 0., 3.69003695e-003,
1.10701108e-002, 1.84501857e-002, 0., 0., 0., 0., 7.38007389e-003,
7.38007389e-003, 0., 0., 3.69003695e-003, 3.69003695e-003, 0., 0.,
3.69003695e-003, 0., 0., 1.47601478e-002, 7.38007389e-003,
3.69003695e-003, 7.38007389e-003, 0., 0., 2.58302577e-002,
3.69003695e-003, 0., 3.69003695e-003, 0., 1.10701108e-002, 0.,
3.69003713e-002, 0., 0., 3.69003695e-003, 0., 0., 0., 0., 0., 0.,
1.10701108e-002, 3.69003695e-003, 0., 7.38007389e-003, 0., 0.,
7.38007389e-003, 0., 1.10701108e-002, 0., 7.74907768e-002, 0., 0.,
0., 0., 0., 0., 0., 0., 3.69003695e-003, 0., 0., 0., 0.,
3.69003695e-003, 3.69003695e-003, 3.69003695e-003,
5.16605154e-002, 0., 3.69003695e-003, 3.69003695e-003, 0., 0., 0.,
0., 0., 3.69003695e-003, 0., 1.47601478e-002, 0., 3.69003695e-003,
7.38007389e-003, 0., 0., 0., 3.69003695e-003, 0., 0., 0.,
7.38007389e-003, 7.38007389e-003, 3.69003695e-003, 0., 0., 0., 0.,
7.38007389e-003, 7.38007389e-003, 0., 0., 0., 3.69003695e-003,
7.38007389e-003, 0., 1.84501857e-002, 3.69003695e-003,
2.21402217e-002, 0., 0., 0., 1.10701108e-002, 0., 3.69003695e-003,
0., 0., 0., 3.69003695e-003, 0., 0., 7.38007389e-003, 0., 0., 0.,
0., 0., 0., 3.69003695e-003, 0., 0., 3.69003695e-003,
3.69003695e-003, 0., 0., 3.69003695e-003, 3.69003695e-003, 0., 0.,
0., 0., 7.38007389e-003, 0., 0., 7.38007389e-003, 2.58302577e-002,
0., 0., 0., 7.38007389e-003, 0., 0., 3.69003695e-003,
3.69003695e-003, 0., 0., 3.69003695e-003, 7.38007389e-003,
3.69003695e-003, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
7.38007389e-003, 0., 0., 7.38007389e-003, 0., 0., 3.69003695e-003,
0., 0., 3.69003713e-002, 4.05904055e-002, 0., 0., 3.69003695e-003,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
1.10701108e-002, 0., 0., 1.10701108e-002, 0., 3.69003695e-003, 0.,
0., 7.38007389e-003, 0., 3.69003695e-003, 0., 0., 0., 0., 0., 0.,
3.69003695e-003, 0., 0., 0., 3.69003695e-003, 8.85608867e-002, 0.,
0., 0., 0., 0., 3.69003695e-003, 0. ]
In this file some values are zero and others are fractions. Is this the correct output?
|
|
|
|
|
Hi,
First, thank you very much for the helpful tutorial. I have been following your article and discussions, which are very informative. I have some questions regarding dictionaries and BOF descriptors created using your code.
Let's say, for a CBIR application, I have a dataset of 1 million images and 100 query images picked randomly from within that dataset.
I tried to generate a BOF-SIFT dictionary by following your code, but I am not able to generate a dictionary for more than around 1,000 images because of an OUT OF MEMORY error from OpenCV.
So, following your discussions, I want to generate the dictionary using a divide-and-conquer method. My questions are:
- If I have to generate a dictionary from 1 million images, I can divide the dataset into 1,000 partitions of 1,000 images each and generate 1,000 dictionary files. Later, 1,000 BOF-SIFT descriptor files can be generated, right? As I am separately clustering the dataset into 1,000 different partitions, doesn't it make any difference to the resulting BOF-SIFT descriptors? (K-means clustering is done on 1,000 different partitions rather than on the whole dataset.)
- If I have to generate BOF-SIFT descriptors for the 100 query images, which dictionary should I use among the 1,000 dictionaries?
If the queries are not included in the dataset, then a separate dictionary and separate BOF-SIFT descriptors for the queries can be created, right?
Thank you in advance; I will be eagerly looking forward to your reply.
|
|
|
|
|
Thank you for sharing the code. It has been extremely helpful to me.
I modified your code to implement object classification. Currently I have two sets of images as my database: 52 images of colored squares (from different viewing aspects) and 78 images of beverage cans. I created the dictionary using all 130 images. After extracting the BOF descriptors, I used an SVM to do the classification. When testing the classifier on my local database, the outcome was acceptable; however, there were a few (6 in total) failures.
I want to ask you the following questions:
1. In your code:
//I select 20 (1000/50) images from 1000 images to extract
//feature descriptors and build the vocabulary
Why select only 20 images out of 1000 instead of using them all? Is it for efficiency? But how can you be sure that amount would be enough?
2. When you apply K-means:
int dictionarySize=200;
Why set the size to 200? How do you decide the size? I find this size affects the classification result, but I don't understand how important this parameter could be.
Later I'm planning to add more classes, such as books, to the model, so I will have to improve the accuracy of my classifier. Thanks again in advance.
|
|
|
|
|
For the first question, it is better to use all 1000 images than just 20. I used 20 only because I needed to create the vocabulary in a short time; extracting SIFT features from 1000 images takes a lot of time. In any case, it is always better to use as many images as you can find within the domain.
For the second question, I selected 200 as the dictionary size because I referred to a research paper that included an extensive experiment to find the best dictionary size. Check A SIFT-LBP IMAGE RETRIEVAL MODEL BASED ON BAG-OF-FEATURES.
But it is your task to find the number of bags that best suits your application domain. More bags mean more discriminative power but also higher computational complexity, and even the gain in discriminative power has a limit. Therefore you may have to search for the best number of bags exhaustively.
|
|
|
|
|
What dataset of images have you used to train your code?
|
|
|
|
|
In my work I have used the Corel dataset, but I have not included the trained dictionary file in the zip. You can use whatever dataset you have.
|
|
|
|
|
So after computing the descriptor, how can we use it for object detection? And how can we measure the accuracy of the detection?
e.g. How can I use this code to say whether an image is a bike or not (when I built my dictionary on bikes)?
modified 21-Apr-15 19:48pm.
|
|
|
|
|
You will have to use a classifier such as a support vector machine (SVM) or a neural network (NN). Look them up.
|
|
|
|
|
I know that. Is there any good example available using OpenCV?
|
|
|
|
|
|
Hi Mr Bandara,
Thanks for this article/tutorial. This is a really nice article that helped me a lot in understanding BoW. I have tried your code and it worked nicely for object recognition on bikes, airplanes, faces and cars. However, I believe the SIFT features extracted here do not follow the dense sampling strategy (correct me if I'm wrong); it is rather the strategy from the original paper. What I would like to ask is: how do I extract dense SIFT features with this code?
Thanks in advance
|
|
|
|
|
Hi,
Yes, this is not a dense descriptor, as it first detects salient keypoints. Dense SIFT means calculating the SIFT descriptor for each and every pixel, or on a predefined grid of patches in the image. So you have to supply dense feature points (instead of what you get from the .detect() function) when you call the siftExtractorObject.compute() function. If you are using OpenCV 2.4.6 or later, you can use FeatureDetector::create(const string& detectorType) with the string "Dense". See the documentation here: DenseFeatureDetector
|
|
|
|
|
Thank you for your reply. I have managed to extract dense SIFT like this for the first part:
{
dirName = pent->d_name;
imgPath = imgDir + dirName;
img_raw = imread(imgPath, 1);
dense_SIFT_BoW(img_raw, featuresUnclustered);
count++;
}
The function dense_SIFT_BoW() is like this:
void dense_SIFT_BoW(Mat img_raw, Mat &featuresUnclustered)
{
    Mat descriptors;
    vector<KeyPoint> keypoints;
    DenseFeatureDetector detector(12.f, 1, 0.1f, 10);
    detector.detect(img_raw, keypoints);
    Ptr<DescriptorExtractor> descriptorExtractor = DescriptorExtractor::create("SIFT");
    descriptorExtractor->compute(img_raw, keypoints, descriptors);
    descriptors.setTo(0, descriptors < 0);
    descriptors = descriptors.reshape(0, 1);
    featuresUnclustered.push_back(descriptors);
}
In the first part, everything works well. I get a very large .xml file (76800 x 256): 76800 corresponds to the dense SIFT descriptors per image, and 256 is the number of images in the folder.
The problem is in the second part, when we want to do nearest-neighbor matching etc. I have done something like this:
Mat dictionary;
FileStorage fs("D:/WillowActions Backup/Willow 300_200/Dic_Dense.xml", FileStorage::READ);
fs["Vocabulary"] >> dictionary;
fs.release();
Ptr<DescriptorMatcher> matcher(new FlannBasedMatcher);
.
.
.
Mat img_raw; Mat bowTry;
int count = 0;
.
.
.
{
dirName = pent->d_name;
imgPath = imgDir + dirName;
img_raw = imread(imgPath, 1);
SIFT_matcher(img_raw, dictionary, bowTry, matcher);
count++;
}
The function SIFT_matcher() goes like this:
void SIFT_matcher(Mat img_raw, Mat &dictionary, Mat &bowTry, Ptr<DescriptorMatcher> &matcher)
{
Mat descriptors;
vector<KeyPoint> keypoints;
DenseFeatureDetector detector(12.f,1,0.1f,10);
detector.detect(img_raw,keypoints);
Ptr<DescriptorExtractor> descriptorExtractor = DescriptorExtractor::create("SIFT");
BOWImgDescriptorExtractor bowDE(descriptorExtractor, matcher);
bowDE.setVocabulary(dictionary);
Mat bowDescriptor;
bowDE.compute(img_raw,keypoints,bowDescriptor);
bowTry.push_back(bowDescriptor);
bowDescriptor.release();
}
When running the code for this second part, the program simply crashes. Is there something in my code that you find odd, which I've probably overlooked? If you need the whole code, feel free to let me know.
|
|
|
|
|
May I know how it crashes exactly? Does it show any error message in the output window or in the console?
|
|
|
|
|
Hi,
That is the main problem I'm facing. It does not show any error in either the console or the compiler output; the program just crashes with the usual "programX.exe has stopped working" window. I suspect this has something to do with a memory leak or similar, but I'm not sure. Right now, this is the only problem hindering me. One thing I know is that when I comment out these...
BOWImgDescriptorExtractor bowDE(descriptorExtractor, matcher);
bowDE.setVocabulary(dictionary);
.
.
.
bowDE.compute(img_raw,keypoints,bowDescriptor);
The code runs fine. But by commenting this out, obviously the code does not do what it's intended to do. This means there is probably something I'm overlooking on these lines or related to them. Any alternatives or suggestions would be helpful. Thanks in advance...
|
|
|
|
|
Any suggestions? This issue is still unresolved. Running through a debugger tells me that this part of the code
bowDE.compute(img_raw,keypoints,bowDescriptor);
produces a segmentation fault. Thanks...
|
|
|
|
|
Can I see the complete code? It is hard to point out an exact reason for a segmentation fault without looking closely at the code. It might be a simple initialization problem, a simple problem in the keypoint vector, or sometimes a problem in a loop. If you can provide me the code, I might be able to tell you the exact reason and the solution.
Further, since you are working with a dense descriptor, why don't you go with the HOG descriptor? It is almost the same as dense SIFT.
|
|
|
|
|
Thanks for your reply.
Yes, I can provide the whole code. Here is the whole code that I'm using:
#include <iostream>
#include <vector>
#include <cstring>   // for strcmp
#include <dirent.h>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/features2d.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/nonfree/nonfree.hpp>
using namespace std;
using namespace cv;
void dense_SIFT_BoW(Mat img_raw, Mat &featuresUnclustered);
void SIFT_matcher(Mat img_raw, Mat &dictionary, Mat &bowTry, Ptr<DescriptorMatcher> &matcher);
#define DICTIONARY_BUILD 0
int main()
{
#if DICTIONARY_BUILD == 1
initModule_nonfree();
Mat featuresUnclustered;
DIR *pDir = nullptr;
string imgDir("D:/WillowActions Backup/Willow 300_200/sample/");
string imgPath;
pDir = opendir(imgDir.c_str());
string dirName;
Mat img_raw;
struct dirent *pent = nullptr;
if(pDir == nullptr)
{
cout << "Directory pointer could not be initialized correctly ! " << endl;
return 1;
}
int count = 0;
cout << "Please WAIT..." << endl;
while((pent = readdir(pDir)) != nullptr)
{
if(pent == nullptr)
{
cout << " Dirent struct could not be initialized correctly !" << endl;
return 1;
}
if(!strcmp(pent->d_name,".")||!strcmp(pent->d_name,".."))
{
}
else
{
dirName = pent->d_name;
imgPath = imgDir + dirName;
img_raw = imread(imgPath, 1);
dense_SIFT_BoW(img_raw, featuresUnclustered);
count++;
}
}
cout << "Number of files in folder: " << count << endl;
cout << featuresUnclustered.size() << endl;
int dictionarySize = 5;
TermCriteria tc(CV_TERMCRIT_ITER, 100, 0.001);
int retries = 1;
int flags = KMEANS_PP_CENTERS;
BOWKMeansTrainer bowTrainer(dictionarySize, tc, retries, flags);
Mat dictionary = bowTrainer.cluster(featuresUnclustered);
FileStorage fs("D:/WillowActions Backup/Willow 300_200/Dense.xml", FileStorage::WRITE);
fs << "Vocabulary" << dictionary;
fs.release();
#else
Mat dictionary;
FileStorage fs("D:/WillowActions Backup/Willow 300_200/Dic_Dense.xml", FileStorage::READ);
fs["Vocabulary"] >> dictionary;
fs.release();
Ptr<DescriptorMatcher> matcher(new FlannBasedMatcher);
cout << dictionary.size() << endl;
DIR *pDir = nullptr;
string imgDir("D:/WillowActions Backup/Willow 300_200/sample/");
string imgPath;
pDir = opendir(imgDir.c_str());
string dirName;
Mat img_raw;
Mat bowTry;
struct dirent *pent = nullptr;
if(pDir == nullptr)
{
cout << "Directory pointer could not be initialized correctly ! " << endl;
return 1;
}
int count = 0;
cout << "Please WAIT..." << endl;
while((pent = readdir(pDir)) != nullptr)
{
if(pent == nullptr)
{
cout << " Dirent struct could not be initialized correctly !" << endl;
return 1;
}
if(!strcmp(pent->d_name,".")||!strcmp(pent->d_name,".."))
{
}
else
{
dirName = pent->d_name;
imgPath = imgDir + dirName;
img_raw = imread(imgPath, 1);
SIFT_matcher(img_raw, dictionary, bowTry, matcher);
cout << count << endl;
count++;
}
}
cout << "no of images: " << count << endl;
FileStorage fs1("D:/WillowActions Backup/Willow 300_200/Features_train.xml", FileStorage::WRITE);
fs1 << "Vocabulary" << bowTry;
fs1.release();
#endif
waitKey(0);
return 0;
}
void SIFT_matcher(Mat img_raw, Mat &dictionary, Mat &bowTry, Ptr<DescriptorMatcher> &matcher)
{
Mat descriptors;
vector<KeyPoint> keypoints;
DenseFeatureDetector detector(12.f,1,0.1f,10);
detector.detect(img_raw,keypoints);
Ptr<DescriptorExtractor> descriptorExtractor = DescriptorExtractor::create("SIFT");
BOWImgDescriptorExtractor bowDE(descriptorExtractor, matcher);
bowDE.setVocabulary(dictionary);
Mat bowDescriptor;
bowDE.compute(img_raw,keypoints,bowDescriptor);
bowTry.push_back(bowDescriptor);
bowDescriptor.release();
}
void dense_SIFT_BoW(Mat img_raw,Mat &featuresUnclustered)
{
Mat descriptors;
vector<KeyPoint> keypoints;
DenseFeatureDetector detector(12.f, 1, 0.1f, 10);
detector.detect(img_raw,keypoints);
Ptr<DescriptorExtractor> descriptorExtractor = DescriptorExtractor::create("SIFT");
descriptorExtractor->compute(img_raw,keypoints,descriptors);
descriptors.setTo(0, descriptors < 0);
descriptors = descriptors.reshape(0,1);
featuresUnclustered.push_back(descriptors);
}
That is the code I'm currently using. I use the "dirent" struct to loop over the images in the folders.
Thanks for your suggestion of the HOG descriptor. Actually, I have used HOG descriptors without bag-of-features. My purpose is to compare the results for SIFT and dense-SIFT bags of features, and also to compare these features with HOG, ORB, etc. But right now, I am having trouble with this part.
|
|
|
|
|
The problem is in the reshaping. Each of your dense descriptor sets appears as one single descriptor: the dimension of one such descriptor is 1 x 12288, but it should actually be number_of_keypoints x 128. Since this is a dense descriptor, the number of keypoints is the number of pixels, or the number of crossing points in the grid you are defining.
Currently your vocabulary is 5 x 12288, which means you have 5 visual words. But here a visual word is not actually a SIFT descriptor; instead it is a concatenation of all the SIFT descriptors found at the dense keypoints of one image.
The reason for the error is that, although the words in your vocabulary have length 12288, by calling bowDE.compute you are asking it to find the best-matched word (size 12288) from the vocabulary for a set of points that actually live in 128-dimensional space. So 128 does not match 12288. The problem is in your vocabulary: do not reshape the Mat of dimension L x 128 to 1 x (L*128).
|
|
|
|
|
Thank you very much for your time in looking into my code. I think I understand my mistake now. I will try what you suggested. Thanks again.
|
|
|
|
|
Nice to hear that. Let us know the results.
|
|
|
|
|
Hi, how can I show this BOW descriptor in OpenCV C++?
And how can I classify my picture with this BOW descriptor?
|
|
|
|
|
Generally we do not need to show an image descriptor. However, if you want to visualise it, you can use the descriptor values to draw lines with cvLine; the resulting image will be a histogram.
You can use a support vector machine (SVM) trained with BOW descriptors to classify an image. Other than SVM, there are a lot of classification algorithms available that can be used on this kind of descriptor. It is also possible to measure the similarity of two images by using a kernel of their two histograms.
|
|
|
|
|
Can you please tell me how to use this code?
I first build it with DICTIONARY_BUILD=1 and run it.
Next I build it with DICTIONARY_BUILD=0
and run it again,
but it crashes and does not work.
This line does not work:
bowDE.setVocabulary(dictionary);
modified 21-Apr-15 2:47am.
|
|
|
|
|