|
Hi,
do you extract the vocabulary of visual words from images of all the object categories in your dataset, or should each object category have its own vocabulary?
|
The vocabulary is common to the problem, not to the category. For example, the English language has a vocabulary, and an article about politics uses words from that vocabulary. We simply count the number of occurrences of words for different categories (e.g. politics, science, aesthetics, history, weather, etc.).
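In code terms, that means clustering features pooled from images of every category into one vocabulary, then describing each image against that same vocabulary. A minimal sketch in the article's own OpenCV 2.4 style (matrix names are illustrative; bowDE is the BOWImgDescriptorExtractor from the second step):
// features pooled from images of ALL categories, one SIFT descriptor per row
Mat featuresUnclustered;
// ... featuresUnclustered.push_back(descriptor) for every training image ...
// one shared vocabulary, like the single English vocabulary in the analogy
TermCriteria tc(CV_TERMCRIT_ITER, 100, 0.001);
BOWKMeansTrainer bowTrainer(200, tc, 1, KMEANS_PP_CENTERS);
Mat vocabulary = bowTrainer.cluster(featuresUnclustered);
// every image, whatever its category, is described against this same vocabulary
bowDE.setVocabulary(vocabulary);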
|
Hi,
How can I calculate the distance between the centers of the clusters obtained in Step 1?
|
Distance calculation is done automatically inside the BOWImgDescriptorExtractor class. However, you can also use a FLANN-based matcher to rapidly find the nearest neighbors, specifying the distance calculation method yourself.
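If you want the distances explicitly, note that the cluster centers are simply the rows of the vocabulary matrix. A minimal sketch (assuming dictionary is the matrix loaded from dictionary.yml, descriptors are one image's SIFT descriptors, and i, j are row indices):
// L2 distance between cluster centers i and j
double d = norm(dictionary.row(i), dictionary.row(j), NORM_L2);
// FLANN finds, for each descriptor, its nearest cluster center
FlannBasedMatcher matcher; // uses L2 distance by default
vector<DMatch> matches;
matcher.match(descriptors, dictionary, matches);
// matches[k].trainIdx is the nearest center, matches[k].distance the L2 distance to it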
|
Hi,
I used your code for feature extraction for head tracking.
For the first step I set the image path for building the dictionary to the folder containing all faces (10 persons × 185 frames). In the second step I set the path to one of the dataset folders (1 person = 185 frames). Both steps run without error, except that the output of the 2nd step is the same 1×9 vector for every frame of one person! Is that right? All my per-frame features are identical!
|
Hi dear Ravimal Bandara,
I used your code, and I save the output of the 2nd part with this line:
FileStorage fs1("descriptor2.yml", FileStorage::APPEND);
I used APPEND because I want to save all outputs in one file. descriptor2.yml looks like this:
...
---
img1: !!opencv-matrix
rows: 1
cols: 9
dt: f
data: [ 8.96017700e-02, 2.14601770e-01, 9.40265507e-02,
1.02876104e-01, 1.00663714e-01, 1.01769909e-01, 6.74778745e-02,
9.62389410e-02, 1.32743359e-01 ]
...
---
img1: !!opencv-matrix
rows: 1
cols: 9
dt: f
data: [ 8.96017700e-02, 2.14601770e-01, 9.40265507e-02,
1.02876104e-01, 1.00663714e-01, 1.01769909e-01, 6.74778745e-02,
9.62389410e-02, 1.32743359e-01 ]
...
---
img1: !!opencv-matrix
rows: 1
cols: 9
dt: f
data: [ 8.96017700e-02, 2.12389380e-01, 9.51327458e-02,
1.02876104e-01, 1.01769909e-01, 1.01769909e-01, 6.74778745e-02,
9.62389410e-02, 1.32743359e-01 ]
But I just want the numbers, without the rest. For example:
[ 8.96017700e-02, 2.14601770e-01, 9.40265507e-02,1.02876104e-01, 1.00663714e-01, 1.01769909e-01, 6.74778745e-02,9.62389410e-02, 1.32743359e-01 ]
[ 8.96017700e-02, 2.14601770e-01, 9.40265507e-02,1.02876104e-01, 1.00663714e-01, 1.01769909e-01, 6.74778745e-02,9.62389410e-02, 1.32743359e-01 ]
and so on.
How can I do this? Can I save these numbers to a .csv file (a CSV file with #images rows × 9 columns)?
|
Writing to a .yml file in append mode may make the file unreadable because the format gets corrupted. If you need a CSV, simply open a regular file and use a formatted print function:
FILE *pfile = fopen("myFile.csv", "a");
for (int c = 0; c < descriptor.cols; c++)
{
    if (c != (descriptor.cols - 1))
        fprintf(pfile, "%f,", descriptor.at<float>(0, c));  // comma between values
    else
        fprintf(pfile, "%f\n", descriptor.at<float>(0, c)); // newline after the last value
}
fclose(pfile);
|
I do not understand.
This is my code:
#include "opencv2/core/core.hpp"
#include "opencv2/features2d/features2d.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/nonfree/features2d.hpp"
#include "opencv2/nonfree/nonfree.hpp"
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
using namespace std;
using namespace cv;

Mat dictionary;
SiftDescriptorExtractor Detector;
VideoCapture cap;
Mat img;

int main(int argc, char *argv[])
{
    for (int i = 0; i < 185; i++)
    {
        FileStorage fs("/media/rahim/01D051E78FFCF7C0/Lab/Imageproject/BOW1out/dictionary1.yml", FileStorage::READ);
        fs["vocabulary"] >> dictionary;
        fs.release();
        //create a nearest neighbor matcher
        Ptr<DescriptorMatcher> matcher(new FlannBasedMatcher);
        Ptr<FeatureDetector> detector(new SiftFeatureDetector());
        Ptr<DescriptorExtractor> extractor(new SiftDescriptorExtractor);
        //create BoF (or BoW) descriptor extractor
        BOWImgDescriptorExtractor bowDE(extractor, matcher);
        //set the dictionary with the vocabulary we created in the first step
        bowDE.setVocabulary(dictionary);
        char *filename = new char[100];
        char *imageTag = new char[10];
        FileStorage fs1("descriptor2.yml", FileStorage::APPEND);
        cap = VideoCapture("/media/rahim/01D051E78FFCF7C0/Lab/Dataset/EyeHead/HPEG/HPEG/session_a/videos/frame-s-a/10/%05d.jpg");
        //read the image
        cap >> img;
        //Mat img=imread(filename,CV_LOAD_IMAGE_GRAYSCALE);
        vector<KeyPoint> keypoints;
        Detector.detect(img, keypoints);
        Mat bowDescriptor;
        bowDE.compute(img, keypoints, bowDescriptor);
        sprintf(imageTag, "img1");
        fs1 << imageTag << bowDescriptor;
        fs1.release();
        cout << "\n image number: " << i;
    }
}
How should I use the code you suggested?
|
Use it right after
bowDE.compute(img, keypoints, bowDescriptor);
The code should be:
FILE *pfile = fopen("myFile.csv", "a");
for (int c = 0; c < bowDescriptor.cols; c++)
{
    if (c != (bowDescriptor.cols - 1))
        fprintf(pfile, "%f,", bowDescriptor.at<float>(0, c));
    else
        fprintf(pfile, "%f\n", bowDescriptor.at<float>(0, c));
}
fclose(pfile); // close the file so the buffered output is flushed
bowDescriptor holds the image descriptor. It contains the normalized histogram of features (the bag-of-features histogram), so its dimension is 1 × dictionary_size. In my code it is 1 × 200.
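A quick sanity check of those claims (a minimal sketch, nothing more): the descriptor is a single row, its width equals the vocabulary size, and the bins sum to roughly 1 because the histogram is normalized.
CV_Assert(bowDescriptor.rows == 1 && bowDescriptor.cols == dictionary.rows);
float sum = 0.f;
for (int c = 0; c < bowDescriptor.cols; c++)
    sum += bowDescriptor.at<float>(0, c);
printf("histogram sum = %f (expect ~1.0)\n", sum);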
|
It works. Thanks so much.
|
I used your code:
#include "opencv2/core/core.hpp"
#include "opencv2/features2d/features2d.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/nonfree/features2d.hpp"
#include "opencv2/nonfree/nonfree.hpp"
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
using namespace std;
using namespace cv;

char *filename = new char[100];
Mat featuresUnclustered;
Mat descriptor;
Mat input;
SiftDescriptorExtractor detector;

int main(int argc, char **argv)
{
    cv::initModule_nonfree();
    cv::Mat output;
    VideoCapture cap;
    cap.open("/media/rahim/01D051E78FFCF7C0/Lab/Dataset/EyeHead/HPEG/HPEG/session_a/videos/faces/%05d.jpg");
    for (int i = 0; i < 1835; i++)
    {
        if (i % 1835 == 0) cap = VideoCapture("/media/rahim/01D051E78FFCF7C0/Lab/Dataset/EyeHead/HPEG/HPEG/session_a/videos/faces/%05d.jpg"); //there are 335 frames in the dir
        Mat frame;
        std::vector<cv::KeyPoint> keypoints;
        cap >> frame;
        imshow("frame", frame);
        detector.detect(frame, keypoints);
        cv::drawKeypoints(frame, keypoints, output);
        char *imageName = "image";
        char *jpgN = ".jpg";
        stringstream s;
        s << imageName << i << jpgN;
        cv::imwrite(s.str(), output);
        detector.compute(frame, keypoints, descriptor);
        featuresUnclustered.push_back(descriptor);
        if (waitKey(1) == 27)
            exit(0);
    }
    //Construct BOWKMeansTrainer
    //the number of bags
    printf("hello");
    int dictionarySize = 200;
    //define Term Criteria
    TermCriteria tc(CV_TERMCRIT_ITER, 100, 0.001);
    //retries number
    int retries = 1;
    //necessary flags
    int flags = KMEANS_PP_CENTERS;
    //Create the BoW (or BoF) trainer
    BOWKMeansTrainer bowTrainer(dictionarySize, tc, retries, flags);
    //cluster the feature vectors
    Mat dictionary = bowTrainer.cluster(featuresUnclustered);
    //store the vocabulary
    FileStorage fs("dictionary.yml", FileStorage::WRITE);
    fs << "vocabulary" << dictionary;
    fs.release();
    return 0;
}
but this error came; what should I do?
OpenCV Error: Assertion failed (!outImage.empty()) in drawKeypoints, file /home/rahim/Desktop/OpenCV/opencv-2.4.10/modules/features2d/src/draw.cpp, line 115
terminate called after throwing an instance of 'cv::Exception'
what(): /home/rahim/Desktop/OpenCV/opencv-2.4.10/modules/features2d/src/draw.cpp:115: error: (-215) !outImage.empty() in function drawKeypoints
|
Try drawKeypoints with an initialized output image:
frame.copyTo(output);
cv::drawKeypoints(frame, keypoints, output);
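Also note that in the posted loop the capture is only re-opened when i % 1835 == 0 (i.e. once, at i = 0), while the comment says the directory holds 335 frames; if so, cap >> frame starts returning empty frames at i = 335, which would trigger exactly this assertion. A minimal guard (a sketch, not the only possible fix):
cap >> frame;
if (frame.empty()) // capture failed or the image sequence ended
{
    fprintf(stderr, "empty frame at i=%d\n", i);
    break;
}
frame.copyTo(output); // start from a valid image
cv::drawKeypoints(frame, keypoints, output);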
|
I have used your code to create a bag-of-words on ORB features. After the 1st part of the code I store the vocabulary in dictionary.yml. Then I load the vocabulary back from the file and compute the BOW descriptor of an image against that dictionary. But I get the following entries in the image descriptor file.
%YAML:1.0
img1: !!opencv-matrix
rows: 1
cols: 300
dt: f
data: [ 0., 0., 7.38007389e-003, 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 7.38007389e-003, 0., 3.69003695e-003, 0., 0., 0., 0.,
3.69003695e-003, 3.69003695e-003, 0., 1.47601478e-002, 0., 0., 0.,
1.47601478e-002, 3.69003695e-003, 0., 0., 0., 4.05904055e-002, 0.,
0., 0., 0., 0., 0., 3.69003695e-003, 0., 0., 3.69003695e-003,
1.47601478e-002, 0., 0., 0., 0., 0., 3.69003695e-003, 0., 0., 0.,
0., 0., 0., 0., 0., 0., 1.10701108e-002, 0., 0., 0.,
1.84501857e-002, 3.69003695e-003, 0., 0., 0., 0., 3.69003695e-003,
1.10701108e-002, 1.84501857e-002, 0., 0., 0., 0., 7.38007389e-003,
7.38007389e-003, 0., 0., 3.69003695e-003, 3.69003695e-003, 0., 0.,
3.69003695e-003, 0., 0., 1.47601478e-002, 7.38007389e-003,
3.69003695e-003, 7.38007389e-003, 0., 0., 2.58302577e-002,
3.69003695e-003, 0., 3.69003695e-003, 0., 1.10701108e-002, 0.,
3.69003713e-002, 0., 0., 3.69003695e-003, 0., 0., 0., 0., 0., 0.,
1.10701108e-002, 3.69003695e-003, 0., 7.38007389e-003, 0., 0.,
7.38007389e-003, 0., 1.10701108e-002, 0., 7.74907768e-002, 0., 0.,
0., 0., 0., 0., 0., 0., 3.69003695e-003, 0., 0., 0., 0.,
3.69003695e-003, 3.69003695e-003, 3.69003695e-003,
5.16605154e-002, 0., 3.69003695e-003, 3.69003695e-003, 0., 0., 0.,
0., 0., 3.69003695e-003, 0., 1.47601478e-002, 0., 3.69003695e-003,
7.38007389e-003, 0., 0., 0., 3.69003695e-003, 0., 0., 0.,
7.38007389e-003, 7.38007389e-003, 3.69003695e-003, 0., 0., 0., 0.,
7.38007389e-003, 7.38007389e-003, 0., 0., 0., 3.69003695e-003,
7.38007389e-003, 0., 1.84501857e-002, 3.69003695e-003,
2.21402217e-002, 0., 0., 0., 1.10701108e-002, 0., 3.69003695e-003,
0., 0., 0., 3.69003695e-003, 0., 0., 7.38007389e-003, 0., 0., 0.,
0., 0., 0., 3.69003695e-003, 0., 0., 3.69003695e-003,
3.69003695e-003, 0., 0., 3.69003695e-003, 3.69003695e-003, 0., 0.,
0., 0., 7.38007389e-003, 0., 0., 7.38007389e-003, 2.58302577e-002,
0., 0., 0., 7.38007389e-003, 0., 0., 3.69003695e-003,
3.69003695e-003, 0., 0., 3.69003695e-003, 7.38007389e-003,
3.69003695e-003, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
7.38007389e-003, 0., 0., 7.38007389e-003, 0., 0., 3.69003695e-003,
0., 0., 3.69003713e-002, 4.05904055e-002, 0., 0., 3.69003695e-003,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
1.10701108e-002, 0., 0., 1.10701108e-002, 0., 3.69003695e-003, 0.,
0., 7.38007389e-003, 0., 3.69003695e-003, 0., 0., 0., 0., 0., 0.,
3.69003695e-003, 0., 0., 0., 3.69003695e-003, 8.85608867e-002, 0.,
0., 0., 0., 0., 3.69003695e-003, 0. ]
In the file, some values are zero and others are fractional. Is this correct output?
|
Hi,
First, thank you very much for the helpful tutorial. I have been following your article and the discussions, which are very informative. I have some questions regarding the dictionaries and BOF descriptors created using your code.
Let's say, for a CBIR application, I have a dataset of 1 million images and 100 query images, picked randomly from within the 1-million-image dataset.
I tried to generate a BOF-SIFT dictionary by following your code, but I am not able to generate a dictionary for more than around 1,000 images because of an OUT OF MEMORY error from OpenCV.
So, following your discussions, I want to generate the dictionary using a divide-and-conquer method. My questions are:
- If I have to generate the dictionary from 1 million images, I can divide the dataset into 1,000 partitions of 1,000 images each and generate 1,000 dictionary files. Later, 1,000 BOF-SIFT descriptor files can be generated, right? As I am clustering the dataset in 1,000 separate partitions, doesn't that make any difference to the quality of the BOF-SIFT descriptors? (Because k-means clustering is done on 1,000 different partitions rather than on the whole dataset.)
- If I have to generate BOF-SIFT descriptors for the 100 query images, which dictionary should I use among the 1,000 dictionaries? If the queries are not included in the dataset, can a separate dictionary and separate BOF-SIFT descriptors be created for the queries?
Thank you in advance; I am eagerly looking forward to your reply.
|
Thank you for sharing the code. It has been extremely helpful to me.
I modified your code to implement object classification. Currently I have two sets of images as my database: 52 images are colored squares (from different viewing aspects) and the other 78 images are beverage cans. I created the dictionary using all 130 images. After extracting the BOF descriptors, I used an SVM to do the classification. When testing the classifier with my local database, the outcome was acceptable; however, there were a few (6 in total) failures.
I want to ask you the following questions:
1. In your code:
//I select 20 (1000/50) images from 1000 images to extract
//feature descriptors and build the vocabulary
Why select only 20 images out of 1000 instead of using them all? Is it for efficiency? But how can you be sure that amount would be enough?
2. When you apply k-means:
int dictionarySize=200;
Why set the size to 200? How do you decide the size? I find that this size affects the classification result, but I don't understand how important this parameter is.
Later I'm planning to add more classes, such as books, to the model, so I will have to improve the accuracy of my classifier. Thanks again in advance.
|
For the first question: it is better to use all 1000 images than only 20. I used 20 just because I needed to create the vocabulary in a short time; extracting SIFT features from 1000 images takes a lot of time. Anyway, it is always better to use as many images as you can find within the domain.
For the second question: I selected 200 as the dictionary size because I referred to a research paper that included an extensive experiment to find the best dictionary size. Check "A SIFT-LBP IMAGE RETRIEVAL MODEL BASED ON BAG-OF-FEATURES".
But it is your task to find the number of bags that best suits your application domain. A larger number of bags means more discriminative power but also higher computational complexity, and even the improvement in discriminative power has a limit. Therefore you may have to search for the best number of bags exhaustively.
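A minimal sketch of such an exhaustive search (illustrative only; it reuses tc, retries, flags and featuresUnclustered from the article's first step, and evaluateAccuracy is a hypothetical placeholder for whatever cross-validated measure your application uses):
int candidateSizes[] = { 50, 100, 200, 400, 800 };
int bestSize = 0;
double bestScore = -1.0;
for (int i = 0; i < 5; i++)
{
    // cluster a vocabulary of the candidate size from the pooled features
    BOWKMeansTrainer trainer(candidateSizes[i], tc, retries, flags);
    Mat vocab = trainer.cluster(featuresUnclustered);
    // hypothetical helper: rebuild the BoW descriptors, train and test the classifier
    double score = evaluateAccuracy(vocab);
    if (score > bestScore) { bestScore = score; bestSize = candidateSizes[i]; }
}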
|
What dataset of images did you use to train your code?
|
In my work I used the Corel dataset, but I have not included the trained dictionary file in the zip. You can use whatever dataset you have.
|
So after computing the descriptor, how can we use it for object detection? And how can we measure the accuracy of the detection?
E.g., how can I use this code to say whether an image is a bike or not (when I built my dictionary on bikes)?
|
You will have to use a classifier such as a support vector machine (SVM) or a neural network (NN). Look them up.
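For instance, OpenCV 2.4 ships CvSVM; a minimal sketch (the training matrix and labels here are illustrative, not part of the article's code — one BoW descriptor per row, one label per row):
// trainingData: CV_32F, one 1 x dictionarySize BoW descriptor per row
// labels:       CV_32F column vector, e.g. +1 = bike, -1 = not bike
CvSVMParams params;
params.svm_type    = CvSVM::C_SVC;
params.kernel_type = CvSVM::LINEAR;
params.term_crit   = cvTermCriteria(CV_TERMCRIT_ITER, 100, 1e-6);

CvSVM svm;
svm.train(trainingData, labels, Mat(), Mat(), params);
float response = svm.predict(bowDescriptor); // classify a new image's BoW descriptor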
|
I know that; is there a good complete example available using OpenCV?
|
Hi Mr Bandara,
Thanks for this article/tutorial. This is a really nice article that helped me a lot in understanding BoW. I have tried your code and it worked nicely for object recognition on bikes, airplanes, faces and cars. However, I believe the SIFT features extracted here do not follow the dense sampling strategy (correct me if I'm wrong); it is more or less the strategy from the original paper. What I would like to ask is: how do I extract dense SIFT features with this code?
Thanks in advance
|
Hi,
Yes, this is not a dense descriptor, as it first detects salient key points. Dense SIFT means calculating the SIFT descriptor for every pixel, or for a predefined grid of patches, in the image. So you have to supply dense feature points (instead of what you get from the .detect() function) when you call the siftExtractorObject.compute() function. If you are using OpenCV 2.4.6 or later, you can use FeatureDetector::create(const string& detectorType) with the string "Dense". See the documentation here: DenseFeatureDetector
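A minimal sketch of that combination (OpenCV 2.4.x; img stands for any loaded image):
cv::initModule_nonfree(); // register SIFT with the factory
Ptr<FeatureDetector> detector = FeatureDetector::create("Dense");
Ptr<DescriptorExtractor> extractor = DescriptorExtractor::create("SIFT");

vector<KeyPoint> keypoints;
detector->detect(img, keypoints); // key points on a regular grid
Mat descriptors;
extractor->compute(img, keypoints, descriptors); // one 128-d SIFT row per grid point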
|
Thank you for your reply. I have managed to extract dense SIFT like this for the first part:
{
    dirName = pent->d_name;
    imgPath = imgDir + dirName;
    img_raw = imread(imgPath, 1);
    dense_SIFT_BoW(img_raw, featuresUnclustered);
    count++;
}
The function dense_SIFT_BoW() is like this:
void dense_SIFT_BoW(Mat img_raw, Mat &featuresUnclustered)
{
    Mat descriptors;
    vector<KeyPoint> keypoints;
    DenseFeatureDetector detector(12.f, 1, 0.1f, 10);
    detector.detect(img_raw, keypoints);
    Ptr<DescriptorExtractor> descriptorExtractor = DescriptorExtractor::create("SIFT");
    descriptorExtractor->compute(img_raw, keypoints, descriptors);
    descriptors.setTo(0, descriptors < 0);
    descriptors = descriptors.reshape(0, 1);
    featuresUnclustered.push_back(descriptors);
}
In the first part, everything works well. I get a very large .xml file (76800 × 256): 76800 corresponds to the dense SIFT descriptors for each image, and 256 to the number of images (the number of images in the folder).
The problem is the second part, where we want to do nearest-neighbor matching etc. I have done something like this:
Mat dictionary;
FileStorage fs("D:/WillowActions Backup/Willow 300_200/Dic_Dense.xml", FileStorage::READ);
fs["Vocabulary"] >> dictionary;
fs.release();
Ptr<DescriptorMatcher> matcher(new FlannBasedMatcher);
.
.
.
Mat img_raw;
Mat bowTry;
int count = 0;
.
.
.
{
    dirName = pent->d_name;
    imgPath = imgDir + dirName;
    img_raw = imread(imgPath, 1);
    SIFT_matcher(img_raw, dictionary, bowTry, matcher);
    count++;
}
The function SIFT_matcher() goes like this:
void SIFT_matcher(Mat img_raw, Mat &dictionary, Mat &bowTry, Ptr<DescriptorMatcher> &matcher)
{
    Mat descriptors;
    vector<KeyPoint> keypoints;
    DenseFeatureDetector detector(12.f, 1, 0.1f, 10);
    detector.detect(img_raw, keypoints);
    Ptr<DescriptorExtractor> descriptorExtractor = DescriptorExtractor::create("SIFT");
    BOWImgDescriptorExtractor bowDE(descriptorExtractor, matcher);
    bowDE.setVocabulary(dictionary);
    Mat bowDescriptor;
    bowDE.compute(img_raw, keypoints, bowDescriptor);
    bowTry.push_back(bowDescriptor);
    bowDescriptor.release();
}
When running the code for this second part, the program simply crashes. Is there something odd in my code that I've probably overlooked? If you need the whole code, feel free to let me know.
|