Introduction
In this article we will look at the training and testing of a multi-class logistic classifier.
- Logistic regression is a probabilistic, linear classifier. It is parametrized by a weight matrix W and a bias vector b. Classification is done by projecting data points onto a set of hyperplanes, the distance to which is used to determine a class membership probability.
- Mathematically this can be expressed as
$$P(Y=i \mid x, W, b) = \frac{e^{W_i x + b_i}}{\sum_j e^{W_j x + b_j}}$$
- Corresponding to each class $y_i$, the logistic classifier is parameterized by a set of parameters $W_i, b_i$.
- These parameters are used to compute the class probability.
- Given an unknown vector $x$, the prediction is performed as
$$y_{pred} = \operatorname*{argmax}_i P(Y=i \mid x, W, b) = \operatorname*{argmax}_i \frac{e^{W_i x + b_i}}{\sum_j e^{W_j x + b_j}}$$
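To make this concrete, here is a minimal NumPy sketch (independent of the OpenVision code) of the class-probability computation and the argmax prediction, where `W` is a K x d weight matrix and `b` a K-dimensional bias vector:

```python
import numpy as np

def predict_proba(x, W, b):
    """Class membership probabilities P(Y=i|x,W,b) via the softmax."""
    scores = W.dot(x) + b      # W_i x + b_i for every class i
    scores -= scores.max()     # shift scores for numerical stability
    e = np.exp(scores)
    return e / e.sum()

def predict(x, W, b):
    """y_pred = argmax_i P(Y=i|x,W,b)."""
    return np.argmax(predict_proba(x, W, b))
```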
- Given a set of labelled training data $\{X_i, Y_i\}$, where $i \in \{1,\dots,N\}$, we need to estimate these parameters.
Loss Function
- Ideally we would like to compute the parameters so that the 0-1 loss is minimized
$$\ell_{0,1} = \sum_{i=0}^{|D|} I_{f(x^{(i)}) \neq y^{(i)}}, \qquad f(x) = \operatorname*{argmax}_k P(Y = y_k \mid x, \theta)$$
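The 0-1 loss simply counts misclassified samples; as a sketch (`zero_one_loss` is a hypothetical helper, not part of the library):

```python
import numpy as np

def zero_one_loss(y_pred, y_true):
    """Number of samples for which f(x_i) != y_i."""
    return np.sum(y_pred != y_true)
```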
- $P(Y=y_k \mid x, \theta)$ is modelled using the logistic function.
- The 0-1 loss function is not differentiable, hence optimizing it for large models is computationally infeasible.
- Instead we maximize the log-likelihood of the classifier given the training data $D$.
- Maximum likelihood estimation is used to perform this operation: the parameters are estimated so that the likelihood of the training data $D$ under the model is maximized.
- It is assumed that the data samples are independent, so the probability of the set is the product of probabilities of individual examples.
$$L(\theta=\{W,b\}, D) = \prod_{i=1}^{N} P(Y=y_i \mid X=x_i, W, b)$$
$$\log L(\theta, D) = \sum_{i=1}^{N} \log P(Y=y_i \mid X=x_i, W, b)$$
$$\theta^{*} = \operatorname*{argmax}_{\theta} \log L(\theta, D) = \operatorname*{argmin}_{\theta} \left( -\sum_{i=1}^{N} \log P(Y=y_i \mid X=x_i, W, b) \right)$$
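The product-to-sum step matters numerically as well as analytically; a tiny sketch with made-up per-sample probabilities shows why the log form is preferred:

```python
import numpy as np

# hypothetical P(Y=y_i | x_i, W, b) values for three training samples
probs = np.array([0.9, 0.8, 0.95])

likelihood = np.prod(probs)              # a product of N probabilities underflows quickly
log_likelihood = np.sum(np.log(probs))   # numerically stable, with the same maximizer
```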
- It should be noted that the likelihood of the correct class is not the same as the number of right predictions.
- The log-likelihood function can be considered a differentiable surrogate for the 0-1 loss function.
- In the present application the negative log-likelihood is used as the loss function.
- Optimal parameters are learned by minimizing the loss function.
- In the present application gradient-based methods are used for minimization; specifically, stochastic gradient descent and conjugate gradient descent are used to minimize the loss function.
- The cost function is expressed as
$$\begin{aligned} L(\theta, D) &= -\sum_{i=1}^{N} \log P(Y=y_i \mid X=x_i, W, b) \\ &= -\sum_{i=1}^{N} \log \frac{e^{W_{y_i} x_i + b_{y_i}}}{\sum_j e^{W_j x_i + b_j}} \\ &= -\sum_{i=1}^{N} \left[ \left( W_{y_i} x_i + b_{y_i} \right) - \log \sum_j e^{W_j x_i + b_j} \right] \end{aligned}$$
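A direct NumPy implementation of this cost might look as follows (a sketch, assuming `X` is an N x d data matrix, `y` a vector of integer labels, `W` a K x d weight matrix and `b` a K-dimensional bias):

```python
import numpy as np

def negative_log_likelihood(X, y, W, b):
    """-sum_i [ (W_{y_i} x_i + b_{y_i}) - log sum_j exp(W_j x_i + b_j) ]."""
    scores = X.dot(W.T) + b                 # (N, K) class scores
    m = scores.max(axis=1, keepdims=True)   # log-sum-exp shift for stability
    log_norm = np.log(np.exp(scores - m).sum(axis=1)) + m[:, 0]
    return -(scores[np.arange(len(y)), y] - log_norm).sum()
```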
- The first part of the sum is affine in the parameters; the second is a log of a sum of exponentials, which is convex. Thus the loss function is convex.
- Thus we can compute the parameters corresponding to the global minimum of the loss function using gradient descent methods.
- For this we compute the derivatives of the loss function $L(\theta, D)$ with respect to $\theta$: $\partial \ell / \partial W$ and $\partial \ell / \partial b$, as sketched below.
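For the softmax model these derivatives have a well-known closed form: the gradient of the per-sample loss with respect to the class scores is the softmax probability vector minus the one-hot encoding of the true label. A NumPy sketch, followed by one stochastic gradient descent update on a toy mini-batch (the random data and the step size 0.13 are illustrative):

```python
import numpy as np

def nll_gradients(X, y, W, b):
    """Gradients dL/dW and dL/db of the negative log-likelihood."""
    N = X.shape[0]
    scores = X.dot(W.T) + b
    scores -= scores.max(axis=1, keepdims=True)
    P = np.exp(scores)
    P /= P.sum(axis=1, keepdims=True)   # (N, K) softmax probabilities
    P[np.arange(N), y] -= 1.0           # softmax minus one-hot labels
    return P.T.dot(X), P.sum(axis=0)    # dL/dW is (K, d), dL/db is (K,)

# one SGD step on a random toy mini-batch
rng = np.random.RandomState(0)
X_batch = rng.randn(30, 28 * 28)        # 30 samples of 28*28 features
y_batch = rng.randint(0, 10, size=30)
W, b = np.zeros((10, 28 * 28)), np.zeros(10)
g_W, g_b = nll_gradients(X_batch, y_batch, W, b)
W -= 0.13 * g_W
b -= 0.13 * g_b
```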
Theano
- Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
- It is an expression compiler: it can evaluate symbolic expressions when executed. Programs that would typically be implemented in C/C++ can be written concisely and efficiently in Theano.
- Computing the gradients in most programming languages (C/C++, MATLAB, Python) involves manually deriving the expressions for the gradient of the loss with respect to the parameters, $\partial \ell / \partial W$ and $\partial \ell / \partial b$.
- This approach not only involves manual coding, but the derivatives can become difficult to derive for complex models.
- With Theano this work is greatly simplified, as it performs automatic differentiation; a short sketch follows.
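As an illustration of what this buys us (freely adapted in the spirit of the standard Theano logistic-regression tutorial, with shapes matching the MNIST example below), the negative log-likelihood is built symbolically and `T.grad` derives both gradients:

```python
import numpy
import theano
import theano.tensor as T

x = T.matrix('x')      # (N, 784) input batch
y = T.ivector('y')     # (N,) integer labels
W = theano.shared(numpy.zeros((784, 10)), name='W')
b = theano.shared(numpy.zeros(10), name='b')

p_y_given_x = T.nnet.softmax(T.dot(x, W) + b)
nll = -T.mean(T.log(p_y_given_x)[T.arange(y.shape[0]), y])

# automatic differentiation: no manually derived gradient expressions
g_W, g_b = T.grad(nll, [W, b])

# compile one stochastic gradient descent update step
train = theano.function(
    inputs=[x, y], outputs=nll,
    updates=[(W, W - 0.13 * g_W), (b, b - 0.13 * g_b)])
```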
Example
Theano Code
- The Python code for training and testing can be found in the git repository https://github.com/pi19404/OpenVision, in the file ImgML/LogisticRegression.py.
- The file ImgML/load_datasets.py contains methods to load datasets from pickle files or SVM-format files.
""" symbolic expressions defining input and output vectors"""
x=T.matrix('x');
y=T.ivector('y');
""" The mnist dataset in pickel format"""
model_name1="/media/LENOVO_/repo/mnist.pkl.gz"
""" creating object of class Logistic regression"""
""" input is 28*28 dimension feature vector ,and
output lables are digits from 0-9 """
classifier = LogisticRegression(x,y,28*28,10);
""" loading the datasets"""
[train,test,validate]=load_datasets.load_pickle_data(model_name1);
""" setting the dataset"""
classifier.set_datasets(train,test,validate);
#
#classifier.init_classifier(model_name1);n_out
""" Training the classifiers"""
classifier.train_classifier(0.13,1000,30);
""" Saving the model """
classifier.save('1');
#x=classifier.train[0].get_value(borrow=True)[0];
#classifier.predict(x);
""" Loading the model"""
classifier.load('1')
x=train[0].get_value(borrow=True);
y=train[1].eval();
print 'True class:'+`y`
xx,yy=classifier.predict(x);
print 'Predicted class:' + `yy`
classifier.testing();
- C/C++ code has also been written using Eigen/OpenCV and incorporated into the OpenVision library. It can be found in the files ImgML/LogisticRegression.cpp and ImgML/LogisticRegression.hpp in the same repository, https://github.com/pi19404/OpenVision.