In this article we’ll implement a simple object detector from scratch trained on our custom dataset and make our way to detecting and tracking objects (or humans in our case) in real time.
Introduction
Having people lined up to enter your coffee shop or store is a wonderful thing to witness, since more customers mean more business. But according to the latest research, long lines deter more people from getting in line. For instance, when your stomach is growling from hunger, the last thing you want to do is wait in a line to eat. That’s why it’s important for businesses to have an effective queueing strategy in place. A decade ago, getting a computer to count the number of people lined up in a queue was a very difficult problem. Companies usually had to hire entire academic research teams to try to do it accurately. Then deep learning showed up. What was once an incredibly difficult problem can now be solved by anyone with a decent GPU or access to affordable cloud GPU instances.
In this article series, we’re going to show how to make an AI queue length detector. We’ll start off by implementing a simple object detector from scratch trained on our custom dataset and make our way to detecting and tracking objects (or humans in our case) in real time. Later on, we’ll also see how we can make our solution efficient by using a pre-trained object detection network like YOLO.
AI Queue Length Detection: Object detection using Keras
Object detection is thought to be a complex computer vision problem since we need to find the location of the desired object/objects in the given image or video and also determine what type of objects were detected. Recent advancements in deep learning-based models have made it easier to develop object detection applications. Not only is there a significant performance improvement, but these models also leverage state-of-the-art techniques and huge datasets to achieve desired results efficiently.
In this article, we’ll keep things simple and try to implement a convolutional neural network (CNN) for object detection. Before moving to the coding part, let’s first quickly recap what a CNN is.
Convolutional Neural Networks
Simply put, a CNN is a class of deep neural network commonly used to analyze visual imagery. Like other neural networks, it consists of different layers (that is, the input layer, one or more hidden layers, and an output layer). It can take an image as input, assign learnable weights to different aspects or objects in the image, and differentiate them from each other. Each hidden layer extracts patterns and features from its input and passes the result on to the next layer, while the last layer usually classifies the object.
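To make the idea of a hidden layer "extracting features" concrete, here is a minimal NumPy sketch of what a single convolutional filter does. The helper name `conv2d_valid`, the toy image, and the hand-picked kernel are all illustrative assumptions; in a real CNN the kernel values are learned during training, and Keras does this far more efficiently.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x5 image: dark on the left, bright on the right (a vertical edge).
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=np.float32)
# Hand-picked vertical-edge kernel; a CNN would learn values like these.
kernel = np.array([[-1, 0, 1]] * 3, dtype=np.float32)

feature_map = conv2d_valid(image, kernel)
relu_map = np.maximum(feature_map, 0)  # ReLU keeps only positive responses
print(feature_map)
```

The filter responds strongly where the window covers the dark-to-bright transition and gives zero over the uniform bright region, which is the sense in which a filter "detects" a pattern.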
Training a CNN to Detect Objects
Before we continue, it is important to note that I will assume you are somewhat familiar with Python and have some understanding of deep learning as well, so I’ll be skipping the installation process for common Python libraries. I will be using the Anaconda distribution for Windows and working in a Jupyter Notebook, but feel free to use any IDE to follow along. The code is not platform-specific and should work fine elsewhere.
For this article, we’ll be working with the CIFAR-10 dataset to train a simple object detector.
Let's start by importing the libraries we need.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from keras import losses,metrics
from keras.optimizers import Adam
from keras.models import Sequential
from keras.callbacks import EarlyStopping
from keras.utils import to_categorical
from keras.layers import MaxPool2D, Dense, Conv2D, Flatten
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
The next step is to load data. You can use the following lines to download it right in the notebook.
from keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
Let’s verify the shape of each set.
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)
The CIFAR-10 dataset has ten different classes of objects, all equal in count, which helps the algorithm learn all the classes equally. This can be verified as follows:
np.unique(y_train,return_counts=True)
We need to normalize this data before we can feed it to our neural network, and also convert the class labels to one-hot encoded matrices. CIFAR-10 pixels are 8-bit values, so dividing by 255 scales them to the range [0, 1].
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
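The two preprocessing steps can be illustrated on a handful of values without downloading anything. This is a minimal NumPy sketch; the `pixels` and `labels` arrays are made-up stand-ins, and `np.eye` is used here only to show what one-hot encoding produces for ten classes.

```python
import numpy as np

# Hypothetical stand-ins for a few CIFAR-10 pixel values and labels.
pixels = np.array([0, 51, 255], dtype=np.uint8)
labels = np.array([[3], [0], [9]])  # CIFAR-10 labels come with shape (n, 1)

# 8-bit pixel values mapped into the range [0, 1].
scaled = pixels.astype("float32") / 255.0

# One row per label, with a single 1 in the column of the class index.
one_hot = np.eye(10, dtype="float32")[labels.ravel()]

print(scaled)
print(one_hot.shape)
print(one_hot[0])
```

Each label becomes a row of ten values with a 1 at the position of its class, which is the target format the softmax output layer is trained against.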
Now is the time to define our model to classify these objects. We will use a sequential model with four convolutional layers arranged in two blocks: two layers with 32 filters, then two layers with 64 filters. Each block is followed by max pooling to reduce the spatial dimensions. We’ll compile our model using the Adam optimizer.
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3),activation='relu'))
model.add(Conv2D(32, (3, 3),activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3),activation='relu'))
model.add(Conv2D(64, (3, 3),activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss=losses.categorical_crossentropy,
              optimizer=Adam(),
              metrics=[metrics.categorical_accuracy])
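To see what the softmax output layer and the categorical cross-entropy loss compute, here is a small NumPy sketch for a single example. The function names and the `logits` values are illustrative assumptions, not Keras internals, and the clipping constant `eps` is an arbitrary choice to avoid taking the log of zero.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def categorical_crossentropy(one_hot_target, probs, eps=1e-7):
    """Negative log of the probability assigned to the true class."""
    return float(-np.sum(one_hot_target * np.log(np.clip(probs, eps, 1.0))))

logits = np.array([2.0, 1.0, 0.1])   # hypothetical raw scores for three classes
probs = softmax(logits)
target = np.array([1.0, 0.0, 0.0])   # the true class is class 0
loss = categorical_crossentropy(target, probs)
print(probs.round(3), round(loss, 3))
```

The loss shrinks toward zero as the probability of the true class approaches 1, which is what the optimizer pushes the network toward during training.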
You can see the model summary as follows:
model.summary()
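The output shapes and parameter counts in the summary can be cross-checked by hand. The sketch below assumes the architecture defined above with Keras defaults (3x3 kernels, no padding, stride 1, 2x2 pooling); the helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Spatial size after an unpadded convolution with stride 1."""
    return size - kernel + 1

def conv_params(in_ch, out_ch, kernel=3):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out

size, channels, total = 32, 3, 0
for filters in (32, 32, "pool", 64, 64, "pool"):
    if filters == "pool":
        size //= 2  # 2x2 max pooling halves height and width
    else:
        total += conv_params(channels, filters)
        size, channels = conv_out(size), filters

flat = size * size * channels  # number of units after Flatten
total += dense_params(flat, 512) + dense_params(512, 10)
print(size, flat, total)  # 5 1600 890410
```

Most of the parameters sit in the first dense layer (1600 x 512 weights), not in the convolutional layers.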
Let’s now fit the model on our training data.
early=EarlyStopping(monitor='loss')
hist=model.fit(x_train,y_train,batch_size=100,validation_split=0.2,epochs=5,callbacks=[early])
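The early-stopping callback halts training once the monitored loss stops improving. The following is a simplified plain-Python sketch of that idea, not Keras's actual implementation; the function name `stopping_epoch` and the loss values are made up for illustration.

```python
def stopping_epoch(losses, patience=0):
    """Return the index of the epoch at which training would stop, or None.

    Simplified stand-in for an early-stopping rule: count consecutive
    epochs without a new best loss and stop once the count reaches
    `patience` (at least one non-improving epoch is always required).
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical per-epoch training losses.
print(stopping_epoch([1.5, 1.1, 0.9, 0.95, 0.8]))  # 3
print(stopping_epoch([3.0, 2.0, 1.0]))             # None
```

With only five epochs and a steadily decreasing training loss, the callback may never trigger; monitoring a validation metric instead is a common alternative.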
Testing Our Model
Now that our model is trained, we can evaluate how well it performs using the evaluate function.
model.evaluate(x_test,y_test,batch_size=100)
You should see an output similar to the following:
100/100 [==============================] - 10s 97ms/step - loss: 0.8231 - categorical_accuracy: 0.7128
[0.8231054544448853, 0.7128000259399414]
We have obtained 71% accuracy, which is good enough since we can see it’s not overfitting. We can also plot this accuracy for both training and validation.
Let’s now predict the classes of our test dataset using our freshly trained model.
# predict() returns class probabilities; argmax picks the most likely class
y_pred = np.argmax(model.predict(x_test), axis=1)
We can again test the accuracy of our model using accuracy_score.
y_test_classes = np.argmax(y_test, axis=1)  # one-hot rows back to class indices
print(accuracy_score(y_test_classes,y_pred))
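Because the network outputs probabilities and the targets are one-hot encoded, both have to be converted to class indices before they can be compared. Here is a minimal NumPy sketch of that conversion using made-up values for three images and four classes.

```python
import numpy as np

# Hypothetical softmax outputs for three images over four classes.
probs = np.array([[0.1, 0.7, 0.1, 0.1],
                  [0.6, 0.2, 0.1, 0.1],
                  [0.2, 0.2, 0.5, 0.1]])
# One-hot ground truth for the same three images.
y_true_one_hot = np.array([[0, 1, 0, 0],
                           [0, 0, 0, 1],
                           [0, 0, 1, 0]])

pred_classes = probs.argmax(axis=1)           # most probable class per image
true_classes = y_true_one_hot.argmax(axis=1)  # position of the 1 in each row
accuracy = float(np.mean(pred_classes == true_classes))
print(pred_classes, true_classes, accuracy)
```

Two of the three predictions match here, so the accuracy is 2/3; this is the same fraction-of-matches that accuracy_score reports.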
We have obtained a score of about 71%, in line with the evaluate result above, which is not too bad since we implemented a very simple CNN architecture. We can still improve upon these scores provided that we have the desired computational power and memory. For now, our model generalizes reasonably well to the test data and gives an overview of how we can use CNNs for object detection.
What’s Next?
In this article, we scratched the surface of object detection. The model we trained can be used to detect general objects with about 71% accuracy. What if we want to improve performance even further? A traditional CNN is not always a good choice for object detection. Our model performed well partly because each class has the same count in the dataset; this is usually not the case in the real world. We’d face many problems if we tried to train the model for custom object detection. In the next article, we’ll explore some other algorithms used for object detection and learn to implement them for custom object detection.