Here we show you how to feed the dataset images to the model and train it for age estimation.
In this series of articles, we’ll show you how to use a Deep Neural Network (DNN) to estimate a person’s age from an image.
Having designed and built the CNN model for age estimation, in this article – the fifth of the series – we are going to train that model to classify people in the images into the appropriate age groups.
We need to implement the following functionality:
- Preprocess images to satisfy the network’s input criteria
- Load images from the files into memory
- Convert the data to the format acceptable for the model optimization
- Launch the training process
Prepare Images to Serve as Input
As designed, our CNN expects its input to be gray (one-channel, 8-bit) images of 128 x 128 pixels. We therefore need some conversion functionality to preprocess the original (color) images into this input format. Here is the Python code that defines two classes implementing the conversion:
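```python
import cv2

class ResizeConverter:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def convert(self, image):
        # Pass INTER_AREA by keyword: the third positional argument
        # of cv2.resize is the optional output array (dst).
        return cv2.resize(image, (self.width, self.height), interpolation=cv2.INTER_AREA)

class GrayConverter:
    def convert(self, image):
        # OpenCV stores color images in BGR channel order.
        return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
```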
The ResizeConverter class resizes images to the specified width and height. Note that we first import the OpenCV package, cv2, which includes all the functions we need to work with image data. The convert method of this class calls cv2.resize with the specified size and the cv2.INTER_AREA interpolation type, which is the recommended algorithm for shrinking images.
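For an integer shrink factor, area interpolation amounts to averaging each block of source pixels, which is why it works well for downscaling. The NumPy-only sketch below illustrates that idea; area_downsample is a hypothetical helper, not part of OpenCV, and it assumes the image dimensions are exact multiples of the factor (OpenCV also handles fractional factors and rounds the results for 8-bit images).

```python
import numpy as np

def area_downsample(image, factor):
    """Shrink a 2D image by an integer factor by averaging factor x factor blocks."""
    h, w = image.shape
    if h % factor or w % factor:
        raise ValueError("image size must be a multiple of factor")
    # Split each axis into (blocks, pixels-per-block), then average over the pixel axes.
    blocks = image.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
small = area_downsample(img, 2)
print(small)  # top-left entry is 2.5, the mean of 0, 1, 4 and 5
```

Shrinking a 256 x 256 image with factor 2 this way yields the 128 x 128 size our network expects.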
The GrayConverter class has only one method; it converts a color image to the 8-bit one-channel gray format, as specified by the cv2.COLOR_BGR2GRAY value (OpenCV loads color images in BGR channel order).
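OpenCV's BGR-to-gray conversion is a weighted sum of the channels, Y = 0.299 R + 0.587 G + 0.114 B, with the weights applied in BGR order. The NumPy sketch below shows that formula; bgr_to_gray is a hypothetical helper, and its rounding may differ by one gray level from cv2.cvtColor, which uses fixed-point arithmetic internally.

```python
import numpy as np

# Luma weights in OpenCV's BGR channel order: blue, green, red.
BGR_WEIGHTS = np.array([0.114, 0.587, 0.299])

def bgr_to_gray(image):
    """Convert an H x W x 3 uint8 BGR image to an H x W uint8 gray image."""
    gray = image.astype(np.float64) @ BGR_WEIGHTS  # weighted sum over the channel axis
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)  # one pure-blue pixel
print(bgr_to_gray(pixel))  # 0.114 * 255 = 29.07, rounded to 29
```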
Define Image Loading Process
Having implemented the converter classes, we can now implement the dataset class for loading images into memory:
```python
import os
import numpy as np
import cv2

class ImageDataset:
    def __init__(self, converters):
        # List of objects with a convert(image) method, applied in order.
        self.converters = converters
        self.images = []
        self.labels = []

    def get_files(self, folder):
        for filename in os.listdir(folder):
            yield os.path.join(folder, filename)

    def load(self, folder):
        self.images = []
        self.labels = []
        for path in self.get_files(folder):
            image = cv2.imread(path)
            if image is None:
                # cv2.imread returns None for files it cannot decode; skip them.
                continue
            # The age is the first '_'-separated field of the file name.
            label = os.path.basename(path).split('_')[0]
            if self.converters is not None:
                for c in self.converters:
                    image = c.convert(image)
            self.images.append(image)
            self.labels.append(int(label))

    def get_data(self):
        return (np.array(self.images), np.array(self.labels))
```
The class constructor receives one parameter: a list of converters to be used for image preprocessing. The main method, load, requires one parameter: the full path to the folder with image files. This method finds all the files inside the directory, reads the image from each file using the cv2.imread function, and then applies all the converters to the image in order. It also parses the age labels from the file names and stores them as integer values. See the second article of the series for a description of the file name syntax.
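The label parsing relies on the age being the first underscore-separated field of each file name. Below is a standalone sketch of that step which also rejects files without an image extension before they would reach cv2.imread; parse_age and IMAGE_EXTENSIONS are hypothetical names, and the sample file name only assumes the age-first layout described above.

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def parse_age(path):
    """Return the integer age encoded before the first '_' in the file name,
    or None if the file does not look like a labeled image."""
    stem, ext = os.path.splitext(os.path.basename(path))
    if ext.lower() not in IMAGE_EXTENSIONS:
        return None
    field = stem.split("_")[0]
    return int(field) if field.isdigit() else None

print(parse_age(os.path.join("Faces", "Training", "25_0_1_20170116.jpg")))  # 25
print(parse_age("notes.txt"))  # None
```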
Convert Data to Optimizable Format
The last step before CNN model training is to convert the loaded images and labels to the format that Keras expects for optimization. This is achieved with the convert method of the AgeClassConverter class:
```python
import numpy as np
from keras.preprocessing.image import img_to_array
from sklearn.preprocessing import LabelBinarizer

class AgeClassConverter:
    @staticmethod
    def convert(imdataset, ageranges):
        (images, labels) = imdataset.get_data()
        # Convert each image to a Keras array (H x W x 1 for gray images)
        # and scale pixel values from [0, 255] to [0, 1].
        arrays = [img_to_array(image, data_format="channels_last") for image in images]
        arrays = np.array(arrays).astype("float") / 255.0
        # Replace each age with the 1-based index of its age group:
        # group j+1 covers ageranges[j] <= age < ageranges[j+1].
        # Ages outside [ageranges[0], ageranges[-1]) keep their raw value.
        k = len(ageranges)
        for (i, label) in enumerate(labels):
            for j in range(k - 1):
                if ageranges[j] <= label < ageranges[j + 1]:
                    labels[i] = j + 1
                    break
        # One-hot encode the group indices 1 .. k-1.
        lb = LabelBinarizer()
        lb.fit(range(1, k))
        binlabels = np.array(lb.transform(labels))
        return (arrays, binlabels)
```
The first parameter of the convert method is an instance of the ImageDataset class. The second parameter is the list of age values that form the boundaries of the age groups. The first loop in the method iterates over the images in the dataset and converts every image to a Keras array using the img_to_array function. Note that we specify the data format as channels_last, which means the channel axis comes after the spatial dimensions (height and width); a 128 x 128 gray image therefore becomes a 128 x 128 x 1 array. After the loop, we normalize the data to the [0, 1] range by dividing the values by 255.0.
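The same preparation can be sketched in plain NumPy, which makes the resulting shape explicit. to_network_input is a hypothetical helper that assumes the images are already 128 x 128 uint8 arrays; it uses float32, which takes half the memory of the float64 arrays produced by astype("float").

```python
import numpy as np

def to_network_input(images):
    """Stack N gray H x W uint8 images into an N x H x W x 1 float array in [0, 1]."""
    batch = np.stack(images).astype("float32")
    batch = batch[..., np.newaxis]  # channels_last: channel axis after height and width
    return batch / 255.0

images = [np.full((128, 128), 255, dtype=np.uint8), np.zeros((128, 128), dtype=np.uint8)]
x = to_network_input(images)
print(x.shape, x.min(), x.max())  # (2, 128, 128, 1) 0.0 1.0
```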
The second loop of the method converts the integer age values found in the labels to the age groups. For example, suppose we call the method with the following age range values [1, 6, 11, 16, 19, 22, 31, 45, 61, 81, 101]. There are eleven values, which provide ten age intervals: 1-5, 6-10, 11-15, …, 81-100. If the dataset contains five labels with the age values [2, 6, 8, 15, 21], the loop will transform these values to the group indicators [1, 2, 2, 3, 5].
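The same mapping can be expressed in one vectorized call with np.digitize, which returns, for each value, the index i such that ageranges[i-1] <= value < ageranges[i]. This is a sketch of an alternative, not the code used above; note that it maps ages below the first boundary to 0 and ages at or above the last boundary to len(ageranges), whereas the loop leaves such labels unchanged.

```python
import numpy as np

ageranges = [1, 6, 11, 16, 19, 22, 31, 45, 61, 81, 101]
ages = np.array([2, 6, 8, 15, 21])

# Group index i satisfies ageranges[i-1] <= age < ageranges[i], so groups start at 1.
groups = np.digitize(ages, ageranges)
print(groups)  # [1 2 2 3 5]
```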
After the loop, we use the LabelBinarizer class, imported from the sklearn.preprocessing package, to convert label values to the binary (one-hot) format used for classification problems. Instead of a single value for a label (age group), this format provides a value for every possible class. For example, converting the five label values from our example above results in the following binarized data:
[ [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0] ]
As you can see, the binarized data contains one row per label, with a value for each age group. As we have ten age groups, every row has ten values: one for the group the age belongs to, and zero for all other groups. Each row can be read as the target probability distribution over the age groups.
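For groups numbered 1 to 10, this binarization is equivalent to selecting rows of an identity matrix, and np.argmax recovers the group from a row. The NumPy sketch below reproduces the five rows shown above; it illustrates the encoding only and is not a replacement for LabelBinarizer, which also remembers the class list it was fitted on.

```python
import numpy as np

num_groups = 10
groups = np.array([1, 2, 2, 3, 5])

# Row g-1 of the identity matrix has a single 1 in the column for group g.
onehot = np.eye(num_groups, dtype=int)[groups - 1]
print(onehot)

# Decoding: the position of the 1 (or of the highest score) gives the group back.
decoded = np.argmax(onehot, axis=1) + 1
print(decoded)  # [1 2 2 3 5]
```

The same argmax decoding applies to the network's output later, where each row holds predicted scores instead of exact zeros and ones.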
Load Images
With all the classes in place, we can load the dataset into memory:
```python
ageranges = [1, 6, 11, 16, 19, 22, 31, 45, 61, 81, 101]
classes = len(ageranges)-1
imgsize = 128

rc = ResizeConverter(imgsize, imgsize)
gc = GrayConverter()

trainSet = ImageDataset([rc, gc])
trainSet.load(r"C:\Faces\Training")
(trainData, trainLabels) = AgeClassConverter.convert(trainSet, ageranges)

testSet = ImageDataset([rc, gc])
testSet.load(r"C:\Faces\Testing")
(testData, testLabels) = AgeClassConverter.convert(testSet, ageranges)
```
In the above code, we assign values to the age range list and image size. Then we instantiate the converters for image resizing and color conversion. The converter list is passed as the parameter when constructing the training and testing datasets. The load method reads each dataset from disk, given the path to the directory with the image files. Finally, we call the static AgeClassConverter.convert method to convert our datasets to the format acceptable for the Keras optimization algorithms.
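Before training, it can be worth checking how the samples are spread across the age groups, because a strongly unbalanced distribution affects what the accuracy numbers mean. Summing the binarized labels column-wise gives the per-group counts. group_counts is a hypothetical helper, and the demo uses a tiny made-up label array rather than the real dataset; it also counts all-zero rows, which is how LabelBinarizer encodes a label it was not fitted on (an age outside the configured ranges).

```python
import numpy as np

def group_counts(binlabels):
    """Return (per-group sample counts, number of rows with no group assigned)."""
    binlabels = np.asarray(binlabels)
    counts = binlabels.sum(axis=0)
    unassigned = int((binlabels.sum(axis=1) == 0).sum())
    return counts, unassigned

# Made-up example: three samples in group 1, one in group 3, one with no group.
demo = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
])
counts, unassigned = group_counts(demo)
print(counts.tolist(), unassigned)  # [3, 0, 1] 1
```

On the real data, the same call would be group_counts(trainLabels).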
Train the Model
We are now ready to launch the training process:
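```python
# net, kernels and hidden were defined when we built the CNN earlier in this series.
# fit returns a History object with the per-epoch metrics.
frep = net.fit(trainData, trainLabels, validation_data=(testData, testLabels),
               batch_size=128, epochs=20, verbose=1)
netname = r"C:\Faces\age_class_net_" + str(kernels) + "_" + str(hidden) + ".cnn"
net.save(netname)
```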
The net variable holds our CNN model, which we instantiated a couple of articles back. We call its fit method to launch the optimization process. The method parameters are:
- trainData and trainLabels are the training data and binarized labels, respectively
- validation_data is the tuple of the testing data and labels
- batch_size is the size of the batches for the selected SGD optimization method
- epochs is the number of epochs (passes over the full dataset) for the training process
- verbose is the level of information shown during the process
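The batch_size and epochs values together determine how many weight updates the optimizer performs: each epoch takes ceil(N / batch_size) steps, with a smaller final batch when N is not a multiple of the batch size. A quick check of that arithmetic for our 21,318 training samples, using a hypothetical helper:

```python
import math

def training_steps(num_samples, batch_size, epochs):
    """Return (steps per epoch, total weight updates) for mini-batch training."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs

steps, updates = training_steps(21318, batch_size=128, epochs=20)
print(steps, updates)  # 167 3340
```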
Executing the code launches the model training process. Note that the process can take several hours to finish on an average CPU. During execution, progress information for each epoch is shown in the output. It looks like this:
```
Train on 21318 samples, validate on 2369 samples
Epoch 1/20
21318/21318 [==============================] - 1146s 54ms/step - loss: 1.7091 - accuracy: 0.4166 - val_loss: 2.2536 - val_accuracy: 0.0912
Epoch 2/20
21318/21318 [==============================] - 1124s 53ms/step - loss: 1.3156 - accuracy: 0.5058 - val_loss: 1.6474 - val_accuracy: 0.4116
Epoch 3/20
21318/21318 [==============================] - 1118s 52ms/step - loss: 1.2010 - accuracy: 0.5439 - val_loss: 1.2562 - val_accuracy: 0.5230
```
There are two values you need to pay attention to: accuracy and val_accuracy. The former is the classification accuracy on the training dataset, and the latter is the accuracy of the age group prediction on the testing dataset. As you can see in the sample output above, the last val_accuracy value is 0.5230. This means that at this step our CNN correctly predicts the age group for 52% of the images in the testing dataset. We should keep track of these values to make sure that the optimization process converges. Ideally, both values increase steadily toward 1.0; if accuracy keeps growing while val_accuracy stalls or drops, the model is starting to overfit the training data.
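For a softmax classifier trained on one-hot labels (assuming the model was compiled with a categorical cross-entropy loss, as is usual for such labels), the accuracy metric is the fraction of samples whose highest-scoring predicted group matches the labeled group. A NumPy sketch of that computation, where predictions stands in for the output of net.predict and the numbers are made up:

```python
import numpy as np

def classification_accuracy(predictions, onehot_labels):
    """Fraction of rows where the arg-max of the prediction matches the arg-max of the label."""
    predicted = np.argmax(predictions, axis=1)
    actual = np.argmax(onehot_labels, axis=1)
    return float(np.mean(predicted == actual))

# Made-up scores for four samples over three groups.
predictions = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.3, 0.3, 0.4],
    [0.5, 0.4, 0.1],
])
labels = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
])
print(classification_accuracy(predictions, labels))  # 0.5
```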
After the specified number of epochs, the process stops, and the CNN model is saved to disk. The final testing accuracy we reached in our example is about 56%. The prediction accuracy can be improved with various methods, such as bigger datasets, a deeper network architecture, regularization, and data augmentation.
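Data augmentation can be as simple as adding mirrored copies of the training faces, since a horizontally flipped face keeps its age label. A minimal NumPy sketch, assuming the N x H x W x 1 channels_last arrays produced by AgeClassConverter; add_mirrored_copies is a hypothetical helper, it should be applied to the training set only, and whether it improves accuracy for this model would have to be measured.

```python
import numpy as np

def add_mirrored_copies(data, labels):
    """Append a horizontally flipped copy of every image; labels are duplicated unchanged."""
    mirrored = data[:, :, ::-1, :]  # reverse the width axis (axis 2 in N x H x W x C)
    return np.concatenate([data, mirrored]), np.concatenate([labels, labels])

data = np.arange(2 * 2 * 3 * 1, dtype=float).reshape(2, 2, 3, 1)
labels = np.array([[1, 0], [0, 1]])
aug_data, aug_labels = add_mirrored_copies(data, labels)
print(aug_data.shape, aug_labels.shape)  # (4, 2, 3, 1) (4, 2)
```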
Next Step
We now have the trained CNN saved to disk. The next step is to use it to estimate a person’s age from an image.