Here we give a short overview of AI face detection methods, explain why we’ll use MTCNN, develop the code for face detection with MTCNN using a pretrained DNN model, and test the detection algorithm.
Introduction
Face recognition is one area of Artificial Intelligence (AI) where deep learning (DL) has had great success over the past decade. The best face recognition systems can recognize people in images and video with the same precision humans can – or even better. The two main stages of face recognition are person verification and identification.
In the first (current) half of this article series, we will:
- Discuss the existing AI face detection methods and develop a program to run a pretrained DNN model
- Consider face alignment and implement some alignment algorithms using face landmarks
- Run the face detection DNN on a Raspberry Pi device, explore its performance, and consider possible ways to run it faster, as well as to detect faces in real time
- Create a simple face database and fill it with faces extracted from images or videos
We assume that you are familiar with DNNs, Python, Keras, and TensorFlow. You are welcome to download this project code ...
In the previous article, we discussed the principles of face detection and facial recognition. In this one, we’ll have a look at specific face detection methods and implement one of them.
Face Detection Methods
Face detection is the first phase of any face recognition process. It is a critical step that influences all subsequent steps. It requires a robust approach to minimize the detection error. There are many methods of face detection; we’ll concentrate on AI-based approaches.
We’d like to mention the following modern methods of face detection: Max-Margin Object Detection (MMOD), Single-Shot Detector (SSD), Multi-task Cascaded Convolutional Networks (MTCNN), and You Only Look Once (YOLO).
MMOD models require too many computational resources to run on an edge device. YOLO is the fastest of these DNNs; it provides rather good precision when detecting faces in real-scene video. SSD is the most precise of the above methods, and it is still fast enough to be used on low-power devices.
The main drawback of the YOLO and SSD methods is that they cannot provide information on facial landmarks. As we’ll see later, this information is important for face alignment.
MTCNN provides good precision and finds facial landmarks. It is lightweight enough to run on resource-constrained edge devices.
MTCNN Detector
In this series, we’ll use a free Keras implementation of the MTCNN detector. You can install this library in a Python environment using the standard pip command. It requires OpenCV 4.1 and TensorFlow 2.0 (or later versions).
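Assuming the library is published on PyPI under the name mtcnn (which matches the import used below), a typical installation looks like this:
pip install mtcnn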
You can test if the MTCNN is installed successfully by running simple Python code:
import mtcnn
print(mtcnn.__version__)
The output should show the version of the installed library; in our case, it is 0.1.0.
After the library has been installed, we can write MTCNN-based code for a simple face detector:
import os
import time
import numpy as np
import copy
import mtcnn
from mtcnn import MTCNN
import cv2
class MTCNN_Detector:
    def __init__(self, min_size, min_confidence):
        # min_size: minimal face size (in pixels) the detector looks for
        # min_confidence: minimal confidence to accept a detection as a face
        self.min_size = min_size
        self.f_detector = MTCNN(min_face_size=min_size)
        self.min_confidence = min_confidence

    def detect(self, frame):
        # Run the internal MTCNN detector and keep only the faces
        # with at least the minimal confidence
        faces = self.f_detector.detect_faces(frame)
        detected = []
        for (i, face) in enumerate(faces):
            f_conf = face['confidence']
            if f_conf >= self.min_confidence:
                detected.append(face)
        return detected

    def extract(self, frame, face):
        # Crop the face image from the frame and shift the landmark
        # coordinates into the cropped image's coordinate system
        (x1, y1, w, h) = face['box']
        (l_eye, r_eye, nose, mouth_l, mouth_r) = Utils.get_keypoints(face)
        f_cropped = copy.deepcopy(face)
        move = (-x1, -y1)
        l_eye = Utils.move_point(l_eye, move)
        r_eye = Utils.move_point(r_eye, move)
        nose = Utils.move_point(nose, move)
        mouth_l = Utils.move_point(mouth_l, move)
        mouth_r = Utils.move_point(mouth_r, move)
        f_cropped['box'] = (0, 0, w, h)
        f_img = frame[y1:y1+h, x1:x1+w].copy()
        f_cropped = Utils.set_keypoints(f_cropped, (l_eye, r_eye, nose, mouth_l, mouth_r))
        return (f_cropped, f_img)
The detector class has a constructor with two parameters: min_size, the minimal size of a face (in pixels) the detector looks for, and min_confidence, the minimal confidence required to accept a detected object as a face. For example, MTCNN_Detector(30, 0.5) creates a detector that ignores faces smaller than 30 pixels and detections with confidence below 0.5. The detect method of the class uses the internal MTCNN detector to find the faces in a frame, then filters out the detected objects that fall below the minimal confidence value. The last method, extract, crops face images from the frame.
We’ll also need the following Utils class:
class Utils:
    @staticmethod
    def draw_face(face, color, frame, draw_points=True, draw_rect=True, n_data=None):
        # Draw the bounding box, a confidence (or name) label,
        # and optionally the facial landmarks
        (x1, y1, w, h) = face['box']
        confidence = face['confidence']
        x2 = x1+w
        y2 = y1+h
        if draw_rect:
            cv2.rectangle(frame, (x1, y1), (x2, y2), color, 1)
        y3 = y1-12
        if not (n_data is None):
            (name, conf) = n_data
            text = name + (" %.3f" % conf)
        else:
            text = "%.3f" % confidence
        cv2.putText(frame, text, (x1, y3), cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 1, cv2.LINE_AA)
        if draw_points:
            (l_eye, r_eye, nose, mouth_l, mouth_r) = Utils.get_keypoints(face)
            Utils.draw_point(l_eye, color, frame)
            Utils.draw_point(r_eye, color, frame)
            Utils.draw_point(nose, color, frame)
            Utils.draw_point(mouth_l, color, frame)
            Utils.draw_point(mouth_r, color, frame)

    @staticmethod
    def get_keypoints(face):
        # Unpack the five facial landmarks from the face dictionary
        keypoints = face['keypoints']
        l_eye = keypoints['left_eye']
        r_eye = keypoints['right_eye']
        nose = keypoints['nose']
        mouth_l = keypoints['mouth_left']
        mouth_r = keypoints['mouth_right']
        return (l_eye, r_eye, nose, mouth_l, mouth_r)

    @staticmethod
    def set_keypoints(face, points):
        # Write the five facial landmarks back into the face dictionary
        (l_eye, r_eye, nose, mouth_l, mouth_r) = points
        keypoints = face['keypoints']
        keypoints['left_eye'] = l_eye
        keypoints['right_eye'] = r_eye
        keypoints['nose'] = nose
        keypoints['mouth_left'] = mouth_l
        keypoints['mouth_right'] = mouth_r
        return face

    @staticmethod
    def move_point(point, move):
        # Translate a point by the given offset
        (x, y) = point
        (dx, dy) = move
        res = (x+dx, y+dy)
        return res

    @staticmethod
    def draw_point(point, color, frame):
        # Draw a landmark as a small 3x3 rectangle
        (x, y) = point
        x1 = x-1
        y1 = y-1
        x2 = x+1
        y2 = y+1
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 1)

    @staticmethod
    def draw_faces(faces, color, frame, draw_points=True, draw_rect=True, names=None):
        # Draw all detected faces; optionally attach a (name, confidence) label to each
        for (i, face) in enumerate(faces):
            n_data = None
            if not (names is None):
                n_data = names[i]
            Utils.draw_face(face, color, frame, draw_points, draw_rect, n_data)
In the output of the MTCNN detector, each face object is a dictionary with the following keys: box, confidence, and keypoints. The keypoints item is itself a dictionary that contains the data for the five facial landmarks: left_eye, right_eye, nose, mouth_left, and mouth_right. The Utils class provides simple access to the face data and implements several functions to manipulate the data and draw bounding boxes around faces in images.
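For illustration, here is what a single face object might look like. The key names are the ones listed above; the coordinate and confidence values are hypothetical:
# A sample face dictionary as returned by MTCNN (values are illustrative)
face = {
    'box': [277, 90, 48, 63],        # (x, y, width, height) of the bounding box
    'confidence': 0.997,             # detection confidence
    'keypoints': {
        'left_eye': (291, 117),
        'right_eye': (314, 114),
        'nose': (303, 131),
        'mouth_left': (296, 143),
        'mouth_right': (313, 141)
    }
}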
Face Detection in Images
Now we can write Python code that will detect faces in images:
d = MTCNN_Detector(30, 0.5)
print("Detector loaded.")
f_file = r"C:\PI_FR\frames\frame_5_02.png"
fimg = cv2.imread(f_file)

# Detect faces and print the raw face data
faces = d.detect(fimg)
for face in faces:
    print(face)

# Draw the detections and save the annotated image
Utils.draw_faces(faces, (0, 0, 255), fimg, True, True)
res_path = r"C:\PI_FR\detect"
f_base = os.path.basename(f_file)
r_file = os.path.join(res_path, f_base+"_detected.png")
cv2.imwrite(r_file, fimg)

# Crop every detected face and save it as a separate image
for (i, face) in enumerate(faces):
    (f_cropped, f_img) = d.extract(fimg, face)
    Utils.draw_faces([f_cropped], (255, 0, 0), f_img, True, False)
    dfname = os.path.join(res_path, f_base + ("_%06d" % i) + ".png")
    cv2.imwrite(dfname, f_img)
A run of the above code produces this image in the detect folder.
As you can see, the detector has found all three faces with good confidence – about 99%. We also get cropped faces in the same directory.
Running the same code on different frames, we can test detection in various cases. Here are the results for two frames.
The results demonstrate that the detector is able to find faces with glasses and also successfully detects the face of a baby.
Face Detection in Video
Having tested the detector on separate images, let’s now write code for detecting faces in video:
class VideoFD:
    def __init__(self, detector):
        self.detector = detector

    def detect(self, video, save_path=None, align=False, draw_points=False):
        detection_num = 0
        capture = cv2.VideoCapture(video)
        img = None
        dname = 'AI face detection'
        cv2.namedWindow(dname, cv2.WINDOW_NORMAL)
        cv2.resizeWindow(dname, 960, 720)
        frame_count = 0
        dt = 0
        face_num = 0
        # Process the video frame by frame
        while(True):
            (ret, frame) = capture.read()
            if frame is None:
                break
            frame_count = frame_count+1
            # Measure the time spent on detection only
            t1 = time.time()
            faces = self.detector.detect(frame)
            t2 = time.time()
            p_count = len(faces)
            detection_num += p_count
            dt = dt + (t2-t1)
            # Save the cropped faces to disk if a save path is specified
            if (not (save_path is None)) and (len(faces) > 0):
                f_base = os.path.basename(video)
                for (i, face) in enumerate(faces):
                    (f_cropped, f_img) = self.detector.extract(frame, face)
                    if (not (f_img is None)) and (not f_img.size == 0):
                        if draw_points:
                            Utils.draw_faces([f_cropped], (255, 0, 0), f_img, draw_points, False)
                        face_num = face_num+1
                        dfname = os.path.join(save_path, f_base + ("_%06d" % face_num) + ".png")
                        cv2.imwrite(dfname, f_img)
            # Show the annotated frame; press 'q' to stop
            if len(faces) > 0:
                Utils.draw_faces(faces, (0, 0, 255), frame)
            cv2.imshow(dname, frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        capture.release()
        cv2.destroyAllWindows()
        fps = frame_count/dt
        return (detection_num, fps)
The VideoFD class simply wraps our implementation of the MTCNN detector and feeds it the frames extracted from a video file. It uses the VideoCapture class from the OpenCV library.
We can launch the video detector with the following code:
d = MTCNN_Detector(50, 0.95)
vd = VideoFD(d)
v_file = r"C:\PI_FR\video\5_3.mp4"
save_path = r"C:\PI_FR\detect"
(f_count, fps) = vd.detect(v_file, save_path, False, False)
print("Face detections: "+str(f_count))
print("FPS: "+str(fps))
Here is the resulting video captured from the screen:
The test shows good results: faces were detected in most of the frames of the video file. The processing speed is about 20 FPS on a Core i7 CPU (note that this figure measures the detection time only; drawing and disk I/O are excluded). That’s impressive for a task as difficult as face detection.
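As a side note, cv2.VideoCapture accepts a camera index as well as a file path, so the same VideoFD class can, in principle, process a live webcam stream. Here is a minimal sketch, assuming a camera is available at index 0; we omit save_path because the file-saving logic derives names from a video file path, which a camera index doesn’t have:
# Hypothetical usage: run the detector on the default webcam (index 0)
d = MTCNN_Detector(50, 0.95)
vd = VideoFD(d)
(f_count, fps) = vd.detect(0)
print("Face detections: " + str(f_count))
print("FPS: " + str(fps))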
Next Steps
It looks like we can use this implementation of the MTCNN detector for real-time face detection in video. Our final goal is to run the detector on a low-power edge device. Before starting experiments with edge devices, we must implement another part of the face recognition pipeline: face alignment. In the next article, we’ll explain how to perform alignment based on the facial landmarks the detector has found. Stay tuned!