In this article, we state the problem of face identification and briefly review existing AI libraries for the task. We then use the Keras FaceNet model to develop Python code for face identification and test the algorithm against the face database we created earlier.
Introduction
Face recognition is one area of artificial intelligence (AI) where the modern approaches of deep learning (DL) have had great success over the last decade. The best face recognition systems can recognize people in images and video with the same precision as humans, or even better.
Our series of articles on this topic is divided into two parts:
- Face detection, where the client-side application detects human faces in images or in a video feed, aligns the detected face pictures, and submits them to the server.
- Face recognition (this part), where the server-side application performs face recognition.
In this part of the series, we’ll discuss the problem of face identification and combine the previously developed face detector with a face recognizer. We’ll then implement face identification in a Docker container and add a web API for transferring the detected faces to the server running recognition. In addition, we’ll consider certain aspects of running face recognition in Kubernetes. Finally, we’ll talk about building face recognition systems from scratch.
We assume that you are familiar with DNNs, Python, Keras, and TensorFlow. You are welcome to download the project code to follow along.
Face Identification
Face identification can be described as finding in a face database the person most similar to the one being identified. In the previous article (the last one in the first half of this series), we created a database of 15 people. In this half of the series, we’ll consider the face identification task in more detail and develop an algorithm for identifying people in a video feed using a pre-trained DNN model.
We have a database of faces that contains sample images of people: one photograph per person. When faces are detected in images or video, the detector produces a picture of each detected face. We must then determine whether the face in that picture belongs to one of the people in our database. There are two possible cases:
- The detected face belongs to one of the people in the database, in which case we must specify this person’s ID (for example, their name).
- The detected face belongs to an unknown person, in which case we must state this fact.
Face Recognition Models
All modern state-of-the-art face recognition methods rely on DNN models for face identification. The models can have different architectures and can be trained on different face databases, but they share a common approach. First, a DNN model is used as a feature extractor that converts a face image into a vector of embeddings. These embeddings are then used to measure how similar two faces are, expressed as the "distance" between two face images: the smaller the distance, the more similar the faces. By evaluating the distances between the detected face and all the faces in the database, we can find the most similar person.
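To make this idea concrete, here is a minimal sketch of the nearest-neighbor matching step, assuming the embeddings have already been extracted by some DNN. The find_most_similar helper is hypothetical, not part of the project code:

from scipy.spatial import distance

def find_most_similar(query_emb, database):
    # database: list of (name, embedding) pairs
    best_name, best_dist = None, float("inf")
    for name, emb in database:
        d = distance.cosine(query_emb, emb)  # smaller means more similar
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name, best_dist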
Developing and training DNN models for face recognition is not a trivial task. Fortunately, there are many free pre-trained models and libraries that implement state-of-the-art DNN architectures for face recognition. For example:
- FaceNet (Google), which has a popular Keras implementation
- OpenFace
- DeepFace (Facebook)
- dlib's face recognition model
Each of the above models has its pros and cons. The best choice depends on the situation. In this series, we'll use the FaceNet pre-trained model for two reasons:
- The model is trained to optimize the face embeddings directly, rather than through an intermediate bottleneck layer.
- We can run a recognition algorithm with just a few lines of Python code, because the model is implemented in Keras, as the short check below illustrates.
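For instance, loading the model and inspecting its interface takes only a few lines. This sketch assumes you have downloaded the facenet_keras.h5 weights file used later in this article:

from keras.models import load_model

model = load_model(r"C:\PI_FR\net\facenet_keras.h5")
print(model.inputs)   # the model expects 160x160 RGB face crops
print(model.outputs)  # and produces a 128-dimensional embedding vector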
Face Recognizer Code
It’s time to write the code for our face recognizer. A DNN-based face recognizer must implement at least two functions: one that extracts the embeddings from a face image, and another that evaluates the distance between the embeddings of two face images. Here is how we coded the recognizer based on the FaceNet DNN model:
import cv2
import numpy as np
from scipy.spatial import distance
from keras.models import load_model

class FaceNetRec:
    def __init__(self, model, min_distance):
        self.model = load_model(model)
        self.min_distance = min_distance

    def get_model(self):
        return self.model

    def embeddings(self, f_img):
        # FaceNet expects a 160x160 RGB input, normalized to [-1, 1]
        r_img = cv2.resize(f_img, (160, 160), interpolation=cv2.INTER_AREA)
        arr = r_img.astype('float32')
        arr = (arr - 127.5) / 127.5
        samples = np.expand_dims(arr, axis=0)
        embds = self.model.predict(samples)
        return embds[0]

    def eval_distance(self, embds1, embds2):
        # Cosine distance: 0.0 for identical embeddings
        dist = distance.cosine(embds1, embds2)
        return dist

    def img_distance(self, f_img1, f_img2):
        embds1 = self.embeddings(f_img1)
        embds2 = self.embeddings(f_img2)
        dist = self.eval_distance(embds1, embds2)
        return dist

    def match(self, embds1, embds2):
        dist = self.eval_distance(embds1, embds2)
        return dist <= self.min_distance

    def img_match(self, f_img1, f_img2):
        embds1 = self.embeddings(f_img1)
        embds2 = self.embeddings(f_img2)
        return self.match(embds1, embds2)

    def recognize(self, embds, f_db):
        # Find the database entry with the smallest distance below the threshold
        minfd = 2.0
        indx = -1
        f_data = f_db.get_data()
        for (i, data) in enumerate(f_data):
            (name, embds_i, p_img) = data
            dist = self.eval_distance(embds, embds_i)
            if (dist < minfd) and (dist < self.min_distance):
                indx = i
                minfd = dist
        if indx >= 0:
            (name, embds_i, p_img) = f_data[indx]
            return (name, minfd, p_img)
        return None

    def img_recognize(self, f_img, f_db):
        embds = self.embeddings(f_img)
        return self.recognize(embds, f_db)
The constructor of the class loads the Keras model from the file specified by the model argument. The min_distance parameter is the distance threshold: a detected face is considered to match a person only when the distance between their embeddings does not exceed this value. The embeddings and eval_distance methods implement the two functions described above. We use the so-called cosine distance for evaluating the similarity of embeddings.
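For instance, comparing two face crops with the recognizer might look like the following sketch. The image paths are placeholders for your own files:

rec = FaceNetRec(r"C:\PI_FR\net\facenet_keras.h5", min_distance=0.5)
img1 = cv2.imread(r"C:\PI_FR\db\person1.png")
img2 = cv2.imread(r"C:\PI_FR\db\person2.png")
print(rec.img_distance(img1, img2))  # a small distance means the faces are likely the same person
print(rec.img_match(img1, img2))     # True if the distance is within min_distance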
Another method we’d like to mention is recognize. This method takes as input the embeddings of a detected face and the f_db object, which represents the face database created as described in this (reference to the part ‘Creating Database’) article.
import os
import cv2

# FileUtils is a helper from the project code that lists the image files in a folder

class FaceDB:
    def __init__(self):
        self.clear()

    def clear(self):
        self.f_data = []

    def load(self, db_path, rec):
        # Build a (name, embeddings, image) triplet for every image in the folder
        self.clear()
        files = FileUtils.get_files(db_path)
        for (i, fname) in enumerate(files):
            f_img = cv2.imread(fname, cv2.IMREAD_UNCHANGED)
            embds = rec.embeddings(f_img)
            f = os.path.basename(fname)
            p_name = os.path.splitext(f)[0]  # the file name serves as the person's name
            data = (p_name, embds, f_img)
            self.f_data.append(data)

    def get_data(self):
        return self.f_data
The load method of the class scans all the files in the folder specified by the db_path parameter, creates embeddings for every image it finds, and builds a name-embeddings-image triplet for each person in the database. Using the data from the face database, the recognize method evaluates the distances between the embeddings of the detected face and those of every person in the database. The minimal distance indicates the most similar person. If even this minimal distance is greater than the min_distance threshold, the result is None, which indicates an unknown face.
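Putting the two classes together, identifying a single face crop might look like the following sketch, where rec is the FaceNetRec instance from the previous sketch and the file paths are placeholders:

f_db = FaceDB()
f_db.load(r"C:\PI_FR\db", rec)
face_img = cv2.imread(r"C:\PI_FR\video\face_crop.png")
result = rec.img_recognize(face_img, f_db)
if result is None:
    print("Unknown face")
else:
    (name, dist, p_img) = result
    print(name + " (distance " + str(dist) + ")")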
Combining the Face Recognizer with the Face Detector
We can now combine the developed classes with the MTCNN video face detector described in this (reference to the part ‘Face Detection’) article.
import time
import cv2

# Face_Align_Mouth and Utils are helpers from the project code
# (face alignment and drawing, respectively)

class VideoFR:
    def __init__(self, detector, rec, f_db):
        self.detector = detector
        self.rec = rec
        self.f_db = f_db

    def process(self, video, align=False):
        detection_num = 0
        rec_num = 0
        capture = cv2.VideoCapture(video)
        img = None
        dname = 'AI face recognition'
        cv2.namedWindow(dname, cv2.WINDOW_NORMAL)
        cv2.resizeWindow(dname, 960, 720)
        frame_count = 0
        dt = 0
        if align:
            fa = Face_Align_Mouth(160)
        while True:
            (ret, frame) = capture.read()
            if frame is None:
                break
            frame_count = frame_count + 1
            faces = self.detector.detect(frame)
            f_count = len(faces)
            detection_num += f_count
            names = None
            if (f_count > 0) and (self.f_db is not None):
                t1 = time.time()
                names = [None] * f_count
                for (i, face) in enumerate(faces):
                    # Either align the face crop or simply extract it from the frame
                    if align:
                        (f_cropped, f_img) = fa.align(frame, face)
                    else:
                        (f_cropped, f_img) = self.detector.extract(frame, face)
                    if (f_img is not None) and (f_img.size != 0):
                        embds = self.rec.embeddings(f_img)
                        data = self.rec.recognize(embds, self.f_db)
                        if data is not None:
                            rec_num += 1
                            (name, dist, p_photo) = data
                            conf = 1.0 - dist
                            names[i] = (name, conf)
                t2 = time.time()
                dt = dt + (t2 - t1)
            if len(faces) > 0:
                Utils.draw_faces(faces, (0, 0, 255), frame, True, True, names)
            cv2.imshow(dname, frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        capture.release()
        cv2.destroyAllWindows()
        # FPS here measures recognition throughput: faces processed
        # per second of recognition time
        if dt > 0:
            fps = detection_num / dt
        else:
            fps = 0
        return (detection_num, rec_num, fps)
On initialization, the VideoFR class receives the MTCNN face detector, the face recognizer, and the database of faces. These components are then combined into a face recognition pipeline: every frame goes to the detector, and every detected face goes to the recognizer along with the database.
Testing the Recognizer
We can now run recognition on a video file with the following code:
m_file = r"C:\PI_FR\net\facenet_keras.h5"
rec = FaceNetRec(m_file, 0.5)
print("Recognizer loaded.")
print(rec.get_model().inputs)
print(rec.get_model().outputs)
db_path = r"C:\PI_FR\db"
f_db = FaceDB()
f_db.load(db_path, rec)
d = MTCNN_Detector(50, 0.95)
vr = VideoFR(d, rec, f_db)
v_file = r"C:\PI_FR\video\5_3.mp4"
(f_count, rec_count, fps) = vr.process(v_file, True)
print("Face detections: "+str(f_count))
print("Face recognitions: "+str(rec_count))
print("FPS: "+str(fps))
Here is the resulting video for the test we ran.
Note that when a person is recognized, we draw their name on the output, along with the confidence of the recognition (the similarity score). If a person is not recognized, we draw the confidence of the face detection instead. As you can see from the above test result, our face recognition algorithm works well on this video file. It correctly identified the two people from the database (Lena and Marat) in most of the frames, and it never mistook the unknown person for a known one.
Let’s run a test on two more video snippets.
In the first video, we got 100% correct results. In the second video, the algorithm failed on one of the people: it assigned a database name to a person who was not in the database.
This showcases a common problem with face recognition systems. When a person is known (their face is in the database), the system correctly identifies them by assigning the greatest similarity value. However, when the detected person is unknown, the algorithm can still find someone in the database who looks a bit like them.
We can resolve this issue by changing the min_distance argument when initializing the recognizer. Note that the confidence for the woman who was wrongly identified as a known person never exceeds 0.65, which corresponds to a distance of at least 0.35. So if we tighten the distance threshold to 0.35, the false identification won’t reoccur.
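The fix amounts to re-creating the recognizer with the stricter threshold:

rec = FaceNetRec(m_file, 0.35)  # was 0.5; a match now requires a distance below 0.35 (confidence above 0.65)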
Next Step
Now we have a good face recognition system. In the next article, we’ll create a Docker container for this system. Stay tuned!