Here we'll use Deep Learning on the tracked faces of the FER2013 dataset and attempt to accurately predict a person's emotion from facial points in the browser with TensorFlow.js.
Introduction
Apps like Snapchat offer an amazing variety of face filters and lenses that let you overlay interesting things on your photos and videos. If you’ve ever given yourself virtual dog ears or a party hat, you know how much fun it can be!
Have you wondered how you’d create these kinds of filters from scratch? Well, now’s your chance to learn, all within your web browser! In this series, we’re going to see how to create Snapchat-style filters in the browser, train an AI model to understand facial expressions, and do even more using TensorFlow.js and face tracking.
You are welcome to download the demo of this project. You may need to enable WebGL in your web browser for performance. You can also download the code and files for this series.
We are assuming that you are familiar with JavaScript and HTML and have at least a basic understanding of neural networks. If you are new to TensorFlow.js, we recommend that you first check out this guide: Getting Started with Deep Learning in Your Browser Using TensorFlow.js.
If you would like to see more of what is possible in the web browser with TensorFlow.js, check out these AI series: Computer Vision with TensorFlow.js and Chatbots using TensorFlow.js.
In the previous article, we learned how to use AI models to detect the shape of faces. In this one, we’ll use the key facial landmarks to infer more information about the face from the images.
By connecting our face tracking code with the FER facial emotion dataset, we will train a second neural network model to predict the person’s emotion based on several 3D key points.
Setting Up with FER2013 Face Emotion Data
We’ll build on the face tracking code of the previous article to create two web pages. One page will be used to train the AI model with the tracked facial points on the FER dataset, and the other one will load and run the trained model on a test dataset.
Let’s modify the final code from the face tracking project to train and run a neural network model with face data. The FER2013 dataset consists of over 28K labeled face images; it is available on Kaggle. We downloaded this version, which had the dataset already converted to image files, and placed it in the web/fer2013 folder. We then updated the Node.js server code in index.js to return a reference list of the images at http://localhost:8080/data/, so that you can get the full JSON object if you run the server locally.
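For reference, here is a minimal sketch of what such an endpoint could look like, assuming an Express-based index.js and a web/fer2013/<emotion>/<file> folder layout; the actual server code in the project download may differ:
// Minimal sketch (assumption, not the project's actual index.js):
// serve the image list as JSON, keyed by emotion folder name.
const express = require( "express" );
const fs = require( "fs" );
const path = require( "path" );

const app = express();
app.use( express.static( __dirname ) );

app.get( "/data", ( req, res ) => {
    const root = path.join( __dirname, "web", "fer2013" );
    const data = {};
    for( const emotion of fs.readdirSync( root ) ) {
        data[ emotion ] = fs.readdirSync( path.join( root, emotion ) )
            .map( file => `web/fer2013/${emotion}/${file}` );
    }
    res.json( data );
});

app.listen( 8080 );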
To make it a bit easier, we saved this JSON object to the web/fer2013.js file for you to use directly, without the need to run the server locally. You can include it with the other script files at the top of the page:
<script src="web/fer2013.js"></script>
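The rest of the code only assumes that fer2013 is a global object keyed by emotion name, where each key holds an array of relative image paths. Roughly, it looks like this (the file names below are placeholders, not the actual dataset file names):
// Shape of the data in web/fer2013.js: one key per emotion,
// each holding an array of relative image paths.
var fer2013 = {
    angry:    [ "web/fer2013/angry/im0.png", "web/fer2013/angry/im1.png" /* , ... */ ],
    disgust:  [ "web/fer2013/disgust/im0.png" /* , ... */ ],
    fear:     [ "web/fer2013/fear/im0.png" /* , ... */ ],
    happy:    [ "web/fer2013/happy/im0.png" /* , ... */ ],
    neutral:  [ "web/fer2013/neutral/im0.png" /* , ... */ ],
    sad:      [ "web/fer2013/sad/im0.png" /* , ... */ ],
    surprise: [ "web/fer2013/surprise/im0.png" /* , ... */ ]
};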
We are going to work with images rather than the webcam video (don’t worry, we will bring video back in the next article!), so we need to replace the <video> element with an <img> element and rename its ID to “image”. We can also remove the setupWebcam function because we do not need it for this project.
<img id="image" style="
visibility: hidden;
width: auto;
height: auto;
"/>
Next, let’s add a utility function to set the image for the element, and another one to shuffle the data array. Because the original images are only 48x48 pixels, let’s define a larger output size of 500 pixels to get more granular face tracking and to see the result in a bigger canvas, and let’s update the line and polygon utility functions to scale their coordinates to the output size (the scaled versions are shown after the snippet below).
async function setImage( url ) {
return new Promise( res => {
let image = document.getElementById( "image" );
image.src = url;
image.onload = () => {
res();
};
});
}
function shuffleArray( array ) {
for( let i = array.length - 1; i > 0; i-- ) {
const j = Math.floor( Math.random() * ( i + 1 ) );
[ array[ i ], array[ j ] ] = [ array[ j ], array[ i ] ];
}
}
const OUTPUT_SIZE = 500;
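For reference, here is how the drawLine and drawTriangle helpers look once a scale parameter is added; they also appear in the full listing at the end of Part 1. Each coordinate is simply multiplied by the scale factor before drawing:
function drawLine( ctx, x1, y1, x2, y2, scale = 1 ) {
    ctx.beginPath();
    ctx.moveTo( x1 * scale, y1 * scale );
    ctx.lineTo( x2 * scale, y2 * scale );
    ctx.stroke();
}
function drawTriangle( ctx, x1, y1, x2, y2, x3, y3, scale = 1 ) {
    ctx.beginPath();
    ctx.moveTo( x1 * scale, y1 * scale );
    ctx.lineTo( x2 * scale, y2 * scale );
    ctx.lineTo( x3 * scale, y3 * scale );
    ctx.lineTo( x1 * scale, y1 * scale );
    ctx.stroke();
}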
Some global variables we will need are the list of emotion categories, an aggregated array of FER data, and an index into that array:
const emotions = [ "angry", "disgust", "fear", "happy", "neutral", "sad", "surprise" ];
let ferData = [];
let setIndex = 0;
Inside the async block, we can prepare and shuffle the FER data and resize the canvas to 500x500 pixels:
const minSamples = Math.min( ...Object.keys( fer2013 ).map( em => fer2013[ em ].length ) );
Object.keys( fer2013 ).forEach( em => {
shuffleArray( fer2013[ em ] );
for( let i = 0; i < minSamples; i++ ) {
ferData.push({
emotion: em,
file: fer2013[ em ][ i ]
});
}
});
shuffleArray( ferData );
let canvas = document.getElementById( "output" );
canvas.width = OUTPUT_SIZE;
canvas.height = OUTPUT_SIZE;
There is one last update we need in the code template before training the AI model on one page and running the trained model on the second page: we have to update the trackFace function to work with the image element instead of the video, and scale the bounding box and face mesh output to match the canvas size. We’ll increment setIndex at the end of the function to move on to the next image.
async function trackFace() {
await setImage( ferData[ setIndex ].file );
const image = document.getElementById( "image" );
const faces = await model.estimateFaces( {
input: image,
returnTensors: false,
flipHorizontal: false,
});
output.drawImage(
image,
0, 0, image.width, image.height,
0, 0, OUTPUT_SIZE, OUTPUT_SIZE
);
const scale = OUTPUT_SIZE / image.width;
faces.forEach( face => {
const x1 = face.boundingBox.topLeft[ 0 ];
const y1 = face.boundingBox.topLeft[ 1 ];
const x2 = face.boundingBox.bottomRight[ 0 ];
const y2 = face.boundingBox.bottomRight[ 1 ];
const bWidth = x2 - x1;
const bHeight = y2 - y1;
drawLine( output, x1, y1, x2, y1, scale );
drawLine( output, x2, y1, x2, y2, scale );
drawLine( output, x1, y2, x2, y2, scale );
drawLine( output, x1, y1, x1, y2, scale );
const keypoints = face.scaledMesh;
for( let i = 0; i < FaceTriangles.length / 3; i++ ) {
let pointA = keypoints[ FaceTriangles[ i * 3 ] ];
let pointB = keypoints[ FaceTriangles[ i * 3 + 1 ] ];
let pointC = keypoints[ FaceTriangles[ i * 3 + 2 ] ];
drawTriangle( output, pointA[ 0 ], pointA[ 1 ], pointB[ 0 ], pointB[ 1 ], pointC[ 0 ], pointC[ 1 ], scale );
}
});
setText( `${setIndex + 1}. Face Tracking Confidence: ${faces.length ? faces[ 0 ].faceInViewConfidence.toFixed( 3 ) : "N/A"} - ${ferData[ setIndex ].emotion}` );
setIndex++;
requestAnimationFrame( trackFace );
}
Now our modified template is ready. Create two copies of this code so that we can set up one page for Deep Learning and the other for testing.
Part 1: Deep Learning Facial Emotion
In this first web page file, we are going to set up the training data, create the neural network model, and then train it and save the weights to a file. The pretrained model is included in the code (see the web/model folder), so you can skip this part and move ahead to Part 2 if you wish.
Add a global variable to store the training data and a utility function to convert an emotion label to a one-hot vector so we can use it for the training data:
let trainingData = [];
function emotionToArray( emotion ) {
let array = [];
for( let i = 0; i < emotions.length; i++ ) {
array.push( emotion === emotions[ i ] ? 1 : 0 );
}
return array;
}
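For example, with the emotions array defined earlier, “happy” is at index 3, so the function returns one-hot vectors like these:
emotionToArray( "happy" ); // [ 0, 0, 0, 1, 0, 0, 0 ]
emotionToArray( "angry" ); // [ 1, 0, 0, 0, 0, 0, 0 ]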
Inside the trackFace function, we’ll take the various key facial features, scale them relative to the size of the bounding box, and add them into the training dataset if the face tracking confidence value is high enough. We’ve commented out some of the additional facial features to simplify the data, but you can add them back in if you would like to experiment. If you do so, remember to match these same features when running the model.
const features = [
"noseTip",
"leftCheek",
"rightCheek",
"leftEyeLower1", "leftEyeUpper1",
"rightEyeLower1", "rightEyeUpper1",
"leftEyebrowLower",
"rightEyebrowLower",
"lipsLowerInner",
"lipsUpperInner",
];
let points = [];
features.forEach( feature => {
face.annotations[ feature ].forEach( x => {
points.push( ( x[ 0 ] - x1 ) / bWidth );
points.push( ( x[ 1 ] - y1 ) / bHeight );
});
});
if( face.faceInViewConfidence > 0.9 ) {
trainingData.push({
input: points,
output: ferData[ setIndex ].emotion,
});
}
Once we have compiled enough training data, we can pass it off to the trainNet function. At the top of the trackFace function, let’s finish and break out of the face tracking loop after 200 images and call the training function:
async function trackFace() {
if( setIndex >= 200 ) {
setText( "Finished!" );
trainNet();
return;
}
...
}
Finally, the part we have been waiting for: let’s create the trainNet function and train our AI model!
This function will split the training data into an input array of the key points and an output array of the emotion one-hot vectors, create a categorical TensorFlow model with multiple hidden layers, train for 1,000 epochs, and download the trained model. You can increase the number of epochs if you would like to train the model more.
async function trainNet() {
let inputs = trainingData.map( x => x.input );
let outputs = trainingData.map( x => emotionToArray( x.output ) );
const model = tf.sequential();
model.add(tf.layers.dense( { units: 100, activation: "relu", inputShape: [ inputs[ 0 ].length ] } ) );
model.add(tf.layers.dense( { units: 100, activation: "relu" } ) );
model.add(tf.layers.dense( { units: 100, activation: "relu" } ) );
model.add(tf.layers.dense( {
units: emotions.length,
kernelInitializer: 'varianceScaling',
useBias: false,
activation: "softmax"
} ) );
model.compile({
optimizer: "adam",
loss: "categoricalCrossentropy",
metrics: "acc"
});
const xs = tf.stack( inputs.map( x => tf.tensor1d( x ) ) );
const ys = tf.stack( outputs.map( x => tf.tensor1d( x ) ) );
await model.fit( xs, ys, {
epochs: 1000,
shuffle: true,
callbacks: {
onEpochEnd: ( epoch, logs ) => {
setText( `Training... Epoch #${epoch} (${logs.acc.toFixed( 3 )})` );
console.log( "Epoch #", epoch, logs );
}
}
} );
const saveResult = await model.save( "downloads://facemo" );
}
And that’s it! This web page will train an AI model to recognize facial expressions in the various categories and give you a model to load and run, which we will do next. Because the model is saved with downloads://facemo, the browser downloads two files, facemo.json and facemo.weights.bin; place them in the web/model folder (or update the path in Part 2) to use your own trained model on the test page.
Part 1: Finish Line
Here is the full code for training the model with the FER dataset:
<html>
<head>
<title>Training - Recognizing Facial Expressions in the Browser with Deep Learning using TensorFlow.js</title>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.4.0/dist/tf.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/face-landmarks-detection@0.0.1/dist/face-landmarks-detection.js"></script>
<script src="web/triangles.js"></script>
<script src="web/fer2013.js"></script>
</head>
<body>
<canvas id="output"></canvas>
<img id="image" style="
visibility: hidden;
width: auto;
height: auto;
"/>
<h1 id="status">Loading...</h1>
<script>
function setText( text ) {
document.getElementById( "status" ).innerText = text;
}
async function setImage( url ) {
return new Promise( res => {
let image = document.getElementById( "image" );
image.src = url;
image.onload = () => {
res();
};
});
}
function shuffleArray( array ) {
for( let i = array.length - 1; i > 0; i-- ) {
const j = Math.floor( Math.random() * ( i + 1 ) );
[ array[ i ], array[ j ] ] = [ array[ j ], array[ i ] ];
}
}
function drawLine( ctx, x1, y1, x2, y2, scale = 1 ) {
ctx.beginPath();
ctx.moveTo( x1 * scale, y1 * scale );
ctx.lineTo( x2 * scale, y2 * scale );
ctx.stroke();
}
function drawTriangle( ctx, x1, y1, x2, y2, x3, y3, scale = 1 ) {
ctx.beginPath();
ctx.moveTo( x1 * scale, y1 * scale );
ctx.lineTo( x2 * scale, y2 * scale );
ctx.lineTo( x3 * scale, y3 * scale );
ctx.lineTo( x1 * scale, y1 * scale );
ctx.stroke();
}
const OUTPUT_SIZE = 500;
const emotions = [ "angry", "disgust", "fear", "happy", "neutral", "sad", "surprise" ];
let ferData = [];
let setIndex = 0;
let trainingData = [];
let output = null;
let model = null;
function emotionToArray( emotion ) {
let array = [];
for( let i = 0; i < emotions.length; i++ ) {
array.push( emotion === emotions[ i ] ? 1 : 0 );
}
return array;
}
async function trainNet() {
let inputs = trainingData.map( x => x.input );
let outputs = trainingData.map( x => emotionToArray( x.output ) );
const model = tf.sequential();
model.add(tf.layers.dense( { units: 100, activation: "relu", inputShape: [ inputs[ 0 ].length ] } ) );
model.add(tf.layers.dense( { units: 100, activation: "relu" } ) );
model.add(tf.layers.dense( { units: 100, activation: "relu" } ) );
model.add(tf.layers.dense( {
units: emotions.length,
kernelInitializer: 'varianceScaling',
useBias: false,
activation: "softmax"
} ) );
model.compile({
optimizer: "adam",
loss: "categoricalCrossentropy",
metrics: "acc"
});
const xs = tf.stack( inputs.map( x => tf.tensor1d( x ) ) );
const ys = tf.stack( outputs.map( x => tf.tensor1d( x ) ) );
await model.fit( xs, ys, {
epochs: 1000,
shuffle: true,
callbacks: {
onEpochEnd: ( epoch, logs ) => {
setText( `Training... Epoch #${epoch} (${logs.acc.toFixed( 3 )})` );
console.log( "Epoch #", epoch, logs );
}
}
} );
const saveResult = await model.save( "downloads://facemo" );
}
async function trackFace() {
if( setIndex >= 200 ) {
setText( "Finished!" );
trainNet();
return;
}
await setImage( ferData[ setIndex ].file );
const image = document.getElementById( "image" );
const faces = await model.estimateFaces( {
input: image,
returnTensors: false,
flipHorizontal: false,
});
output.drawImage(
image,
0, 0, image.width, image.height,
0, 0, OUTPUT_SIZE, OUTPUT_SIZE
);
const scale = OUTPUT_SIZE / image.width;
faces.forEach( face => {
const x1 = face.boundingBox.topLeft[ 0 ];
const y1 = face.boundingBox.topLeft[ 1 ];
const x2 = face.boundingBox.bottomRight[ 0 ];
const y2 = face.boundingBox.bottomRight[ 1 ];
const bWidth = x2 - x1;
const bHeight = y2 - y1;
drawLine( output, x1, y1, x2, y1, scale );
drawLine( output, x2, y1, x2, y2, scale );
drawLine( output, x1, y2, x2, y2, scale );
drawLine( output, x1, y1, x1, y2, scale );
const keypoints = face.scaledMesh;
for( let i = 0; i < FaceTriangles.length / 3; i++ ) {
let pointA = keypoints[ FaceTriangles[ i * 3 ] ];
let pointB = keypoints[ FaceTriangles[ i * 3 + 1 ] ];
let pointC = keypoints[ FaceTriangles[ i * 3 + 2 ] ];
drawTriangle( output, pointA[ 0 ], pointA[ 1 ], pointB[ 0 ], pointB[ 1 ], pointC[ 0 ], pointC[ 1 ], scale );
}
const features = [
"noseTip",
"leftCheek",
"rightCheek",
"leftEyeLower1", "leftEyeUpper1",
"rightEyeLower1", "rightEyeUpper1",
"leftEyebrowLower",
"rightEyebrowLower",
"lipsLowerInner",
"lipsUpperInner",
];
let points = [];
features.forEach( feature => {
face.annotations[ feature ].forEach( x => {
points.push( ( x[ 0 ] - x1 ) / bWidth );
points.push( ( x[ 1 ] - y1 ) / bHeight );
});
});
if( face.faceInViewConfidence > 0.9 ) {
trainingData.push({
input: points,
output: ferData[ setIndex ].emotion,
});
}
});
setText( `${setIndex + 1}. Face Tracking Confidence: ${faces.length ? faces[ 0 ].faceInViewConfidence.toFixed( 3 ) : "N/A"} - ${ferData[ setIndex ].emotion}` );
setIndex++;
requestAnimationFrame( trackFace );
}
(async () => {
const minSamples = Math.min( ...Object.keys( fer2013 ).map( em => fer2013[ em ].length ) );
Object.keys( fer2013 ).forEach( em => {
shuffleArray( fer2013[ em ] );
for( let i = 0; i < minSamples; i++ ) {
ferData.push({
emotion: em,
file: fer2013[ em ][ i ]
});
}
});
shuffleArray( ferData );
let canvas = document.getElementById( "output" );
canvas.width = OUTPUT_SIZE;
canvas.height = OUTPUT_SIZE;
output = canvas.getContext( "2d" );
output.translate( canvas.width, 0 );
output.scale( -1, 1 );
output.fillStyle = "#fdffb6";
output.strokeStyle = "#fdffb6";
output.lineWidth = 2;
model = await faceLandmarksDetection.load(
faceLandmarksDetection.SupportedPackages.mediapipeFacemesh
);
setText( "Loaded!" );
trackFace();
})();
</script>
</body>
</html>
Part 2: Running Facial Emotion Detection
We are almost there. Running the emotion detector model is simpler than training it. In this web page, we are going to load the trained TensorFlow model and test it on random faces from the FER dataset.
We can load the emotion detection model right below the Face Landmarks Detection model loading code, in a global variable. If you have trained your own model in Part 1, you can update the path to match the location where you saved your model.
let emotionModel = null;
(async () => {
...
model = await faceLandmarksDetection.load(
faceLandmarksDetection.SupportedPackages.mediapipeFacemesh
);
emotionModel = await tf.loadLayersModel( 'web/model/facemo.json' );
...
})();
After that, we can write a function to run the model on an input of the key facial points and return the detected emotion:
async function predictEmotion( points ) {
let result = tf.tidy( () => {
const xs = tf.stack( [ tf.tensor1d( points ) ] );
return emotionModel.predict( xs );
});
let prediction = await result.data();
result.dispose();
let id = prediction.indexOf( Math.max( ...prediction ) );
return emotions[ id ];
}
Just so that we can wait a few seconds between test images, let’s create a wait utility function:
function wait( ms ) {
return new Promise( res => setTimeout( res, ms ) );
}
Now to put it into action, we can take the key points of the tracked face, scale them relative to the bounding box to prepare them as model input, run the emotion prediction, and show the expected vs. detected result, with 2 seconds between images.
async function trackFace() {
...
let points = null;
faces.forEach( face => {
...
const features = [
"noseTip",
"leftCheek",
"rightCheek",
"leftEyeLower1", "leftEyeUpper1",
"rightEyeLower1", "rightEyeUpper1",
"leftEyebrowLower",
"rightEyebrowLower",
"lipsLowerInner",
"lipsUpperInner",
];
points = [];
features.forEach( feature => {
face.annotations[ feature ].forEach( x => {
points.push( ( x[ 0 ] - x1 ) / bWidth );
points.push( ( x[ 1 ] - y1 ) / bHeight );
});
});
});
if( points ) {
let emotion = await predictEmotion( points );
setText( `${setIndex + 1}. Expected: ${ferData[ setIndex ].emotion} vs. ${emotion}` );
}
else {
setText( "No Face" );
}
setIndex++;
await wait( 2000 );
requestAnimationFrame( trackFace );
}
It’s ready! Our code should now predict the emotion of each FER image and display it next to the expected label. Try it and see how it performs.
Part 2: Finish Line
Take a look at the full code to run the trained model on the FER dataset images:
<html>
<head>
<title>Running - Recognizing Facial Expressions in the Browser with Deep Learning using TensorFlow.js</title>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.4.0/dist/tf.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/face-landmarks-detection@0.0.1/dist/face-landmarks-detection.js"></script>
<script src="web/fer2013.js"></script>
</head>
<body>
<canvas id="output"></canvas>
<img id="image" style="
visibility: hidden;
width: auto;
height: auto;
"/>
<h1 id="status">Loading...</h1>
<script>
function setText( text ) {
document.getElementById( "status" ).innerText = text;
}
async function setImage( url ) {
return new Promise( res => {
let image = document.getElementById( "image" );
image.src = url;
image.onload = () => {
res();
};
});
}
function shuffleArray( array ) {
for( let i = array.length - 1; i > 0; i-- ) {
const j = Math.floor( Math.random() * ( i + 1 ) );
[ array[ i ], array[ j ] ] = [ array[ j ], array[ i ] ];
}
}
function drawLine( ctx, x1, y1, x2, y2, scale = 1 ) {
ctx.beginPath();
ctx.moveTo( x1 * scale, y1 * scale );
ctx.lineTo( x2 * scale, y2 * scale );
ctx.stroke();
}
function drawTriangle( ctx, x1, y1, x2, y2, x3, y3, scale = 1 ) {
ctx.beginPath();
ctx.moveTo( x1 * scale, y1 * scale );
ctx.lineTo( x2 * scale, y2 * scale );
ctx.lineTo( x3 * scale, y3 * scale );
ctx.lineTo( x1 * scale, y1 * scale );
ctx.stroke();
}
function wait( ms ) {
return new Promise( res => setTimeout( res, ms ) );
}
const OUTPUT_SIZE = 500;
const emotions = [ "angry", "disgust", "fear", "happy", "neutral", "sad", "surprise" ];
let ferData = [];
let setIndex = 0;
let emotionModel = null;
let output = null;
let model = null;
async function predictEmotion( points ) {
let result = tf.tidy( () => {
const xs = tf.stack( [ tf.tensor1d( points ) ] );
return emotionModel.predict( xs );
});
let prediction = await result.data();
result.dispose();
let id = prediction.indexOf( Math.max( ...prediction ) );
return emotions[ id ];
}
async function trackFace() {
await setImage( ferData[ setIndex ].file );
const image = document.getElementById( "image" );
const faces = await model.estimateFaces( {
input: image,
returnTensors: false,
flipHorizontal: false,
});
output.drawImage(
image,
0, 0, image.width, image.height,
0, 0, OUTPUT_SIZE, OUTPUT_SIZE
);
const scale = OUTPUT_SIZE / image.width;
let points = null;
faces.forEach( face => {
const x1 = face.boundingBox.topLeft[ 0 ];
const y1 = face.boundingBox.topLeft[ 1 ];
const x2 = face.boundingBox.bottomRight[ 0 ];
const y2 = face.boundingBox.bottomRight[ 1 ];
const bWidth = x2 - x1;
const bHeight = y2 - y1;
drawLine( output, x1, y1, x2, y1, scale );
drawLine( output, x2, y1, x2, y2, scale );
drawLine( output, x1, y2, x2, y2, scale );
drawLine( output, x1, y1, x1, y2, scale );
const features = [
"noseTip",
"leftCheek",
"rightCheek",
"leftEyeLower1", "leftEyeUpper1",
"rightEyeLower1", "rightEyeUpper1",
"leftEyebrowLower",
"rightEyebrowLower",
"lipsLowerInner",
"lipsUpperInner",
];
points = [];
features.forEach( feature => {
face.annotations[ feature ].forEach( x => {
points.push( ( x[ 0 ] - x1 ) / bWidth );
points.push( ( x[ 1 ] - y1 ) / bHeight );
});
});
});
if( points ) {
let emotion = await predictEmotion( points );
setText( `${setIndex + 1}. Expected: ${ferData[ setIndex ].emotion} vs. ${emotion}` );
}
else {
setText( "No Face" );
}
setIndex++;
await wait( 2000 );
requestAnimationFrame( trackFace );
}
(async () => {
const minSamples = Math.min( ...Object.keys( fer2013 ).map( em => fer2013[ em ].length ) );
Object.keys( fer2013 ).forEach( em => {
shuffleArray( fer2013[ em ] );
for( let i = 0; i < minSamples; i++ ) {
ferData.push({
emotion: em,
file: fer2013[ em ][ i ]
});
}
});
shuffleArray( ferData );
let canvas = document.getElementById( "output" );
canvas.width = OUTPUT_SIZE;
canvas.height = OUTPUT_SIZE;
output = canvas.getContext( "2d" );
output.translate( canvas.width, 0 );
output.scale( -1, 1 );
output.fillStyle = "#fdffb6";
output.strokeStyle = "#fdffb6";
output.lineWidth = 2;
model = await faceLandmarksDetection.load(
faceLandmarksDetection.SupportedPackages.mediapipeFacemesh
);
emotionModel = await tf.loadLayersModel( 'web/model/facemo.json' );
setText( "Loaded!" );
trackFace();
})();
</script>
</body>
</html>
What’s Next? Can This Detect Our Own Facial Emotions?
In this article, we combined the output of the TensorFlow Face Landmarks Detection model with an independent dataset to generate a new model, which can predict more information from an image than before. The real test would be to have this new model predict emotions on any face.
Let’s go to the next article of this series, in which we’ll use the live webcam video of our face and see if the model can react to our facial expressions in real time.