
Face Touch Detection with TensorFlow.js Part 2: Using BodyPix

14 Jul 2020 · CPOL · 3 min read
In this article, we are going to use BodyPix, a body part detection and segmentation library, to try and remove the training step of the face touch detection.
Here we look at: Setting up BodyPix, detecting face touches, how I wrote my predictImage() function from the starting point template, using the distance formula to check for face region overlap, and how we can use BodyPix to estimate a person’s body poses.

TensorFlow + JavaScript. The most popular, cutting-edge AI framework now supports the most widely used programming language on the planet, so let’s make magic happen through deep learning right in our web browser, GPU-accelerated via WebGL using TensorFlow.js!

In the previous article, we trained an AI with TensorFlow.js to simulate the donottouchyourface.com app, which was designed to help people reduce the risk of getting sick by learning to stop touching their face. In this article, we are going to use BodyPix, a body part detection and segmentation library, to try and remove the training step of the face touch detection.


Starting Point

For this project, we need to:

  • Import TensorFlow.js and BodyPix
  • Add the video element
  • Add a canvas for debugging
  • Add a text element for Touch vs No Touch status
  • Add the webcam setup functionality
  • Run the model prediction every 200 ms instead of picking an image, but only once the webcam (and, later, the BodyPix model) is ready

Here is our starting point:

HTML
<html>
    <head>
        <title>Face Touch Detection with TensorFlow.js Part 2: Using BodyPix</title>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/body-pix@2.0"></script>
        <style>
            img, video {
                object-fit: cover;
            }
        </style>
    </head>
    <body>
        <video autoplay playsinline muted id="webcam" width="224" height="224"></video>
        <canvas id="canvas" width="224" height="224"></canvas>
        <h1 id="status">Loading...</h1>
        <script>
        async function setupWebcam() {
            return new Promise( ( resolve, reject ) => {
                const webcamElement = document.getElementById( "webcam" );
                const navigatorAny = navigator;
                navigator.getUserMedia = navigator.getUserMedia ||
                navigatorAny.webkitGetUserMedia || navigatorAny.mozGetUserMedia ||
                navigatorAny.msGetUserMedia;
                if( navigator.getUserMedia ) {
                    navigator.getUserMedia( { video: true },
                        stream => {
                            webcamElement.srcObject = stream;
                            webcamElement.addEventListener( 'loadeddata', resolve, false );
                        },
                    error => reject());
                }
                else {
                    reject();
                }
            });
        }

        (async () => {
            await setupWebcam();

            setInterval( predictImage, 200 );
        })();

        async function predictImage() {
            // Prediction Code Goes Here
        }
        </script>
    </body>
</html>

Setting Up BodyPix

BodyPix takes several parameters when loading – you might recognize some of them. It supports two different pre-trained models for its architecture: MobileNetV1 and ResNet50. The required parameters vary depending on the model you choose. We will use MobileNetV1 and initialize BodyPix with the following code:

JavaScript
(async () => {
    model = await bodyPix.load({
        architecture: 'MobileNetV1',
        outputStride: 16,
        multiplier: 0.50,
        quantBytes: 2
    });
    await setupWebcam();
    setInterval( predictImage, 200 );
})();
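
For comparison, here is a minimal sketch of loading the heavier ResNet50 backbone instead. This is not part of the project code, just one reasonable configuration: ResNet50 is generally more accurate but slower, it does not take a multiplier parameter, and it only supports an outputStride of 32 or 16.

JavaScript
(async () => {
    // Sketch: load BodyPix with ResNet50 instead of MobileNetV1 (more accurate, slower)
    model = await bodyPix.load({
        architecture: 'ResNet50',
        outputStride: 16,   // 32 or 16 for ResNet50; smaller values are slower but more accurate
        quantBytes: 2       // 4, 2, or 1; fewer bytes mean a smaller download but lower accuracy
    });
    await setupWebcam();
    setInterval( predictImage, 200 );
})();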

Detecting Face Touches

With body part segmentation, we get two pieces of data from BodyPix:

  • Key points of body parts, such as nose, ears, wrist, elbow, etc., represented in 2-D screen pixel coordinates
  • The 2-D segmentation pixel data stored in a 1-D array format
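
To get a feel for that data, here is a minimal sketch that logs both pieces for a single frame. It assumes the model and webcam element from this project are already set up:

JavaScript
// Sketch: inspect one frame of BodyPix output (assumes `model` is loaded
// and the #webcam video element is playing)
async function inspectSegmentation() {
    const video = document.getElementById( "webcam" );
    const segmentation = await model.segmentPersonParts( video );
    if( segmentation.allPoses.length > 0 ) {
        // Key points: part name, confidence score, and 2-D pixel position
        console.log( segmentation.allPoses[ 0 ].keypoints );
    }
    // Segmentation pixels: a flat width * height array of part IDs (-1 means no person)
    console.log( segmentation.width, segmentation.height, segmentation.data.length );
}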

After brief testing, I found that the key point coordinates retrieved for the nose and ears were fairly reliable, while the points for a person’s wrists were not accurate enough to determine whether a hand is touching the face. Therefore, we will use the segmentation pixels to detect face touches.

Because the nose and ears key points seem reliable, we can use them to estimate a circle region for the person’s face. Using this circle region, we can determine if any left-hand or right-hand segmentation pixels overlap the area – and mark the status as a face touch.

Here’s how I wrote my predictImage() function from the starting point template, using the distance formula to check for face region overlap:

JavaScript
async function predictImage() {
    const img = document.getElementById( "webcam" );
    const segmentation = await model.segmentPersonParts( img );
    if( segmentation.allPoses.length > 0 ) {
        const keypoints = segmentation.allPoses[ 0 ].keypoints;
        const nose = keypoints[ 0 ].position;
        const earL = keypoints[ 3 ].position;
        const earR = keypoints[ 4 ].position;
        const earLtoNose = Math.sqrt( Math.pow( nose.x - earL.x, 2 ) + Math.pow( nose.y - earL.y, 2 ) );
        const earRtoNose = Math.sqrt( Math.pow( nose.x - earR.x, 2 ) + Math.pow( nose.y - earR.y, 2 ) );
        const faceRadius = Math.max( earLtoNose, earRtoNose );

        // Check if any of the left_hand(10) or right_hand(11) pixels are within the nose to faceRadius
        let isTouchingFace = false;
        for( let y = 0; y < 224; y++ ) {
            for( let x = 0; x < 224; x++ ) {
                if( segmentation.data[ y * 224 + x ] === 10 ||
                    segmentation.data[ y * 224 + x ] === 11 ) {
                    const distToNose = Math.sqrt( Math.pow( nose.x - x, 2 ) + Math.pow( nose.y - y, 2 ) );
                    // console.log( distToNose );
                    if( distToNose < faceRadius ) {
                        isTouchingFace = true;
                        break;
                    }
                }
            }
            if( isTouchingFace ) {
                break;
            }
        }
        if( isTouchingFace ) {
            document.getElementById( "status" ).innerText = "Touch";
        }
        else {
            document.getElementById( "status" ).innerText = "Not Touch";
        }

        // --- Uncomment the following to view the BodyPix mask ---
        // const canvas = document.getElementById( "canvas" );
        // bodyPix.drawMask(
        //     canvas, img,
        //     bodyPix.toColoredPartMask( segmentation ),
        //     0.7,
        //     0,
        //     false
        // );
    }
}

If you would like to see the pixels predicted by BodyPix, you can uncomment the bottom section of the function.
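
If you do, the arguments that follow the colored part mask are, in order, the opacity of the mask drawn over the video frame, the amount of blur applied to the mask (in pixels), and whether to flip the output horizontally. Here is the same call as a standalone sketch with those arguments labelled (it uses the img and segmentation variables from predictImage()):

JavaScript
// Sketch: the debug call from predictImage() with its arguments labelled
const canvas = document.getElementById( "canvas" );
bodyPix.drawMask(
    canvas, img,                               // destination canvas and source video frame
    bodyPix.toColoredPartMask( segmentation ), // color-coded image of the detected body parts
    0.7,                                       // mask opacity
    0,                                         // mask blur amount, in pixels
    false                                      // flip horizontally (handy for mirrored webcams)
);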


My approach in predictImage() is a very rough estimate based on the proximity of hand pixels to the face region. A fun challenge for you might be to find a more accurate way to detect when a person’s hand has touched the face!

Technical Footnotes

  • One advantage of using BodyPix for Face Touch Detection is that the user does not need to train an AI with examples of the undesired behavior
  • Another advantage of BodyPix is that it can still segment the face in front even when the person’s hand is hidden behind it
  • This approach is more specific to recognizing a Face Touch action than the one we used in the previous article; however, the first approach may yield more accurate predictions given enough sample data
  • Expect performance issues, as BodyPix is computationally expensive – one way to lighten the load is sketched below
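
If the 200 ms prediction interval is too heavy for your hardware, one option (a sketch, not part of the original project code) is to lower BodyPix’s internal processing resolution when calling segmentPersonParts():

JavaScript
// Sketch: trade accuracy for speed by lowering the internal resolution
const segmentation = await model.segmentPersonParts( img, {
    internalResolution: 'low',   // 'low', 'medium', 'high', 'full', or a number between 0 and 1
    segmentationThreshold: 0.7   // minimum confidence for a pixel to count as part of a person
});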

Finish Line

For your reference, here is the full code for this project:

HTML
<html>
    <head>
        <title>Face Touch Detection with TensorFlow.js Part 2: Using BodyPix</title>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/body-pix@2.0"></script>
        <style>
            img, video {
                object-fit: cover;
            }
        </style>
    </head>
    <body>
        <video autoplay playsinline muted id="webcam" width="224" height="224"></video>
        <canvas id="canvas" width="224" height="224"></canvas>
        <h1 id="status">Loading...</h1>
        <script>
        async function setupWebcam() {
            return new Promise( ( resolve, reject ) => {
                const webcamElement = document.getElementById( "webcam" );
                const navigatorAny = navigator;
                navigator.getUserMedia = navigator.getUserMedia ||
                navigatorAny.webkitGetUserMedia || navigatorAny.mozGetUserMedia ||
                navigatorAny.msGetUserMedia;
                if( navigator.getUserMedia ) {
                    navigator.getUserMedia( { video: true },
                        stream => {
                            webcamElement.srcObject = stream;
                            webcamElement.addEventListener( 'loadeddata', resolve, false );
                        },
                    error => reject());
                }
                else {
                    reject();
                }
            });
        }

        let model = null;

        (async () => {
            model = await bodyPix.load({
                architecture: 'MobileNetV1',
                outputStride: 16,
                multiplier: 0.50,
                quantBytes: 2
            });
            await setupWebcam();
            setInterval( predictImage, 200 );
        })();

        async function predictImage() {
            const img = document.getElementById( "webcam" );
            const segmentation = await model.segmentPersonParts( img );
            if( segmentation.allPoses.length > 0 ) {
                const keypoints = segmentation.allPoses[ 0 ].keypoints;
                const nose = keypoints[ 0 ].position;
                const earL = keypoints[ 3 ].position;
                const earR = keypoints[ 4 ].position;
                const earLtoNose = Math.sqrt( Math.pow( nose.x - earL.x, 2 ) + Math.pow( nose.y - earL.y, 2 ) );
                const earRtoNose = Math.sqrt( Math.pow( nose.x - earR.x, 2 ) + Math.pow( nose.y - earR.y, 2 ) );
                const faceRadius = Math.max( earLtoNose, earRtoNose );

                // Check if any of the left_hand(10) or right_hand(11) pixels are within the nose to faceRadius
                let isTouchingFace = false;
                for( let y = 0; y < 224; y++ ) {
                    for( let x = 0; x < 224; x++ ) {
                        if( segmentation.data[ y * 224 + x ] === 10 ||
                            segmentation.data[ y * 224 + x ] === 11 ) {
                            const distToNose = Math.sqrt( Math.pow( nose.x - x, 2 ) + Math.pow( nose.y - y, 2 ) );
                            // console.log( distToNose );
                            if( distToNose < faceRadius ) {
                                isTouchingFace = true;
                                break;
                            }
                        }
                    }
                    if( isTouchingFace ) {
                        break;
                    }
                }
                if( isTouchingFace ) {
                    document.getElementById( "status" ).innerText = "Touch";
                }
                else {
                    document.getElementById( "status" ).innerText = "Not Touch";
                }

                // --- Uncomment the following to view the BodyPix mask ---
                // const canvas = document.getElementById( "canvas" );
                // bodyPix.drawMask(
                //     canvas, img,
                //     bodyPix.toColoredPartMask( segmentation ),
                //     0.7,
                //     0,
                //     false
                // );
            }
        }
        </script>
    </body>
</html>

What’s Next? Can We Do Even More With TensorFlow.js?

In this project, we saw how easily we can use BodyPix to estimate a person’s body poses. For the next project, let’s revisit the webcam transfer learning and have a bit of fun with it.

Follow along with the next article in this series to see if we can train an AI to deep-learn some hand gestures and sign language.


License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)