Implementing Kinect Gestures

Vangos Pterneas

28 Jan 2014 | CPOL | 5 min read

How to implement Kinect gestures

Download Vitruvius from GitHub

Original post: http://pterneas.com/2014/01/27/implementing-kinect-gestures/

Gesture recognition is a fundamental element when developing Kinect-based applications (or any other Natural User Interface). Gestures are used for navigation, interaction or data input. The most common gestures include waving, sweeping, zooming, joining hands, and many more. Unfortunately, the current Kinect for Windows SDK does not include a gesture-detection mechanism out of the box. So, you thought that recognizing gestures using Kinect is a pain in the ass? Not any more. Today, I’ll show you how you can implement your own gestures using some really easy techniques. There is no need to be a Math guru or an Artificial Intelligence Yoda to build a simple gesture detection mechanism.

Download the source code example, a simple wave gesture

Prerequisites

Kinect for Windows or Kinect for XBOX sensor
Microsoft Kinect SDK (or OpenNI SDK with minor modifications)

What is a Gesture?

Before implementing something, it is always good to define it. Kinect provides you with the position (X, Y and Z) of the users’ joints 30 times (or frames) per second. If some specific points move to specific relative positions for a given amount of time, then you have a gesture. So, in terms of Kinect, a gesture is the relative position of some joints for a given number of frames. Let’s take the wave gesture as an example. People wave by raising their left or right hand and moving it from side to side. Throughout the gesture, the hand usually remains above the elbow and moves periodically from left to right. Here is a graphical representation of the movement:

[Figure: Kinect wave gesture]

Now that you’ve seen and understood what a gesture is, let’s try to specify its underlying algorithm.

Gesture Segments

In the wave gesture, the hand remains above the elbow and moves periodically from left to right. Each position (left / right) is a discrete part of the gesture. Formally, these parts are called segments.

So, the first segment would contain the conditions “hand above elbow” and “hand right of elbow”:

Hand.Position.Y > Elbow.Position.Y AND
Hand.Position.X > Elbow.Position.X

Similarly, the second segment would contain the conditions “hand above elbow” and “hand left of elbow”:

Hand.Position.Y > Elbow.Position.Y AND
Hand.Position.X < Elbow.Position.X

That’s it. If you detect these two segments alternating three or four times in a row, then the user is waving! In .NET, the source code is really simple: just two classes, one for each segment. Each segment class implements an Update method, which determines whether the specified conditions are met for a given skeleton body. It returns Succeeded if every condition of the segment is met, or Failed otherwise.

// WaveGestureSegments.cs
using Microsoft.Kinect;

namespace KinectSimpleGesture
{
    public interface IGestureSegment
    {
        GesturePartResult Update(Skeleton skeleton);
    }

    public class WaveSegment1 : IGestureSegment
    {
        public GesturePartResult Update(Skeleton skeleton)
        {
            // Hand above elbow
            if (skeleton.Joints[JointType.HandRight].Position.Y > 
                skeleton.Joints[JointType.ElbowRight].Position.Y)
            {
                // Hand right of elbow
                if (skeleton.Joints[JointType.HandRight].Position.X > 
                    skeleton.Joints[JointType.ElbowRight].Position.X)
                {
                    return GesturePartResult.Succeeded;
                }
            }

            // Hand dropped
            return GesturePartResult.Failed;
        }
    }

    public class WaveSegment2 : IGestureSegment
    {
        public GesturePartResult Update(Skeleton skeleton)
        {
            // Hand above elbow
            if (skeleton.Joints[JointType.HandRight].Position.Y > 
                skeleton.Joints[JointType.ElbowRight].Position.Y)
            {
                // Hand left of elbow
                if (skeleton.Joints[JointType.HandRight].Position.X < 
                    skeleton.Joints[JointType.ElbowRight].Position.X)
                {
                    return GesturePartResult.Succeeded;
                }
            }

            // Hand dropped
            return GesturePartResult.Failed;
        }
    }
}

The GesturePartResult is an enum (we could even use boolean values):

// GesturePartResult.cs
using System;

namespace KinectSimpleGesture
{
    public enum GesturePartResult
    {
        Failed,
        Succeeded
    }
}

Note: For a more advanced example, we could add another GesturePartResult value (let’s say “Undetermined”), which would indicate that we are not yet sure about the current gesture result.
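As a rough illustration of that idea, here is a minimal sketch of a segment with three outcomes. It is not part of the downloadable sample: the three-valued enum, the JointPosition stand-in (used so the snippet compiles without the Kinect SDK) and the WaveSegmentSketch class are all hypothetical names introduced for this example.

```csharp
namespace KinectSimpleGestureSketch
{
    // Hypothetical three-valued result; Undetermined is the extra value
    // suggested in the note above.
    public enum GesturePartResult
    {
        Failed,
        Undetermined,
        Succeeded
    }

    // Stand-in for a joint position so the sketch compiles without the Kinect SDK.
    public struct JointPosition
    {
        public float X;
        public float Y;

        public JointPosition(float x, float y)
        {
            X = x;
            Y = y;
        }
    }

    public static class WaveSegmentSketch
    {
        // Variant of the "hand right of elbow" segment with a third outcome.
        public static GesturePartResult HandRightOfElbow(JointPosition hand, JointPosition elbow)
        {
            // Hand at or below the elbow: the wave is over.
            if (hand.Y <= elbow.Y)
            {
                return GesturePartResult.Failed;
            }

            // Hand is up and on the expected side.
            if (hand.X > elbow.X)
            {
                return GesturePartResult.Succeeded;
            }

            // Hand is up but has not crossed to the right yet: keep waiting.
            return GesturePartResult.Undetermined;
        }
    }
}
```

With this split, only a dropped hand counts as a failure, while a raised hand on the "wrong" side simply means the segment has not completed yet.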

Updating the Gesture

We now need a way to update and check the gesture every time the sensor provides us with new skeleton/body data. This kind of check will be done in a separate class and will be called 30 times per second, or at least as many times as our Kinect sensor allows. When updating a gesture, we check each segment and specify whether the movement is complete or whether we need to continue asking for data.

Window Size

The number of frames for which we keep asking for data is called the window size, and you find a good value by experimenting with your code. For simple gestures that last approximately a second, a window size of 30 or 50 frames will do the job just fine. For the wave gesture, I chose 50.
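To relate a window size to wall-clock time, divide the number of frames by the frame rate. The helper below is a small sketch that assumes the 30 frames per second mentioned earlier; the WindowSizeSketch class and its method names are hypothetical and do not belong to the Kinect SDK or the sample project.

```csharp
using System;

namespace KinectSimpleGestureSketch
{
    public static class WindowSizeSketch
    {
        // Assumed frame rate, taken from the "30 times per second" figure.
        public const int FramesPerSecond = 30;

        // How long a window of the given size lasts, in seconds.
        public static double FramesToSeconds(int frames)
        {
            return frames / (double)FramesPerSecond;
        }

        // Smallest window (in frames) that covers the given duration.
        public static int SecondsToFrames(double seconds)
        {
            return (int)Math.Ceiling(seconds * FramesPerSecond);
        }
    }
}
```

Under that assumption, a window of 30 frames is one second and a window of 50 frames is roughly 1.7 seconds. If your sensor delivers fewer frames per second, the same window size covers a longer period.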

The Gesture Class

Having decided on the window size parameter, we can now build the WaveGesture class. Notice the process:

In the constructor, we create the gesture parts and we specify their order in the _segments array. You can use as many occurrences of each segment as you like!
In the Update method, we keep track of the frame index and check every segment for success or failure.
If we succeed, we raise the GestureRecognized event and reset the gesture.
If we fail or the window size has been reached, we reset the gesture and start over.

Here is the final class for our wave gesture:

// WaveGesture.cs
using Microsoft.Kinect;
using System;

namespace KinectSimpleGesture
{
    public class WaveGesture
    {
        readonly int WINDOW_SIZE = 50;

        IGestureSegment[] _segments;

        int _currentSegment = 0;
        int _frameCount = 0;

        public event EventHandler GestureRecognized;

        public WaveGesture()
        {
            WaveSegment1 waveSegment1 = new WaveSegment1();
            WaveSegment2 waveSegment2 = new WaveSegment2();

            _segments = new IGestureSegment[]
            {
                waveSegment1,
                waveSegment2,
                waveSegment1,
                waveSegment2,
                waveSegment1,
                waveSegment2
            };
        }

        public void Update(Skeleton skeleton)
        {
            GesturePartResult result = _segments[_currentSegment].Update(skeleton);

            if (result == GesturePartResult.Succeeded)
            {
                if (_currentSegment + 1 < _segments.Length)
                {
                    _currentSegment++;
                    _frameCount = 0;
                }
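                else
                {
                    // Last segment succeeded: notify any subscribers.
                    if (GestureRecognized != null)
                    {
                        GestureRecognized(this, EventArgs.Empty);
                    }

                    // Reset even when nobody has subscribed to the event.
                    Reset();
                }
            }
            else if (result == GesturePartResult.Failed || _frameCount >= WINDOW_SIZE)
            {
                Reset();
            }
            else
            {
                // Only reachable once a third result (e.g. Undetermined) exists.
                _frameCount++;
            }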
        }

        public void Reset()
        {
            _currentSegment = 0;
            _frameCount = 0;
        }
    }
}

Using the Code

Using the code we created is straightforward. Create an instance of the WaveGesture class inside your program and subscribe to the GestureRecognized event. Remember to call the Update method whenever you have a new Skeleton frame. Here is a complete Console app example:

using Microsoft.Kinect;
using System;
using System.Linq; // Required for the Where and FirstOrDefault calls below

namespace KinectSimpleGesture
{
    class Program
    {
        static WaveGesture _gesture = new WaveGesture();

        static void Main(string[] args)
        {
            var sensor = KinectSensor.KinectSensors.Where(
                         s => s.Status == KinectStatus.Connected).FirstOrDefault();

            if (sensor != null)
            {
                sensor.SkeletonStream.Enable();
                sensor.SkeletonFrameReady += Sensor_SkeletonFrameReady;

                _gesture.GestureRecognized += Gesture_GestureRecognized;

                sensor.Start();
            }

            Console.ReadKey();
        }

        static void Sensor_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
        {
            using (var frame = e.OpenSkeletonFrame())
            {
                if (frame != null)
                {
                    Skeleton[] skeletons = new Skeleton[frame.SkeletonArrayLength];

                    frame.CopySkeletonDataTo(skeletons);

                    if (skeletons.Length > 0)
                    {
                        var user = skeletons.Where(
                                   u => u.TrackingState == 
                                        SkeletonTrackingState.Tracked).FirstOrDefault();

                        if (user != null)
                        {
                            _gesture.Update(user);
                        }
                    }
                }
            }
        }

        static void Gesture_GestureRecognized(object sender, EventArgs e)
        {
            Console.WriteLine("You just waved!");
        }
    }
}

That’s it! Now stand in front of your Kinect sensor and wave using your right hand!

Download the source code example

Something to Note

Obviously, you cannot expect your users to do everything right. One might wave but not perform the entire movement. Another might just perform the movement too quickly or too slowly. When developing a business app targeting the Kinect platform, you have to be aware of all these issues and add conditions to your code. In a common situation, you’ll need to specify whether the user is “almost” performing a gesture. That is, you’ll need to tolerate a number of inconclusive frames before determining the final gesture result. This is why I mentioned the Undetermined result before.
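With only Succeeded and Failed available, a frame in which the raised hand has not yet crossed to the other side of the elbow counts as a failure and restarts the sequence, so the window size never comes into play. The sketch below shows one way the update loop could tolerate such frames. It is an illustration under assumptions, not the Vitruvius implementation: SegmentResult and TolerantWaveGesture are hypothetical names, and the segments take the hand's offset from the elbow (dx, dy) instead of a Skeleton so that the snippet runs without a sensor.

```csharp
using System;

namespace KinectSimpleGestureSketch
{
    // Hypothetical three-valued segment result.
    public enum SegmentResult { Failed, Undetermined, Succeeded }

    public class TolerantWaveGesture
    {
        const int WindowSize = 50;

        // Each segment receives the hand's offset from the elbow (dx, dy).
        readonly Func<float, float, SegmentResult>[] _segments;

        int _currentSegment = 0;
        int _frameCount = 0;

        public event EventHandler GestureRecognized;

        public TolerantWaveGesture()
        {
            _segments = new Func<float, float, SegmentResult>[]
            {
                HandRight, HandLeft, HandRight, HandLeft, HandRight, HandLeft
            };
        }

        static SegmentResult HandRight(float dx, float dy)
        {
            if (dy <= 0) return SegmentResult.Failed; // hand dropped
            return dx > 0 ? SegmentResult.Succeeded : SegmentResult.Undetermined;
        }

        static SegmentResult HandLeft(float dx, float dy)
        {
            if (dy <= 0) return SegmentResult.Failed; // hand dropped
            return dx < 0 ? SegmentResult.Succeeded : SegmentResult.Undetermined;
        }

        public void Update(float dx, float dy)
        {
            SegmentResult result = _segments[_currentSegment](dx, dy);

            if (result == SegmentResult.Succeeded)
            {
                if (_currentSegment + 1 < _segments.Length)
                {
                    // Move on and restart the per-segment frame counter.
                    _currentSegment++;
                    _frameCount = 0;
                }
                else
                {
                    if (GestureRecognized != null)
                    {
                        GestureRecognized(this, EventArgs.Empty);
                    }

                    Reset();
                }
            }
            else if (result == SegmentResult.Failed || _frameCount >= WindowSize)
            {
                // Hand dropped, or we waited too long for the next segment.
                Reset();
            }
            else
            {
                // Undetermined: keep waiting for the hand to cross over.
                _frameCount++;
            }
        }

        public void Reset()
        {
            _currentSegment = 0;
            _frameCount = 0;
        }
    }
}
```

In a real app you would compute dx and dy from the HandRight and ElbowRight joint positions of the tracked skeleton on every frame and pass them to Update. The window size then limits how many inconclusive frames are tolerated per segment before the gesture starts over.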

Vitruvius

So, if you want more production-ready gestures right now, consider downloading Vitruvius. Vitruvius is a free and open-source library I built, which provides many utilities for your Kinect applications. It currently supports 9 gestures, with more to come. The code is more generic and you can easily build your own extensions on top of it. Give it a try, enjoy and even contribute yourself!

Download Vitruvius from GitHub

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)