Introduction
I love using the Microsoft Kinect for my home automation projects. The Microsoft Kinect for Windows API is really amazing and can be used to track our movement in the physical world in unique and creative ways outside of the traditional game controller. Traditional computer vision systems struggle to track normal human motion in real time, but the Kinect can give you the coordinates of 20 skeletal joints 30 times a second. The Kinect simplifies the computer vision problem by creating what is called a Point Cloud out of infrared light. Infrared light is similar to visible light but has a longer wavelength than we can see. The Point Cloud can be seen with a special camera or night vision goggles, as shown in the image below.
The Kinect has a special lens that sends out a known pattern of infrared light. The light makes dots on the objects it hits, creating the Point Cloud. The Kinect has a dedicated camera for seeing the infrared dots, and its vision system measures the distance between the dots and analyzes the displacement in the pattern to determine how far away an object is. In the image below, you can see that close-up objects have dots spaced closer together, while objects further away have dots spaced further apart. By analyzing the spacing of the infrared dots, the Kinect builds a depth map and can quickly pick out a human outline, because the human stands in front of other objects.
Create a Natural UI with the Kinect
There are some great user interfaces built with the Kinect, but most require you to be looking at a computer screen. I built a system that does not require you to look at a computer in order to select a device and turn it on or off. You simply point to a device with one hand, raise your other hand above your head, and wave it in one direction to turn the device on or in the other direction to turn it off. In addition to gestures, I use the Kinect speech recognition engine to turn devices on or off.
Click here to see it all work!
Vectors
Vectors are a cornerstone of mathematics and physics; they represent a direction and a magnitude (also called a length). They are fundamental to 3D programming and are used extensively in building 3D models for computer games and engineering applications.
Vector3D is a structure in the System.Windows.Media.Media3D namespace. This namespace contains types and structures that support 3D presentation in Windows Presentation Foundation (WPF) applications. The Vector3D structure was built for WPF applications, but it is also very useful for processing other 3D vector data, including vector data from the Kinect. The Microsoft.Kinect namespace has a Vector4 structure. It is similar to the Vector3D structure from WPF, but it includes a property called W in addition to the standard X, Y and Z properties of Vector3D. The additional W property is the fourth dimension in Vector4 and is used for rotations in 3D space around the axis defined by the vector, in a number system called quaternions. I use the Vector3D from the WPF library over the Vector4 from the Kinect library because this project has no need to rotate anything in 3D space, and Vector3D has useful built-in methods for calculating the dot product and cross product, as well as a Length property.
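To illustrate the pieces this article relies on, here is a minimal standalone sketch of the Vector3D operations (the vector values are arbitrary examples; the program needs the WPF assemblies, e.g. PresentationCore, referenced):

using System;
using System.Windows.Media.Media3D;

class Vector3DDemo
{
    static void Main()
    {
        Vector3D a = new Vector3D(1, 0, 0);
        Vector3D b = new Vector3D(0, 1, 0);

        // Dot product: 0, because the vectors are perpendicular.
        double dot = Vector3D.DotProduct(a, b);

        // Cross product: (0, 0, 1), a vector perpendicular to both.
        Vector3D cross = Vector3D.CrossProduct(a, b);

        // Length (magnitude): 5 for a 3-4-5 triangle.
        double length = new Vector3D(3, 4, 0).Length;

        Console.WriteLine("{0} | {1} | {2}", dot, cross, length);
    }
}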
I further extend the Vector3D functionality with my Line class. In mathematics, a line can be represented by a point and a direction vector, with the line passing through the point. Two points define a line, so the constructor takes two SkeletonPoints from the Kinect.
public Line(SkeletonPoint Point1, SkeletonPoint Point2)
{
    // Anchor the line at the first point.
    _point.X = Point1.X;
    _point.Y = Point1.Y;
    _point.Z = Point1.Z;
    // The direction vector runs from the first point to the second.
    _vector.X = Point2.X - Point1.X;
    _vector.Y = Point2.Y - Point1.Y;
    _vector.Z = Point2.Z - Point1.Z;
}
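The rest of the Line class is not shown here, but the Intersection and AngleToPoint methods below use a Point property, a Vector property, and a Position(t) method. Here is a minimal sketch of what those members might look like, reconstructed from how they are called later (an assumption, not the project's actual code):

// Reconstructed sketch: Line members inferred from the later snippets.
private Point3D _point;
private Vector3D _vector;

public Point3D Point { get { return _point; } }
public Vector3D Vector { get { return _vector; } }

// Parametric equation of the line: Position(t) = Point + t * Vector.
public Point3D Position(double t)
{
    return _point + t * _vector;
}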
The method below is in the KinectLiving class. It determines which hand is pointing at the device based on the gesture from the other hand, and it uses the positions of that arm's elbow and hand to return the line along which the user is pointing at the device.
internal static Line GetLinePointingToDevice(Skeleton skeleton, Gesture gesture)
{
    if (IsRightHandThePointer(gesture, skeleton))
        return new Line(skeleton.Joints[JointType.ElbowRight].Position,
            skeleton.Joints[JointType.HandRight].Position);
    else
        return new Line(skeleton.Joints[JointType.ElbowLeft].Position,
            skeleton.Joints[JointType.HandLeft].Position);
}
Finding the Coordinates of an Object
The Kinect API gives you the X, Y and Z coordinates of 20 joints on the human body. I use the coordinates of the elbow and hand to create a line directed towards an object. The program then asks you to point again from a different position, and another line is created directed towards the object. The two lines are skew lines: they are not parallel, and since this is 3-dimensional space, they also do not intersect. The lines pass closest to each other at the coordinates of the object you were pointing to. I use the Vector3D library to do the 3D math, including the dot product and cross product, to get the midpoint of the line segment that is simultaneously perpendicular to both lines. I first learned how to solve 3-dimensional math problems in my third-year calculus class in college, but that was 20 years ago and I had a blast relearning it!
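For reference, the Intersection method below implements the standard closest-point construction for skew lines. Writing the two lines in parametric form as \(r_1(t) = P + t\,d_1\) and \(r_2(s) = Q + s\,d_2\), the parameters of the closest points are:

$$
t_1 = \frac{(Q-P)\cdot\big(d_2\times(d_1\times d_2)\big)}{d_1\cdot\big(d_2\times(d_1\times d_2)\big)},
\qquad
t_2 = \frac{(P-Q)\cdot\big(d_1\times(d_1\times d_2)\big)}{d_2\cdot\big(d_1\times(d_1\times d_2)\big)}
$$

The estimated object position is the midpoint \(\tfrac{1}{2}\big(r_1(t_1)+r_2(t_2)\big)\). If \(d_1\times d_2 = 0\), the lines are parallel and there is no unique closest pair, which the code signals by returning a point at double.MaxValue.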
public Point3D Intersection(Line secondLine)
{
    // A vector perpendicular to both lines, oriented from line 1 towards line 2.
    Vector3D vectorPerpendicularBothLinesFromLine1ToLine2 =
        Vector3D.CrossProduct(secondLine.Vector,
            Vector3D.CrossProduct(this.Vector, secondLine.Vector));

    // A zero vector means the lines are parallel: no unique closest pair.
    if (vectorPerpendicularBothLinesFromLine1ToLine2 == new Vector3D(0, 0, 0))
    {
        return new Point3D(double.MaxValue, double.MaxValue, double.MaxValue);
    }

    // Vector from this line's anchor point to the second line's anchor point.
    Vector3D vectorQP = new Vector3D(secondLine.Point.X - this.Point.X,
        secondLine.Point.Y - this.Point.Y, secondLine.Point.Z - this.Point.Z);

    // Parameter of the point on this line that is closest to the second line.
    double t1 = Vector3D.DotProduct(vectorPerpendicularBothLinesFromLine1ToLine2, vectorQP) /
        Vector3D.DotProduct(vectorPerpendicularBothLinesFromLine1ToLine2, this.Vector);
    Point3D firstPoint = this.Position(t1);

    // Repeat from the other direction for the closest point on the second line.
    Vector3D vectorPerpendicularBothLinesFromLine2ToLine1 =
        Vector3D.CrossProduct(this.Vector,
            Vector3D.CrossProduct(this.Vector, secondLine.Vector));
    Vector3D vectorPQ = new Vector3D(this.Point.X - secondLine.Point.X,
        this.Point.Y - secondLine.Point.Y, this.Point.Z - secondLine.Point.Z);
    double t2 = Vector3D.DotProduct(vectorPerpendicularBothLinesFromLine2ToLine1, vectorPQ) /
        Vector3D.DotProduct(vectorPerpendicularBothLinesFromLine2ToLine1, secondLine.Vector);
    Point3D secondPoint = secondLine.Position(t2);

    // The best estimate of the target is the midpoint of the shortest
    // segment connecting the two lines.
    double midX = (firstPoint.X + secondPoint.X) / 2;
    double midY = (firstPoint.Y + secondPoint.Y) / 2;
    double midZ = (firstPoint.Z + secondPoint.Z) / 2;
    Point3D midPoint = new Point3D(midX, midY, midZ);
    return midPoint;
}
Figuring Out Which Object You Are Pointing At
Once you know the coordinates of the objects in 3D space, angles can be computed to figure out which object you are pointing at. The vertex of each angle is your elbow. The vector from your elbow to your hand is the reference, and it indicates the direction you are pointing. The vector from your elbow to the coordinates of each object is then calculated. Whichever object's vector makes the smallest angle with the reference vector is the object you are pointing at.
The code below shows the algorithm for calculating the angle to an object. The geometric definition of the dot product is used to compute the angle between the two vectors.
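Writing \(u\) for the reference vector from the elbow to the hand and \(v\) for the vector from the elbow to the object, the dot product gives:

$$
\cos\theta = \frac{u\cdot v}{\lVert u\rVert\,\lVert v\rVert}
\qquad\Longrightarrow\qquad
\theta = \arccos\!\left(\frac{u\cdot v}{\lVert u\rVert\,\lVert v\rVert}\right)
$$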
The equation is solved for the angle and translated into the code below. The angle is converted from radians to degrees only because I prefer thinking in degrees over radians. The result is also rounded to remove decimal approximation errors, so that a unit test gets the exact expected value instead of something very close with a round-off error.
public double AngleToPoint(SkeletonPoint point)
{
    // Vector from the line's anchor (the elbow) to the target point.
    Vector3D vectorToPoint = new Vector3D(point.X - this.Point.X,
        point.Y - this.Point.Y, point.Z - this.Point.Z);

    // Geometric definition of the dot product, solved for cos(theta).
    double cosOfAngle = Vector3D.DotProduct(vectorToPoint, this.Vector) /
        (this.Vector.Length * vectorToPoint.Length);

    // Convert radians to degrees and round away floating point noise.
    double angleInDegrees = Math.Round(Math.Acos(cosOfAngle) * 180 / Math.PI, 3);
    return angleInDegrees;
}
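As a sanity check, here is a minimal sketch of the kind of unit test mentioned above, using MSTest and made-up coordinates: a line along the X axis should measure 45 degrees to a point offset equally in X and Y.

using Microsoft.Kinect;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class LineTests
{
    [TestMethod]
    public void AngleToPoint_Returns45Degrees_ForDiagonalPoint()
    {
        // A line along the X axis, anchored at the origin.
        Line line = new Line(
            new SkeletonPoint { X = 0f, Y = 0f, Z = 0f },
            new SkeletonPoint { X = 1f, Y = 0f, Z = 0f });

        // A point offset equally in X and Y sits at exactly 45 degrees.
        double angle = line.AngleToPoint(new SkeletonPoint { X = 1f, Y = 1f, Z = 0f });

        // Math.Round in AngleToPoint makes the comparison exact.
        Assert.AreEqual(45.0, angle);
    }
}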
The device whose angle is the smallest by absolute value is the one you are pointing to.
public KinectDevice PointingToDevice(Skeleton skeleton, Gesture gesture)
{
    Line line = GetLinePointingToDevice(skeleton, gesture);
    KinectDevice pointingToDevice = null;
    double shortestAngle = 180;
    foreach (KinectDevice kinectDevice in _devices)
    {
        double angleToPoint = Math.Abs(line.AngleToPoint(kinectDevice.Point));
        if (angleToPoint <= shortestAngle || pointingToDevice == null)
        {
            shortestAngle = angleToPoint;
            pointingToDevice = kinectDevice;
        }
    }
    return pointingToDevice;
}
The following method turns on or off the device that you are pointing to based on the gesture detected from the other hand.
private void UpdateDeviceBasedOnGesture(Skeleton skeleton, Gesture gesture)
{
    KinectDevice closestDevice =
        KinectLiving.GetInstance().ClosestDevicePointedAt(skeleton, gesture);
    textBlockMessage.Text = closestDevice.Name;
    textBlockLearnDevicePointMessage.Text = "";
    switch (gesture)
    {
        case Gesture.TurnOnWithLeftHandPointingToDevice:
        case Gesture.TurnOnWithRightHandPointingToDevice:
        {
            closestDevice.TurnOn();
            break;
        }
        case Gesture.TurnOffWithLeftHandPointingToDevice:
        case Gesture.TurnOffWithRightHandPointingToDevice:
        {
            closestDevice.TurnOff();
            break;
        }
    }
}
The KinectDevice TurnOn and TurnOff methods call web service methods in the LogicalLiving.Web project. The LogicalLiving.Web project is an MVC4 project with a user interface built upon jQuery Mobile. All devices can be controlled from the web; the site is intended to be run as a phone application but can also be run in any browser. Please read my CodeProject jQuery Mobile article for more information. The KinectDevice code leverages the same MVC controller methods that the jQuery Mobile site uses to turn devices on or off.
internal void TurnOn()
{
    // Devices wired to the Netduino are switched with a Netduino message.
    if (this.NetduinoMessageOn.Length != 0)
    {
        InvokeMvcControllerMethod.SendMessageToNetduino(this.NetduinoMessageOn, this.VoiceOn);
    }
    // Devices with a Z-Wave node are switched with a Z-Wave message.
    if (this.DeviceNode != DeviceNode.None)
    {
        InvokeMvcControllerMethod.SendZWaveMessage(this.DeviceNode, DeviceState.On);
    }
}
Kinect Gestures
I wrote a class called KinectGestures which detects the gestures for the application. The gesture detected is one of the following: TurnOnWithRightHandPointingToDevice, TurnOnWithLeftHandPointingToDevice, TurnOffWithRightHandPointingToDevice, TurnOffWithLeftHandPointingToDevice, or None.
It is important to know which hand is doing the gesture because the other hand is pointing to the device. The code keeps track of the state of the left and right hand as defined by the enumerations below:
namespace LogicalLiving.KinectLiving
{
    public enum Gesture { TurnOnWithRightHandPointingToDevice,
        TurnOnWithLeftHandPointingToDevice, TurnOffWithRightHandPointingToDevice,
        TurnOffWithLeftHandPointingToDevice, None }

    public enum RightHandState { AboveHeadRight, AboveHeadSweepRightToLeft,
        AboveHeadLeft, AboveShoulderSweepLeftToRight, BelowHead, Reset };

    public enum LeftHandState { AboveHeadLeft, AboveHeadSweepLeftToRight,
        AboveHeadRight, AboveHeadSweepRightToLeft, BelowHead, Reset };
}
The gesture to turn on a device is to raise one of your hands above your head and swipe it towards your head. You swipe it in the other direction to turn off the device. The other hand not doing the gesture is pointing to the device that you want to control.
The right hand state is calculated by:
public static RightHandState GetRightHandState(Skeleton skeleton)
{
    RightHandState rightHandState = RightHandState.Reset;
    if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
    {
        rightHandState = RightHandState.Reset;
    }
    else if (skeleton.Joints[JointType.HandRight].Position.Y <
        skeleton.Joints[JointType.Head].Position.Y)
    {
        rightHandState = RightHandState.BelowHead;
    }
    else if (skeleton.Joints[JointType.HandRight].Position.X >=
        skeleton.Joints[JointType.ShoulderRight].Position.X)
    {
        // The hand is above the head and to the right of the shoulder. If it
        // was previously on the left, this is a left-to-right sweep.
        rightHandState = _previousRightHandState == RightHandState.AboveHeadLeft ||
            _previousRightHandState == RightHandState.AboveShoulderSweepLeftToRight ?
            RightHandState.AboveShoulderSweepLeftToRight : RightHandState.AboveHeadRight;
    }
    else if (skeleton.Joints[JointType.HandRight].Position.X <
        skeleton.Joints[JointType.ShoulderRight].Position.X)
    {
        // The hand is above the head and to the left of the shoulder. If it
        // was previously on the right, this is a right-to-left sweep.
        rightHandState = _previousRightHandState == RightHandState.AboveHeadRight ||
            _previousRightHandState == RightHandState.AboveHeadSweepRightToLeft ?
            RightHandState.AboveHeadSweepRightToLeft : RightHandState.AboveHeadLeft;
    }
    _previousRightHandState = rightHandState;
    return rightHandState;
}
There is a similar method to calculate the left hand state. The gesture is then detected by a state machine that looks at both the right hand state and the left hand state. For example, a right-to-left sweep of the right hand means the left hand must be the pointer, so it produces TurnOnWithLeftHandPointingToDevice.
public static Gesture DetectGesture(Skeleton skeleton)
{
    Gesture gesture = Gesture.None;
    RightHandState rightHandState = GetRightHandState(skeleton);
    LeftHandState leftHandState = GetLeftHandState(skeleton);
    if (rightHandState == RightHandState.AboveHeadSweepRightToLeft)
    {
        gesture = Gesture.TurnOnWithLeftHandPointingToDevice;
    }
    else if (leftHandState == LeftHandState.AboveHeadSweepLeftToRight)
    {
        gesture = Gesture.TurnOnWithRightHandPointingToDevice;
    }
    else if (rightHandState == RightHandState.AboveShoulderSweepLeftToRight)
    {
        gesture = Gesture.TurnOffWithLeftHandPointingToDevice;
    }
    else if (leftHandState == LeftHandState.AboveHeadSweepRightToLeft)
    {
        gesture = Gesture.TurnOffWithRightHandPointingToDevice;
    }
    if (gesture != Gesture.None)
    {
        // A completed gesture resets both state machines so the same
        // sweep is not detected twice.
        _previousRightHandState = RightHandState.Reset;
        _previousLeftHandState = LeftHandState.Reset;
    }
    return gesture;
}
Speech Recognition
The speech recognition class that I wrote is based on sample code in the Kinect for Windows Resources & Samples, specifically the Speech Basics – WPF sample. The Initialize method calls the BuildAllChoices() method to return all of the voice command choices.
private static Choices BuildAllChoices()
{
    Choices voiceCommandChoices = new Choices();
    foreach (KinectDevice kinectDevice in KinectLiving.GetInstance().Devices)
    {
        // Each device stores its commands as pipe-delimited lists.
        string[] voiceOn = kinectDevice.VoiceOn.Split('|');
        string[] voiceOff = kinectDevice.VoiceOff.Split('|');
        List<string> voiceCommandList = new List<string>();
        voiceCommandList.AddRange(voiceOn);
        voiceCommandList.AddRange(voiceOff);
        foreach (string ignoreCommand in voiceCommandList)
        {
            // The bare command is loaded so the engine can match and ignore it...
            voiceCommandChoices.Add(new SemanticResultValue(ignoreCommand, ignoreCommand));
            // ...while the "Alice"-prefixed command is the one acted upon.
            string validCommand = string.Format("Alice {0}", ignoreCommand);
            voiceCommandChoices.Add(new SemanticResultValue(validCommand, validCommand));
        }
    }
    return voiceCommandChoices;
}
Each device can have an unlimited number of commands to turn it on or off. For example, any of the following commands can be used to turn the living room light on: light|light on|main light|main light on|living room|living room light|living room on|living room light on.
The voiceCommandList is created with both:
- all of the voice commands to turn on a device
- all of the voice commands to turn off a device
string[] voiceOn = kinectDevice.VoiceOn.Split('|');
string[] voiceOff = kinectDevice.VoiceOff.Split('|');
List<string> voiceCommandList = new List<string>();
voiceCommandList.AddRange(voiceOn);
voiceCommandList.AddRange(voiceOff);
I chose to name the computer presence in the house Alice after the housekeeper in the Brady Bunch TV show. Our home life is similar to the Brady Bunch because I have 2 boys from a previous marriage and my wife has 2 girls. We want the speech recognition system to ignore commands that do not start with “Alice”. For each item in the voiceCommandList, I add one choice that is invalid because it lacks the “Alice” prefix and one that is valid because it includes it. Loading the Choices with the invalid, unprefixed commands is important: it gives the engine something to match ordinary speech against, which keeps it from mapping a stray phrase onto a command you did not intend when you have not first said “Alice”.
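For completeness, here is a minimal sketch of how the choices might be loaded into the recognizer, following the pattern from the Speech Basics – WPF sample (the speechEngine variable and its Kinect-specific setup are assumed and not shown):

// Wrap the choices in a grammar and hand it to the recognition engine.
Choices voiceCommandChoices = BuildAllChoices();
GrammarBuilder grammarBuilder = new GrammarBuilder
{
    Culture = new System.Globalization.CultureInfo("en-US")
};
grammarBuilder.Append(voiceCommandChoices);
speechEngine.LoadGrammar(new Grammar(grammarBuilder));
speechEngine.SpeechRecognized += SpeechRecognized;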
The SpeechRecognized method is the handler for recognized speech events and is shown in the code below. The SpeechRecognizedEventArgs includes a confidence value for how certain the Kinect is that it detected the correct choice. The value is between 0 and 1; the greater the value, the more confident the Kinect is in the match. The code below ignores choices that do not start with “Alice”, and it also ignores results when the Confidence is less than the MinSpeechConfidence of the device. Each device has its own confidence threshold because you want to be very sure before changing the state of a device like the fireplace, but you care much less if a light is erroneously toggled. A lower confidence threshold makes a device easier to control without making the user repeat commands because of background noise or because they did not speak clearly enough the first time.
private void SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    string phrase = e.Result.Semantics.Value.ToString();
    // Only phrases prefixed with "Alice" are treated as commands.
    if (phrase.StartsWith("Alice "))
    {
        phrase = phrase.Replace("Alice ", "");
        // Find a device whose on-commands contain the phrase and switch it on.
        KinectDevice kinectDeviceOn = KinectLiving.GetInstance().
            Devices.FirstOrDefault(d => d.VoiceOn.Split('|').Contains(phrase));
        if (kinectDeviceOn != null &&
            e.Result.Confidence >= kinectDeviceOn.MinSpeechConfidence)
            kinectDeviceOn.TurnOn();
        // Do the same for the off-commands.
        KinectDevice kinectDeviceOff = KinectLiving.GetInstance().
            Devices.FirstOrDefault(d => d.VoiceOff.Split('|').Contains(phrase));
        if (kinectDeviceOff != null &&
            e.Result.Confidence >= kinectDeviceOff.MinSpeechConfidence)
            kinectDeviceOff.TurnOff();
    }
}
The line of code below returns the KinectDevice that contains the phrase in the pipe-delimited list of choices for its VoiceOn property. If none of the devices have a matching phrase, then it returns null.
KinectDevice kinectDeviceOn = KinectLiving.GetInstance().
Devices.FirstOrDefault(d => d.VoiceOn.Split('|').Contains(phrase));
If a match is found, then the TurnOn method is called. This is the same method that the gesture detection routines use to turn on a device. There is matching code for the VoiceOff phrases to turn devices off. The KinectDevice TurnOn and TurnOff methods call web service methods in the LogicalLiving.Web project.
Z-Wave
Z-Wave is a wireless communications protocol designed for home automation, and there are tons of commercially available Z-Wave devices. The LogicalLiving.ZWave project contains a class library to control Z-Wave devices through the Aeon Labs Z-Stick Z-Wave USB adapter. I purchased the USB adapter online for less than fifty dollars, and all of my Z-Wave devices were also each under fifty dollars. I installed the Z-Wave devices by turning off the power in the house and then replacing the standard light switches and power outlets with Z-Wave versions. I wrote the LogicalLiving.ZWave.DesktopMessenger project as a sample Windows Forms UI for exercising the LogicalLiving.ZWave class library. The LogicalLiving.ZWave.DesktopMessenger is also useful for figuring out the values for the Z-Wave DeviceNode. Each Z-Wave device has its own unique DeviceNode, which is required in the Z-Wave message to change its state.
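To tie the configuration together, here is a hypothetical sketch of how a device might be described using the KinectDevice properties referenced throughout this article. The object-initializer form, the enum member, and all of the values are my own illustration, not the project's actual configuration code:

// Hypothetical example: a Z-Wave light switch.
KinectDevice livingRoomLight = new KinectDevice
{
    Name = "Living Room Light",
    DeviceNode = DeviceNode.LivingRoomLight, // hypothetical node for this switch
    NetduinoMessageOn = "",                  // empty: not a Netduino-controlled device
    VoiceOn = "light|light on|living room light on",
    VoiceOff = "light off|living room light off",
    MinSpeechConfidence = 0.4                // lights can tolerate a low threshold
};
// The Point property (the device's 3D coordinates) is learned by
// pointing at the device from two different positions.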
Netduino
Netduino is a wonderful open-source electronics prototyping platform based on the .NET Micro Framework. I use the Netduino Plus 2 and a custom circuit that I built to control many devices in my home, including turning on my fireplace, aiming a squirt gun at the pool, watering the garden, and opening the garage door. I use the KinectLiving gestures and audio commands for turning on the fireplace. We have a new kitten in the house who is very interested in the fireplace, and I quickly became concerned that she would crawl into it at the wrong time while someone was doing the gesture or audio command to turn it on. For safety, I wired up a mesh screen curtain that she cannot get behind! Please read my previous articles on the Netduino and jQuery Mobile:
Kinect for Windows v2
This article was originally written for v1 of the Kinect for Windows. I have upgraded the code for the Kinect for Windows v2 in my IoT for Home Automation article.
Summary
It is really fun to use the Microsoft Kinect for Windows API for home automation projects. This project presents a much more natural UI for controlling the devices in your house, and it is really nice to control them without needing a remote control. In our house, the remote control is always lost somewhere in the sofa, but there are no worries anymore: with the Microsoft Kinect and this project, you are the remote control for the entire house.
The ideas in this article reach far beyond home automation. There are many other useful applications for a computer knowing which object you are pointing to. We are living in an exciting time where vision systems are packaged into inexpensive, readily available devices such as the Microsoft Kinect. The Kinect and the Kinect for Windows SDK enable us to build, with minimal effort, applications that would have seemed like science fiction 10 years ago.