Introduction
I love using the Microsoft Kinect for my home automation projects. The Microsoft Kinect for Windows API is really amazing and can be used to track our movement in the physical world in unique and creative ways outside of the traditional game controller. Traditional computer vision systems struggle to track normal human motion in real time, but the Kinect can give you the coordinates of 20 skeletal joints 30 times a second. The Kinect simplifies the computer vision problem by creating what is called a Point Cloud out of infrared light. Infrared light is similar to visible light but has a longer wavelength than we can see. The Point Cloud can be seen with a special camera or night vision goggles, as shown in the image below.
The Kinect has a special lens that sends out a known pattern of infrared light. The light makes dots on the objects it hits, creating the Point Cloud. The Kinect has a dedicated camera for seeing the infrared dots, and its vision system measures the distance between the dots and analyzes the displacement in the pattern to determine how far away an object is. In the image below, you can see that close-up objects have dots spaced closer together, while objects further away have dots spaced further apart. By analyzing the spacing of the infrared dots, the Kinect builds a depth map and can quickly pick out a human outline, because the human stands in front of other objects.
Create a Natural UI with the Kinect
There are some great user interfaces built with the Kinect, but most require you to be looking at a computer screen. I built a system that does not require you to look at a computer in order to select a device and turn it on or off. You simply point to a device with one hand, raise your other hand above your head, and wave it in one direction to turn the device on or in the other direction to turn it off. In addition to gestures, I use the Kinect speech recognition engine to turn devices on or off.
Click here to see it all work!
Vectors
Vectors are a cornerstone of mathematics and physics; they represent a direction and a magnitude (also called a length). They are fundamental to 3D programming and are used extensively in building 3D models for computer games and engineering applications.
Vector3D is a structure in the System.Windows.Media.Media3D namespace. This namespace contains types and structures that support 3D presentation in Windows Presentation Foundation (WPF) applications. The Vector3D structure was built for WPF applications, but it is also very useful for processing other 3D vector data, including vector data from the Kinect. The Microsoft.Kinect namespace has a Vector4 structure. It is similar to the Vector3D structure from WPF, but it includes a property called W in addition to the standard X, Y and Z properties of Vector3D. The additional W property is the fourth dimension in Vector4 and is used for rotations in 3D space around the axis defined by the vector, in a number system called quaternions. I use the Vector3D from the WPF library over the Vector4 from the Kinect library because this project has no need to rotate anything in 3D space, and Vector3D has useful built-in methods for calculating the dot product and cross product, as well as a Length property.
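To illustrate the pieces this article relies on, here is a minimal standalone sketch of the Vector3D operations (the vector values are arbitrary examples; the program needs the WPF assemblies, e.g. PresentationCore, referenced):

using System;
using System.Windows.Media.Media3D;

class Vector3DDemo
{
    static void Main()
    {
        Vector3D a = new Vector3D(1, 0, 0);
        Vector3D b = new Vector3D(0, 1, 0);

        // Dot product: 0, because the vectors are perpendicular.
        double dot = Vector3D.DotProduct(a, b);

        // Cross product: (0, 0, 1), a vector perpendicular to both.
        Vector3D cross = Vector3D.CrossProduct(a, b);

        // Length (magnitude): 5 for a 3-4-5 triangle.
        double length = new Vector3D(3, 4, 0).Length;

        Console.WriteLine("{0} | {1} | {2}", dot, cross, length);
    }
}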
I further extend the Vector3D functionality with my Line class. In mathematics, a line can be represented by a point and a direction vector, with the line passing through the point. Two points define a line, so the constructor takes two SkeletonPoints from the Kinect.
public Line(SkeletonPoint Point1, SkeletonPoint Point2)
{
    // Anchor the line at the first point.
    _point.X = Point1.X;
    _point.Y = Point1.Y;
    _point.Z = Point1.Z;
    // The direction vector runs from the first point to the second.
    _vector.X = Point2.X - Point1.X;
    _vector.Y = Point2.Y - Point1.Y;
    _vector.Z = Point2.Z - Point1.Z;
}
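The rest of the Line class is not shown here, but the Intersection and AngleToPoint methods below use a Point property, a Vector property, and a Position(t) method. Here is a minimal sketch of what those members might look like, reconstructed from how they are called later (an assumption, not the project's actual code):

// Reconstructed sketch: Line members inferred from the later snippets.
private Point3D _point;
private Vector3D _vector;

public Point3D Point { get { return _point; } }
public Vector3D Vector { get { return _vector; } }

// Parametric equation of the line: Position(t) = Point + t * Vector.
public Point3D Position(double t)
{
    return _point + t * _vector;
}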
The method below is in the KinectLiving class. It determines which hand is pointing at the device based on the gesture from the other hand, and it uses the positions of that arm's elbow and hand to return the line along which the user is pointing at the device.
internal static Line GetLinePointingToDevice(Skeleton skeleton, Gesture gesture)
{
    if (IsRightHandThePointer(gesture, skeleton))
        return new Line(skeleton.Joints[JointType.ElbowRight].Position,
            skeleton.Joints[JointType.HandRight].Position);
    else
        return new Line(skeleton.Joints[JointType.ElbowLeft].Position,
            skeleton.Joints[JointType.HandLeft].Position);
}
Finding the Coordinates of an Object
The Kinect API gives you the X, Y and Z coordinates of 20 joints on the human body. I use the coordinates of the elbow and hand to create a line directed towards an object. The program then asks you to point again from a different position, and another line is created directed towards the object. The two lines are skew lines: they are not parallel, and since this is 3-dimensional space, they also do not intersect. The lines pass closest to each other at the coordinates of the object you were pointing to. I use the Vector3D library to do the 3D math, including the dot product and cross product, to get the midpoint of the line segment that is simultaneously perpendicular to both lines. I first learned how to solve 3-dimensional math problems in my third-year calculus class in college, but that was 20 years ago and I had a blast relearning it!
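For reference, the Intersection method below implements the standard closest-point construction for skew lines. Writing the two lines in parametric form as \(r_1(t) = P + t\,d_1\) and \(r_2(s) = Q + s\,d_2\), the parameters of the closest points are:

$$
t_1 = \frac{(Q-P)\cdot\big(d_2\times(d_1\times d_2)\big)}{d_1\cdot\big(d_2\times(d_1\times d_2)\big)},
\qquad
t_2 = \frac{(P-Q)\cdot\big(d_1\times(d_1\times d_2)\big)}{d_2\cdot\big(d_1\times(d_1\times d_2)\big)}
$$

The estimated object position is the midpoint \(\tfrac{1}{2}\big(r_1(t_1)+r_2(t_2)\big)\). If \(d_1\times d_2 = 0\), the lines are parallel and there is no unique closest pair, which the code signals by returning a point at double.MaxValue.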
public Point3D Intersection(Line secondLine)
{
    // A vector perpendicular to both lines, oriented from line 1 towards line 2.
    Vector3D vectorPerpendicularBothLinesFromLine1ToLine2 =
        Vector3D.CrossProduct(secondLine.Vector,
            Vector3D.CrossProduct(this.Vector, secondLine.Vector));

    // A zero vector means the lines are parallel: no unique closest pair.
    if (vectorPerpendicularBothLinesFromLine1ToLine2 == new Vector3D(0, 0, 0))
    {
        return new Point3D(double.MaxValue, double.MaxValue, double.MaxValue);
    }

    // Vector from this line's anchor point to the second line's anchor point.
    Vector3D vectorQP = new Vector3D(secondLine.Point.X - this.Point.X,
        secondLine.Point.Y - this.Point.Y, secondLine.Point.Z - this.Point.Z);

    // Parameter of the point on this line that is closest to the second line.
    double t1 = Vector3D.DotProduct(vectorPerpendicularBothLinesFromLine1ToLine2, vectorQP) /
        Vector3D.DotProduct(vectorPerpendicularBothLinesFromLine1ToLine2, this.Vector);
    Point3D firstPoint = this.Position(t1);

    // Repeat from the other direction for the closest point on the second line.
    Vector3D vectorPerpendicularBothLinesFromLine2ToLine1 =
        Vector3D.CrossProduct(this.Vector,
            Vector3D.CrossProduct(this.Vector, secondLine.Vector));
    Vector3D vectorPQ = new Vector3D(this.Point.X - secondLine.Point.X,
        this.Point.Y - secondLine.Point.Y, this.Point.Z - secondLine.Point.Z);
    double t2 = Vector3D.DotProduct(vectorPerpendicularBothLinesFromLine2ToLine1, vectorPQ) /
        Vector3D.DotProduct(vectorPerpendicularBothLinesFromLine2ToLine1, secondLine.Vector);
    Point3D secondPoint = secondLine.Position(t2);

    // The best estimate of the target is the midpoint of the shortest
    // segment connecting the two lines.
    double midX = (firstPoint.X + secondPoint.X) / 2;
    double midY = (firstPoint.Y + secondPoint.Y) / 2;
    double midZ = (firstPoint.Z + secondPoint.Z) / 2;
    Point3D midPoint = new Point3D(midX, midY, midZ);
    return midPoint;
}
Figuring Out Which Object You Are Pointing At
Once you know the coordinates of the objects in 3D space, angles can be computed to figure out which object you are pointing at. The vertex of each angle is your elbow. The vector from your elbow to your hand is the reference, and it indicates the direction you are pointing. The vector from your elbow to the coordinates of each object is then calculated. Whichever object's vector makes the smallest angle with the reference vector is the object you are pointing at.
The code below shows the algorithm for calculating the angle to an object. The geometric definition of the dot product is used to compute the angle between the two vectors.
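Writing \(u\) for the reference vector from the elbow to the hand and \(v\) for the vector from the elbow to the object, the dot product gives:

$$
\cos\theta = \frac{u\cdot v}{\lVert u\rVert\,\lVert v\rVert}
\qquad\Longrightarrow\qquad
\theta = \arccos\!\left(\frac{u\cdot v}{\lVert u\rVert\,\lVert v\rVert}\right)
$$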
The equation is solved for the angle and translated into the code below. The angle is converted from radians to degrees only because I prefer thinking in degrees over radians. The result is also rounded to remove decimal approximation errors, so that a unit test gets the exact expected value instead of something very close with a round-off error.
public double AngleToPoint(SkeletonPoint point)
{
    // Vector from the line's anchor (the elbow) to the target point.
    Vector3D vectorToPoint = new Vector3D(point.X - this.Point.X,
        point.Y - this.Point.Y, point.Z - this.Point.Z);

    // Geometric definition of the dot product, solved for cos(theta).
    double cosOfAngle = Vector3D.DotProduct(vectorToPoint, this.Vector) /
        (this.Vector.Length * vectorToPoint.Length);

    // Convert radians to degrees and round away floating point noise.
    double angleInDegrees = Math.Round(Math.Acos(cosOfAngle) * 180 / Math.PI, 3);
    return angleInDegrees;
}
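As a sanity check, here is a minimal sketch of the kind of unit test mentioned above, using MSTest and made-up coordinates: a line along the X axis should measure 45 degrees to a point offset equally in X and Y.

using Microsoft.Kinect;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class LineTests
{
    [TestMethod]
    public void AngleToPoint_Returns45Degrees_ForDiagonalPoint()
    {
        // A line along the X axis, anchored at the origin.
        Line line = new Line(
            new SkeletonPoint { X = 0f, Y = 0f, Z = 0f },
            new SkeletonPoint { X = 1f, Y = 0f, Z = 0f });

        // A point offset equally in X and Y sits at exactly 45 degrees.
        double angle = line.AngleToPoint(new SkeletonPoint { X = 1f, Y = 1f, Z = 0f });

        // Math.Round in AngleToPoint makes the comparison exact.
        Assert.AreEqual(45.0, angle);
    }
}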
The device whose angle is the smallest by absolute value is the one you are pointing to.
public KinectDevice PointingToDevice(Skeleton skeleton, Gesture gesture)
{
    Line line = GetLinePointingToDevice(skeleton, gesture);
    KinectDevice pointingToDevice = null;
    double shortestAngle = 180;
    foreach (KinectDevice kinectDevice in _devices)
    {
        double angleToPoint = Math.Abs(line.AngleToPoint(kinectDevice.Point));
        if (angleToPoint <= shortestAngle || pointingToDevice == null)
        {
            shortestAngle = angleToPoint;
            pointingToDevice = kinectDevice;
        }
    }
    return pointingToDevice;
}
The following method turns on or off the device that you are pointing to based on the gesture detected from the other hand.
private void UpdateDeviceBasedOnGesture(Skeleton skeleton, Gesture gesture)
{
    KinectDevice closestDevice =
        KinectLiving.GetInstance().ClosestDevicePointedAt(skeleton, gesture);
    textBlockMessage.Text = closestDevice.Name;
    textBlockLearnDevicePointMessage.Text = "";
    switch (gesture)
    {
        case Gesture.TurnOnWithLeftHandPointingToDevice:
        case Gesture.TurnOnWithRightHandPointingToDevice:
        {
            closestDevice.TurnOn();
            break;
        }
        case Gesture.TurnOffWithLeftHandPointingToDevice:
        case Gesture.TurnOffWithRightHandPointingToDevice:
        {
            closestDevice.TurnOff();
            break;
        }
    }
}
The KinectDevice TurnOn and TurnOff methods call web service methods in the LogicalLiving.Web project. The LogicalLiving.Web project is an MVC4 project with a user interface built upon jQuery Mobile. All devices can be controlled from the web; the site is intended to be run as a phone application but can also be run in any browser. Please read my CodeProject jQuery Mobile article for more information. The KinectDevice code leverages the same MVC controller methods that the jQuery Mobile site uses to turn devices on or off.
internal void TurnOn()
{
    // Devices wired to the Netduino are switched with a Netduino message.
    if (this.NetduinoMessageOn.Length != 0)
    {
        InvokeMvcControllerMethod.SendMessageToNetduino(this.NetduinoMessageOn, this.VoiceOn);
    }
    // Devices with a Z-Wave node are switched with a Z-Wave message.
    if (this.DeviceNode != DeviceNode.None)
    {
        InvokeMvcControllerMethod.SendZWaveMessage(this.DeviceNode, DeviceState.On);
    }
}
Kinect Gestures
I wrote a class called KinectGestures which detects the gestures for the application. The gesture detected is one of the following: TurnOnWithRightHandPointingToDevice, TurnOnWithLeftHandPointingToDevice, TurnOffWithRightHandPointingToDevice, TurnOffWithLeftHandPointingToDevice, or None.
It is important to know which hand is doing the gesture because the other hand is pointing to the device. The code keeps track of the state of the left and right hand as defined by the enumerations below:
namespace LogicalLiving.KinectLiving
{
    public enum Gesture { TurnOnWithRightHandPointingToDevice,
        TurnOnWithLeftHandPointingToDevice, TurnOffWithRightHandPointingToDevice,
        TurnOffWithLeftHandPointingToDevice, None }

    public enum RightHandState { AboveHeadRight, AboveHeadSweepRightToLeft,
        AboveHeadLeft, AboveShoulderSweepLeftToRight, BelowHead, Reset };

    public enum LeftHandState { AboveHeadLeft, AboveHeadSweepLeftToRight,
        AboveHeadRight, AboveHeadSweepRightToLeft, BelowHead, Reset };
}
The gesture to turn on a device is to raise one of your hands above your head and swipe it towards your head. You swipe it in the other direction to turn off the device. The other hand not doing the gesture is pointing to the device that you want to control.
The right hand state is calculated by:
public static RightHandState GetRightHandState(Skeleton skeleton)
{
    RightHandState rightHandState = RightHandState.Reset;
    if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
    {
        rightHandState = RightHandState.Reset;
    }
    else if (skeleton.Joints[JointType.HandRight].Position.Y <
        skeleton.Joints[JointType.Head].Position.Y)
    {
        rightHandState = RightHandState.BelowHead;
    }
    else if (skeleton.Joints[JointType.HandRight].Position.X >=
        skeleton.Joints[JointType.ShoulderRight].Position.X)
    {
        // The hand is above the head and to the right of the shoulder. If it
        // was previously on the left, this is a left-to-right sweep.
        rightHandState = _previousRightHandState == RightHandState.AboveHeadLeft ||
            _previousRightHandState == RightHandState.AboveShoulderSweepLeftToRight ?
            RightHandState.AboveShoulderSweepLeftToRight : RightHandState.AboveHeadRight;
    }
    else if (skeleton.Joints[JointType.HandRight].Position.X <
        skeleton.Joints[JointType.ShoulderRight].Position.X)
    {
        // The hand is above the head and to the left of the shoulder. If it
        // was previously on the right, this is a right-to-left sweep.
        rightHandState = _previousRightHandState == RightHandState.AboveHeadRight ||
            _previousRightHandState == RightHandState.AboveHeadSweepRightToLeft ?
            RightHandState.AboveHeadSweepRightToLeft : RightHandState.AboveHeadLeft;
    }
    _previousRightHandState = rightHandState;
    return rightHandState;
}
There is a similar method to calculate the left hand state. The gesture is then detected by a state machine that looks at both the right hand state and the left hand state. For example, a right-to-left sweep of the right hand means the left hand must be the pointer, so it produces TurnOnWithLeftHandPointingToDevice.
public static Gesture DetectGesture(Skeleton skeleton)
{
    Gesture gesture = Gesture.None;
    RightHandState rightHandState = GetRightHandState(skeleton);
    LeftHandState leftHandState = GetLeftHandState(skeleton);
    if (rightHandState == RightHandState.AboveHeadSweepRightToLeft)
    {
        gesture = Gesture.TurnOnWithLeftHandPointingToDevice;
    }
    else if (leftHandState == LeftHandState.AboveHeadSweepLeftToRight)
    {
        gesture = Gesture.TurnOnWithRightHandPointingToDevice;
    }
    else if (rightHandState == RightHandState.AboveShoulderSweepLeftToRight)
    {
        gesture = Gesture.TurnOffWithLeftHandPointingToDevice;
    }
    else if (leftHandState == LeftHandState.AboveHeadSweepRightToLeft)
    {
        gesture = Gesture.TurnOffWithRightHandPointingToDevice;
    }
    if (gesture != Gesture.None)
    {
        // A completed gesture resets both state machines so the same
        // sweep is not detected twice.
        _previousRightHandState = RightHandState.Reset;
        _previousLeftHandState = LeftHandState.Reset;
    }
    return gesture;
}
Speech Recognition
The speech recognition class that I wrote is based on sample code in the Kinect for Windows Resources & Samples, specifically the Speech Basics – WPF sample. The Initialize method calls the BuildAllChoices() method to return all of the voice command choices.
private static Choices BuildAllChoices()
{
    Choices voiceCommandChoices = new Choices();
    foreach (KinectDevice kinectDevice in KinectLiving.GetInstance().Devices)
    {
        // Each device stores its commands as pipe-delimited lists.
        string[] voiceOn = kinectDevice.VoiceOn.Split('|');
        string[] voiceOff = kinectDevice.VoiceOff.Split('|');
        List<string> voiceCommandList = new List<string>();
        voiceCommandList.AddRange(voiceOn);
        voiceCommandList.AddRange(voiceOff);
        foreach (string ignoreCommand in voiceCommandList)
        {
            // The bare command is loaded so the engine can match and ignore it...
            voiceCommandChoices.Add(new SemanticResultValue(ignoreCommand, ignoreCommand));
            // ...while the "Alice"-prefixed command is the one acted upon.
            string validCommand = string.Format("Alice {0}", ignoreCommand);
            voiceCommandChoices.Add(new SemanticResultValue(validCommand, validCommand));
        }
    }
    return voiceCommandChoices;
}
Each device can have an unlimited number of commands to turn it on or off. For example, any of the following commands can be used to turn the living room light on: light|light on|main light|main light on|living room|living room light|living room on|living room light on.
The voiceCommandList is created with both:
- all of the voice commands to turn on a device
- all of the voice commands to turn off a device
string[] voiceOn = kinectDevice.VoiceOn.Split('|');
string[] voiceOff = kinectDevice.VoiceOff.Split('|');
List<string> voiceCommandList = new List<string>();
voiceCommandList.AddRange(voiceOn);
voiceCommandList.AddRange(voiceOff);
I chose to name the computer presence in the house Alice after the housekeeper in the Brady Bunch TV show. Our home life is similar to the Brady Bunch because I have 2 boys from a previous marriage and my wife has 2 girls. We want the speech recognition system to ignore commands that do not start with “Alice”. For each item in the voiceCommandList, I add one choice that is invalid because it lacks the “Alice” prefix and one that is valid because it includes it. Loading the Choices with the invalid, unprefixed commands is important: it gives the engine something to match ordinary speech against, which keeps it from mapping a stray phrase onto a command you did not intend when you have not first said “Alice”.
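For completeness, here is a minimal sketch of how the choices might be loaded into the recognizer, following the pattern from the Speech Basics – WPF sample (the speechEngine variable and its Kinect-specific setup are assumed and not shown):

// Wrap the choices in a grammar and hand it to the recognition engine.
Choices voiceCommandChoices = BuildAllChoices();
GrammarBuilder grammarBuilder = new GrammarBuilder
{
    Culture = new System.Globalization.CultureInfo("en-US")
};
grammarBuilder.Append(voiceCommandChoices);
speechEngine.LoadGrammar(new Grammar(grammarBuilder));
speechEngine.SpeechRecognized += SpeechRecognized;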
The SpeechRecognized method is the handler for recognized speech events and is shown in the code below. The SpeechRecognizedEventArgs includes a confidence value for how certain the Kinect is that it detected the correct choice. The value is between 0 and 1; the greater the value, the more confident the Kinect is in the match. The code below ignores choices that do not start with “Alice”, and it also ignores results when the Confidence is less than the MinSpeechConfidence of the device. Each device has its own confidence threshold because you want to be very sure before changing the state of a device like the fireplace, but you care much less if a light is erroneously toggled. A lower confidence threshold makes a device easier to control without making the user repeat commands because of background noise or because they did not speak clearly enough the first time.
private void SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    string phrase = e.Result.Semantics.Value.ToString();
    // Only phrases prefixed with "Alice" are treated as commands.
    if (phrase.StartsWith("Alice "))
    {
        phrase = phrase.Replace("Alice ", "");
        // Find a device whose on-commands contain the phrase and switch it on.
        KinectDevice kinectDeviceOn = KinectLiving.GetInstance().
            Devices.FirstOrDefault(d => d.VoiceOn.Split('|').Contains(phrase));
        if (kinectDeviceOn != null &&
            e.Result.Confidence >= kinectDeviceOn.MinSpeechConfidence)
            kinectDeviceOn.TurnOn();
        // Do the same for the off-commands.
        KinectDevice kinectDeviceOff = KinectLiving.GetInstance().
            Devices.FirstOrDefault(d => d.VoiceOff.Split('|').Contains(phrase));
        if (kinectDeviceOff != null &&
            e.Result.Confidence >= kinectDeviceOff.MinSpeechConfidence)
            kinectDeviceOff.TurnOff();
    }
}
The line of code below returns the KinectDevice that contains the phrase in the pipe-delimited list of choices for its VoiceOn property. If none of the devices have a matching phrase, then it returns null.
KinectDevice kinectDeviceOn = KinectLiving.GetInstance().
Devices.FirstOrDefault(d => d.VoiceOn.Split('|').Contains(phrase));
If a match is found, then the TurnOn method is called. This is the same method that the gesture detection routines use to turn on a device. There is matching code for the VoiceOff phrases to turn devices off. The KinectDevice TurnOn and TurnOff methods call web service methods in the LogicalLiving.Web project.
Z-Wave
Z-Wave is a wireless communications protocol designed for home automation, and there are tons of commercially available Z-Wave devices. The LogicalLiving.ZWave project contains a class library to control Z-Wave devices through the Aeon Labs Z-Stick Z-Wave USB adapter. I purchased the USB adapter online for less than fifty dollars, and all of my Z-Wave devices were also each under fifty dollars. I installed the Z-Wave devices by turning off the power in the house and then replacing the standard light switches and power outlets with Z-Wave versions. I wrote the LogicalLiving.ZWave.DesktopMessenger project as a sample Windows Forms UI for exercising the LogicalLiving.ZWave class library. The LogicalLiving.ZWave.DesktopMessenger is also useful for figuring out the values for the Z-Wave DeviceNode. Each Z-Wave device has its own unique DeviceNode, which is required in the Z-Wave message to change its state.
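To tie the configuration together, here is a hypothetical sketch of how a device might be described using the KinectDevice properties referenced throughout this article. The object-initializer form, the enum member, and all of the values are my own illustration, not the project's actual configuration code:

// Hypothetical example: a Z-Wave light switch.
KinectDevice livingRoomLight = new KinectDevice
{
    Name = "Living Room Light",
    DeviceNode = DeviceNode.LivingRoomLight, // hypothetical node for this switch
    NetduinoMessageOn = "",                  // empty: not a Netduino-controlled device
    VoiceOn = "light|light on|living room light on",
    VoiceOff = "light off|living room light off",
    MinSpeechConfidence = 0.4                // lights can tolerate a low threshold
};
// The Point property (the device's 3D coordinates) is learned by
// pointing at the device from two different positions.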
Netduino
Netduino is a wonderful open-source electronics prototyping platform based on the .NET Micro Framework. I use the Netduino Plus 2 and a custom circuit that I built to control many devices in my home, including turning on my fireplace, aiming a squirt gun at the pool, watering the garden, and opening the garage door. I use the KinectLiving gestures and audio commands for turning on the fireplace. We have a new kitten in the house who is very interested in the fireplace, and I quickly became concerned that she would crawl into it at the wrong time while someone was doing the gesture or audio command to turn it on. For safety, I wired up a mesh screen curtain that she cannot get behind! Please read my previous articles on the Netduino and jQuery Mobile:
Kinect for Windows v2
This article was originally written for v1 of the Kinect for Windows. I have upgraded the code for the Kinect for Windows v2 in my IoT for Home Automation article.
Summary
It is really fun to use the Microsoft Kinect for Windows API for home automation projects. This project presents a much more natural UI for controlling the devices in your house, and it is really nice to control them without needing a remote control. In our house, the remote control is always lost somewhere in the sofa, but there are no worries anymore: with the Microsoft Kinect and this project, you are the remote control for the entire house.
The ideas in this article reach far beyond home automation. There are many other useful applications for a computer knowing which object you are pointing to. We are living in an exciting time where vision systems are packaged into inexpensive, readily available devices such as the Microsoft Kinect. The Kinect and the Kinect for Windows SDK enable us to build, with minimal effort, applications that would have seemed like science fiction 10 years ago.