Introduction
Many specialists predict that in the near future, a new revolution in information technologies will occur. This revolution
will be connected with new computer abilities to segment, track, and understand human poses, gestures, and emotional expressions.
For this, computers must begin to use new types of video sensors that provide 3D video. The Kinect sensor is the first of these new sensor types.
The Kinect sensor has two cameras: a traditional color video camera and an infrared depth sensor that measures depth, position, and motion.
The Kinect sensor started as a sensor for the Xbox 360 game system about a year ago, but almost immediately many software developers began
trying to use it for recognition of human poses and gestures. More information can be found at www.kinecthacks.com.
My article is devoted to research on sitting posture recognition. Sitting posture recognition is based on human skeleton tracking. There are three
software packages that can produce human skeleton tracking with the Kinect sensor: the OpenNI/PrimeSense NITE library, the Microsoft Kinect Research SDK, and
the libfreenect library. I have used the first two. On their basis, I developed C# WPF applications in which I combine the color video stream with the skeleton image.
These applications run under Microsoft Windows 7 and .NET Framework 4.0. To compile them, you need Microsoft Visual Studio 2010. You can find instructions
for installing the OpenNI/PrimeSense NITE library and the Microsoft Kinect Research SDK at www.kinecthacks.com.
Background
The sitting posture recognition algorithm is based on human skeleton tracking and on obtaining the three coordinates (xs, ys, zs),
(xh, yh, zh), and (xk, yk, zk) of the positions of the human Shoulder (denoted as S),
Hip (denoted as H), and Knee (denoted as K).
A sitting posture is related to the angle a between the line HK (from hip to knee) and the line HS (from hip to shoulder).
We distinguish the left body part angle a, the angle between the "center hip to left knee" vector and the "center hip to center shoulder"
vector, and the right body part angle a, the angle between the "center hip to right knee" vector and the "center hip to center shoulder" vector.
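The angle itself can be computed from the joint coordinates with a standard dot-product formula. The sketch below is my own illustration of that calculation; the class and method names are not from the article's actual code:

using System;

static class PostureMath
{
    // Angle in degrees between the hip->shoulder vector HS and the hip->knee vector HK.
    public static double BodyAngle(double xs, double ys, double zs,   // shoulder S
                                   double xh, double yh, double zh,   // hip H
                                   double xk, double yk, double zk)   // knee K
    {
        // HS = S - H, HK = K - H
        double hsX = xs - xh, hsY = ys - yh, hsZ = zs - zh;
        double hkX = xk - xh, hkY = yk - yh, hkZ = zk - zh;

        double dot = hsX * hkX + hsY * hkY + hsZ * hkZ;
        double lenHs = Math.Sqrt(hsX * hsX + hsY * hsY + hsZ * hsZ);
        double lenHk = Math.Sqrt(hkX * hkX + hkY * hkY + hkZ * hkZ);

        return Math.Acos(dot / (lenHs * lenHk)) * 180.0 / Math.PI;
    }
}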
From angle a and the hands' position, the human sitting posture can be determined and classified as one of four specified
types - sleeping, concentrating, raising hand, and non-concentrating - as given in the table below.
Angle, a  | Hand posture | Sitting posture   |
0 ~ 40    | down         | sleeping          |
40 ~ 80   | down         | non-concentrating |
80 ~ 100  | down         | concentrating     |
80 ~ 100  | up           | raising hand      |
100 ~ 180 | down         | non-concentrating |
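In code, this table reduces to a small classification routine. The following is a minimal sketch based only on the angle ranges above; the enum and method names are mine, not the article's:

enum SittingPosture { Sleeping, NonConcentrating, Concentrating, RaisingHand }

static class PostureClassifier
{
    // angle is in degrees; handIsUp reflects the tracked hand position.
    // The table only distinguishes the hand position in the 80~100 range.
    public static SittingPosture Classify(double angle, bool handIsUp)
    {
        if (angle >= 80 && angle <= 100)
            return handIsUp ? SittingPosture.RaisingHand : SittingPosture.Concentrating;
        if (angle < 40)
            return SittingPosture.Sleeping;
        return SittingPosture.NonConcentrating;   // 40~80 or 100~180
    }
}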
Using the Code
I had two problems combining a color video stream and a skeleton image.
The first problem was how to place them together in one control in a window. For this, I used a simple WPF form for both applications
that contains a StatusBar control and a Grid panel. The Grid panel contains an Image and a Canvas control of the same size.
<Window x:Class="RecognitionPose.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="User tracking with Microsoft SDK" Height="600"
        Width="862" Loaded="Window_Loaded"
        DataContext="{Binding}">
    <DockPanel LastChildFill="True">
        <StatusBar Name="statusBar"
                   MinHeight="40" DockPanel.Dock="Bottom">
            <StatusBarItem>
                <TextBlock Name="textBlock"
                           Background="LemonChiffon"
                           FontSize="10"> Ready </TextBlock>
            </StatusBarItem>
        </StatusBar>
        <Grid DockPanel.Dock="Top">
            <Image Name="imgCamera" Width="820"
                   ClipToBounds="True" Margin="10,0" />
            <Canvas Width="820" Height="510"
                    Name="skeleton" ClipToBounds="True"/>
        </Grid>
    </DockPanel>
</Window>
The second problem when working with both OpenNI/PrimeSense NITE and the Microsoft SDK is that the events that refresh video frames and skeleton frames occur asynchronously.
To handle this in the Microsoft SDK case, I call the main method RecognizePose of my Recognition class in the SkeletonFrameReady event handler, after the imgCamera
and skeleton controls are refreshed. The SkeletonFrameReady event handler is synchronized with the VideoFrameReady event handler simply by copying the current video frame into the planar image temp variable:
planarImage = ImageFrame.Image;
and then copying this temp variable to imgCamera.Source in the SkeletonFrameReady event handler:
imgCamera.Source = BitmapSource.Create(planarImage.Width, planarImage.Height,
194,194,PixelFormats.Bgr32, null, planarImage.Bits,
planarImage.Width * planarImage.BytesPerPixel);
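Putting the two handlers together, the approach looks roughly like this. This is a sketch assuming the beta-era Microsoft.Research.Kinect.Nui event handler signatures; the RecognizePose call and the skeleton drawing are only indicated, since their exact signatures are not shown in the article:

private PlanarImage planarImage;   // temp variable shared by both handlers

void nui_VideoFrameReady(object sender, ImageFrameReadyEventArgs e)
{
    // Only remember the latest color frame; it is rendered later,
    // together with the skeleton, in the SkeletonFrameReady handler.
    planarImage = e.ImageFrame.Image;
}

void nui_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    // Render the stored color frame into the Image control...
    imgCamera.Source = BitmapSource.Create(planarImage.Width, planarImage.Height,
        194, 194, PixelFormats.Bgr32, null, planarImage.Bits,
        planarImage.Width * planarImage.BytesPerPixel);

    // ...draw the skeleton on the Canvas, then classify the sitting posture
    // (hypothetical call; the actual RecognizePose signature may differ).
    recognition.RecognizePose(e.SkeletonFrame);
}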
For the OpenNI/PrimeSense NITE case, I use the NuiVision library
(http://www.codeproject.com/Articles/169161/Kinect-and-WPF-Complete-body-tracking)
written by Vangos Pterneas to synchronize the video frame and skeleton recognition events. I call the RecognizePose
method in the UsersUpdated event handler of this library.
For sitting posture recognition, the main problem was to find the distance and angle of the human relative to the Kinect sensor at which recognition is stable.
For this purpose, I added five parameters to the application settings that control the algorithm behavior (a sketch of how they are applied is given after the list):
- isDebug - if true, show information about the current human location on the status bar;
- confidenceAngle - controls the difference between the left body part and right body part angles a; if the difference is greater than this value, we assume that the recognition isn't stable;
- standPoseFactor - controls the difference between the sitting and standing poses; if the current human height multiplied by this factor is greater than the initial human height in the standing pose, we assume that the current pose is also a standing pose;
- isAutomaticChoiceAngle - chooses between automatically taking angle a from the body side nearest to the camera (true) and calculating angle a as the average (false) of the left and right body part angles a;
- shiftAngle - a shift angle subtracted from angle a to compensate for skeleton recognition error.
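To make the role of these parameters concrete, here is a rough sketch of how they might be applied when choosing the final angle a. The variable names and the nearest-side test are my own assumptions, not the actual Recognition class:

// leftAngle / rightAngle are the left and right body part angles, in degrees;
// leftKneeZ / rightKneeZ are the knee depths, used to decide which side is nearest to the camera.
double ChooseAngle(double leftAngle, double rightAngle, double leftKneeZ, double rightKneeZ)
{
    // If the two sides disagree too much, the skeleton data is considered unreliable.
    if (Math.Abs(leftAngle - rightAngle) > confidenceAngle)
        return double.NaN;   // recognition isn't stable for this frame

    double a = isAutomaticChoiceAngle
        ? (leftKneeZ < rightKneeZ ? leftAngle : rightAngle)   // side nearest to the camera
        : (leftAngle + rightAngle) / 2.0;                     // average of both sides

    // Compensate for systematic skeleton recognition error.
    return a - shiftAngle;
}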
I found that the most stable sitting recognition occurs when these parameters have the following values:
- confidenceAngle = 50 degrees;
- standPoseFactor = 1.1;
- isAutomaticChoiceAngle = true;
- shiftAngle = 20.
The Kinect sensor is located on the floor, the distance between the Kinect sensor and the sitting human is about 2 meters, and
the human body is turned at a 45-degree angle relative to the sensor.
The advantage of this sitting human location is that the Kinect sensor can constantly track the parts of the human body that are necessary for recognition:
- two knees;
- one hip;
- two shoulders;
- two hands;
- head
For other human locations, this isn't so. For example, in a frontal location, the sensor does not reliably track the hip; in a profile location, the sensor tracks
only one side of the body, right or left.
Points of Interest
I made two movies showing these two applications in use.
From the movies, we can conclude that recognition works well with both software packages. However, the applications could be improved
significantly by extending the zone of sitting human locations where the recognition is stable. For this, we would have to use not one but two or more Kinect sensors.
I think that these applications may be used in any area where it is necessary to monitor human behavior in a sitting pose. For cases when the human state becomes non-concentrating
or sleeping, the applications may be enhanced with feedback that sends an alarm, alert, or emergency signal. On the other hand, these applications may be used
in universities to collect statistics about student activity during seminars and labs. The applications could calculate the average time a student spends concentrating
or not concentrating during a seminar and the number of times they raise their hand, and the professor could take these statistics into account in individual work with the student.