Introduction
This is part two of a series documenting my experiments with the Kinect for Windows SDK. After the first two or three articles, the series should make a good walkthrough for beginners and a handy reference for more advanced developers.
Part one described the initialization of the Kinect SDK, including the parameters for the image capturing engine. Please refer to part one for details about initialization and general background.
Series table of contents:
- Initialization
- ImageStreams
- Coming soon...
Background
The Kinect device has two cameras:
- Video camera - RGB camera for capturing images
- Depth camera - infrared camera used to capture depth data
This article will focus on getting and processing the data acquired by these cameras.
What are ImageStreams?
ImageStream is a class provided by the Kinect SDK for accessing data captured by the Kinect cameras. Each Kinect Runtime has two streams:
- VideoStream - has to be opened with ImageStreamType.Video and ImageType.Color, ImageType.ColorYuv, or ImageType.ColorYuvRaw.
- DepthStream - has to be opened with ImageStreamType.Depth and ImageType.Depth or ImageType.DepthAndPlayerIndex.
As previously described in part one, each stream has to be opened with Open() after Runtime initialization. The third parameter required by Open() is the ImageResolution - 80x60, 320x240, 640x480, or 1280x1024. Please note that DepthStream has a maximum resolution of 640x480 and that different values of ImageType support different resolutions.
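For example, a minimal sketch of opening both streams after initialization could look like the code below; the pool size of 2 and the chosen resolutions are just illustrative values, not a recommendation:

using Microsoft.Research.Kinect.Nui;

// Sketch only: initialize the runtime with the options needed for both streams.
Runtime nui = new Runtime();
nui.Initialize(RuntimeOptions.UseColor | RuntimeOptions.UseDepthAndPlayerIndex);

// Video stream: 640x480 RGB frames, 2 buffers in the pool.
nui.VideoStream.Open(ImageStreamType.Video, 2,
                     ImageResolution.Resolution640x480, ImageType.Color);

// Depth stream: 320x240 frames with player index, 2 buffers in the pool.
nui.DepthStream.Open(ImageStreamType.Depth, 2,
                     ImageResolution.Resolution320x240, ImageType.DepthAndPlayerIndex);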
The usage of ImageStream is quite simple. You can call the GetNextFrame method or attach to the events exposed by the runtime: DepthFrameReady or VideoFrameReady. If you use the events, you will get the frame data through ImageFrameReadyEventArgs.ImageFrame.
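A minimal sketch of the event-based approach (the handler bodies and the use of lambdas are mine, not taken from the SDK samples):

// Sketch only: attach handlers and read the frame from the event arguments.
nui.VideoFrameReady += (object sender, ImageFrameReadyEventArgs e) =>
{
    ImageFrame frame = e.ImageFrame;   // metadata + PlanarImage
    PlanarImage image = frame.Image;   // raw pixel bytes
    // ... process image.Bits here ...
};

nui.DepthFrameReady += (object sender, ImageFrameReadyEventArgs e) =>
{
    ImageFrame frame = e.ImageFrame;
    // ... process the depth frame here ...
};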
Accessing image data
Using any of the methods mentioned above, you will get an ImageFrame, which holds the image data itself in an Image field and some metadata such as:
- Type - contains the type of image (ImageType) - useful in case you use the same handler for both types of ImageStream
- FrameNumber
- Timestamp
- Resolution
As FrameNumber and Timestamp seem to be quite accurate, they are very useful if you need to detect lost frames, measure the time between frames, or keep the video and depth streams in sync - or, on the other hand, if you don't need a new image more often than once a second.
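As an illustration, a simple dropped-frame check based on FrameNumber could look like the sketch below; it assumes FrameNumber increments by one for consecutive frames, which you should verify against your SDK version:

// Sketch only: detect gaps in the frame sequence using FrameNumber.
private int _lastFrameNumber = -1;

private void OnVideoFrameReady(object sender, ImageFrameReadyEventArgs e)
{
    int current = e.ImageFrame.FrameNumber;
    if (_lastFrameNumber >= 0 && current != _lastFrameNumber + 1)
    {
        Console.WriteLine("Dropped {0} frame(s)", current - _lastFrameNumber - 1);
    }
    _lastFrameNumber = current;
}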
PlanarImage
The Kinect SDK provides its own class for keeping captured images. It is as simple as it can be - it holds Width, Height, BytesPerPixel, and the raw data in byte[] Bits.
Video frames hold information in 32-bit XRGB or 16-bit UYVY format.
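Because the video data is just a raw byte array, it can be handed straight to most imaging APIs. For example, in WPF a 32-bit video frame can be wrapped into a BitmapSource roughly like this (a sketch, assuming the XRGB byte order matches PixelFormats.Bgr32):

using System.Windows.Media;
using System.Windows.Media.Imaging;

// Sketch only: wrap the raw XRGB bytes of a video frame in a WPF BitmapSource.
BitmapSource ToBitmapSource(PlanarImage img)
{
    return BitmapSource.Create(
        img.Width, img.Height,
        96, 96,                          // DPI
        PixelFormats.Bgr32,              // 32 bits per pixel, the X byte is ignored
        null,                            // no palette
        img.Bits,
        img.Width * img.BytesPerPixel);  // stride in bytes
}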
Depth frames have two different formats, depending on whether the stream was opened with the Depth or DepthAndPlayerIndex image type:
- 12-bit depth data (stored in two bytes with the upper 4 bits unused)
- 3-bit player index (bits 0-2) and 12-bit depth data (starting at bit 3)
A depth value of 0 means that the object at that position is either too close or too far.
The PlanarImageHelper class included in the sources simplifies access to individual pixels:
public class PlanarImageHelper
{
    private PlanarImage _img;

    public PlanarImage Image { get { return _img; } }

    public PlanarImageHelper(PlanarImage src)
    {
        _img = src;
    }

    // Video frames are 32-bit XRGB, so the byte order of each pixel is B, G, R, X.
    public Byte GetRedAt(int x, int y)
    {
        return _img.Bits[y * _img.Width * _img.BytesPerPixel + x * _img.BytesPerPixel + 2];
    }

    public Byte GetGreenAt(int x, int y)
    {
        return _img.Bits[y * _img.Width * _img.BytesPerPixel + x * _img.BytesPerPixel + 1];
    }

    public Byte GetBlueAt(int x, int y)
    {
        return _img.Bits[y * _img.Width * _img.BytesPerPixel + x * _img.BytesPerPixel + 0];
    }

    // The player index occupies the lowest 3 bits of the first byte of a
    // DepthAndPlayerIndex pixel.
    public int GetPlayerAt(int x, int y)
    {
        return _img.Bits[y * _img.Width * _img.BytesPerPixel + x * _img.BytesPerPixel] & 0x07;
    }

    // Depth is stored in two bytes (low byte first); when the player index is present,
    // the depth value starts at bit 3, so the low byte has to be shifted accordingly.
    public int GetDepthAt(int x, int y, bool hasPlayerData)
    {
        try
        {
            int BaseByte = y * _img.Width * _img.BytesPerPixel + x * _img.BytesPerPixel;
            if (hasPlayerData)
            {
                return (_img.Bits[BaseByte + 1] << 5) | (_img.Bits[BaseByte] >> 3);
            }
            else
            {
                return (_img.Bits[BaseByte + 1] << 8) | (_img.Bits[BaseByte]);
            }
        }
        catch
        {
            // Out-of-range coordinates simply return 0 ("unknown depth").
            return 0;
        }
    }
}
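A typical (illustrative) use of the helper inside a depth frame handler might look like this; the handler name and the choice of the centre pixel are mine:

// Sketch only: read the depth and player index of the centre pixel of each depth frame.
private void OnDepthFrameReady(object sender, ImageFrameReadyEventArgs e)
{
    PlanarImageHelper helper = new PlanarImageHelper(e.ImageFrame.Image);
    int cx = e.ImageFrame.Image.Width / 2;
    int cy = e.ImageFrame.Image.Height / 2;

    int depth = helper.GetDepthAt(cx, cy, true);   // stream opened with DepthAndPlayerIndex
    int player = helper.GetPlayerAt(cx, cy);

    Console.WriteLine("Centre pixel: depth = {0}, player = {1}", depth, player);
}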
ImageStreamTest
In the attached source code, you will find the ImageStreamTest application. It is a simple illustration of ImageStream usage and depth data utilisation.
On the left side of the window, you can choose the effect applied to the image based on depth data:
- None - just the video frames as captured
- Depth - each pixel of the video frame is compared with the depth data for the same point and replaced with white if it does not fall in the range set by the sliders (see the sketch after this list)
- Player - all pixels that do not contain a player index are replaced with white
- Background - a not entirely successful attempt to show the background only: pixels without a player index are copied to a background image, and pixels with a player index are replaced with the remembered background
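A rough sketch of the "Depth" effect described above; the method name, the minDepth/maxDepth parameters (the slider values), and the assumption that the video and depth frames share the same resolution are mine, not taken from the attached sources:

// Sketch only: whiten video pixels whose depth lies outside [minDepth, maxDepth].
// Assumes video and depth frames have the same resolution, which is a simplification.
void ApplyDepthThreshold(PlanarImageHelper video, PlanarImageHelper depth,
                         int minDepth, int maxDepth)
{
    for (int y = 0; y < video.Image.Height; y++)
    {
        for (int x = 0; x < video.Image.Width; x++)
        {
            int d = depth.GetDepthAt(x, y, true);
            if (d < minDepth || d > maxDepth)
            {
                int baseByte = (y * video.Image.Width + x) * video.Image.BytesPerPixel;
                video.Image.Bits[baseByte + 0] = 255;   // B
                video.Image.Bits[baseByte + 1] = 255;   // G
                video.Image.Bits[baseByte + 2] = 255;   // R
            }
        }
    }
}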
How to process the images
It depends on your needs. As you can see, in my example I chose the "iterative" method, because it is very simple to write and very clear to read.
On the other hand, it has very poor performance.
As the depth frame can be treated as a grayscale image, you can achieve the same effects as in my example using filters easily found in all good
image processing libraries - threshold and mask.
First, you have to decide what you really need. If you are building an augmented reality application, you will need high quality video and fast
image blending. If you only analyse part of the image from time to time (face recognition, for example), you still need hi-res images,
but not a high fps, which means you can skip processing every frame in the event handler and get frames on demand.
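A minimal sketch of the on-demand approach, using GetNextFrame instead of the events (the method name and the 100 ms timeout are arbitrary choices of mine):

// Sketch only: poll for a frame only when one is actually needed,
// e.g. from a timer that fires once per second.
void AnalyseCurrentFrame(Runtime nui)
{
    // Wait up to 100 ms for the next available video frame.
    ImageFrame frame = nui.VideoStream.GetNextFrame(100);
    PlanarImage image = frame.Image;
    // ... run face recognition or other analysis on image.Bits ...
}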
As you can see from the previous sections, the Kinect SDK provides images in a very raw format. This means they can easily be converted to anything you need.
Most graphics libraries are able to take this raw array of bytes and create an internal image representation in the most efficient way.
Points of interest
If your needs are mostly image processing with the aid of a depth map, you should stop here and look for an image processing library.
But if you really want to get the most out of the Kinect NUI, go on to the next big thing - the skeleton tracking engine.
History
- 2012-01-06: Initial submission.