Introduction
This is part two of a series documenting my experiments with the Kinect for Windows SDK. After the first two or three articles, the series should make a good walkthrough for beginners and a handy reference for more advanced developers.
Part one described the initialization of the Kinect SDK, including the parameters for the image capturing engine. Please refer to part one for details about initialization and general background.
Series table of contents:
- Initialization
- ImageStreams
- Coming soon...
Background
The Kinect device has two cameras:
- Video camera - RGB camera for capturing images
- Depth camera - infrared camera used to capture depth data
This article will focus on getting and processing the data acquired by these cameras.
What are ImageStreams?
ImageStream is a class provided by the Kinect SDK for accessing data captured by the Kinect cameras. Each Kinect Runtime has two streams:
- VideoStream - has to be opened with ImageStreamType.Video and ImageType.Color, ImageType.ColorYuv, or ImageType.ColorYuvRaw.
- DepthStream - has to be opened with ImageStreamType.Depth and ImageType.Depth or ImageType.DepthAndPlayerIndex.
As previously described in part one, each stream has to be opened with Open() after Runtime initialization. The third parameter required by Open() is the ImageResolution - 80x60, 320x240, 640x480, or 1280x1024. Please note that DepthStream has a maximum resolution of 640x480 and that different values of ImageType support different resolutions.
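For example, a minimal sketch of opening both streams after initialization could look like the code below; the pool size of 2 and the chosen resolutions are just illustrative values, not a recommendation:

using Microsoft.Research.Kinect.Nui;

// Sketch only: initialize the runtime with the options needed for both streams.
Runtime nui = new Runtime();
nui.Initialize(RuntimeOptions.UseColor | RuntimeOptions.UseDepthAndPlayerIndex);

// Video stream: 640x480 RGB frames, 2 buffers in the pool.
nui.VideoStream.Open(ImageStreamType.Video, 2,
                     ImageResolution.Resolution640x480, ImageType.Color);

// Depth stream: 320x240 frames with player index, 2 buffers in the pool.
nui.DepthStream.Open(ImageStreamType.Depth, 2,
                     ImageResolution.Resolution320x240, ImageType.DepthAndPlayerIndex);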
The usage of ImageStream is quite simple. You can call the GetNextFrame method or attach to the events exposed by the runtime: DepthFrameReady or VideoFrameReady. If you use the events, you will get the frame data through ImageFrameReadyEventArgs.ImageFrame.
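A minimal sketch of the event-based approach (the handler bodies and the use of lambdas are mine, not taken from the SDK samples):

// Sketch only: attach handlers and read the frame from the event arguments.
nui.VideoFrameReady += (object sender, ImageFrameReadyEventArgs e) =>
{
    ImageFrame frame = e.ImageFrame;   // metadata + PlanarImage
    PlanarImage image = frame.Image;   // raw pixel bytes
    // ... process image.Bits here ...
};

nui.DepthFrameReady += (object sender, ImageFrameReadyEventArgs e) =>
{
    ImageFrame frame = e.ImageFrame;
    // ... process the depth frame here ...
};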
Accessing image data
Using any of the methods mentioned above, you will get an ImageFrame, which holds the image data itself in an Image field and some metadata such as:
- Type - contains the type of image (ImageType) - useful in case you use the same handler for both types of ImageStream
- FrameNumber
- Timestamp
- Resolution
As FrameNumber and Timestamp seem to be quite accurate, they are very useful if you need to detect lost frames, measure the time between frames, or keep the video and depth streams in sync - or, on the other hand, if you don't need a new image more often than once a second.
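As an illustration, a simple dropped-frame check based on FrameNumber could look like the sketch below; it assumes FrameNumber increments by one for consecutive frames, which you should verify against your SDK version:

// Sketch only: detect gaps in the frame sequence using FrameNumber.
private int _lastFrameNumber = -1;

private void OnVideoFrameReady(object sender, ImageFrameReadyEventArgs e)
{
    int current = e.ImageFrame.FrameNumber;
    if (_lastFrameNumber >= 0 && current != _lastFrameNumber + 1)
    {
        Console.WriteLine("Dropped {0} frame(s)", current - _lastFrameNumber - 1);
    }
    _lastFrameNumber = current;
}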
PlanarImage
The Kinect SDK provides its own class for keeping captured images. It is as simple as it can be - it holds Width, Height, BytesPerPixel, and the raw data in byte[] Bits.
Video frames hold information in 32-bit XRGB or 16-bit UYVY format.
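Because the video data is just a raw byte array, it can be handed straight to most imaging APIs. For example, in WPF a 32-bit video frame can be wrapped into a BitmapSource roughly like this (a sketch, assuming the XRGB byte order matches PixelFormats.Bgr32):

using System.Windows.Media;
using System.Windows.Media.Imaging;

// Sketch only: wrap the raw XRGB bytes of a video frame in a WPF BitmapSource.
BitmapSource ToBitmapSource(PlanarImage img)
{
    return BitmapSource.Create(
        img.Width, img.Height,
        96, 96,                          // DPI
        PixelFormats.Bgr32,              // 32 bits per pixel, the X byte is ignored
        null,                            // no palette
        img.Bits,
        img.Width * img.BytesPerPixel);  // stride in bytes
}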
Depth frames have two different formats, depending on whether the stream was opened with the Depth or DepthAndPlayerIndex image type:
- 12-bit depth data (stored in two bytes with the upper 4 bits unused)
- 3-bit player index (bits 0-2) and 12-bit depth data (starting at bit 3)
A depth value of 0 means that the object at that position is either too close or too far.
The PlanarImageHelper class included in the sources simplifies access to individual pixels:
public class PlanarImageHelper
{
    private PlanarImage _img;

    public PlanarImage Image { get { return _img; } }

    public PlanarImageHelper(PlanarImage src)
    {
        _img = src;
    }

    // Video frames are 32-bit XRGB, so the byte order of each pixel is B, G, R, X.
    public Byte GetRedAt(int x, int y)
    {
        return _img.Bits[y * _img.Width * _img.BytesPerPixel + x * _img.BytesPerPixel + 2];
    }

    public Byte GetGreenAt(int x, int y)
    {
        return _img.Bits[y * _img.Width * _img.BytesPerPixel + x * _img.BytesPerPixel + 1];
    }

    public Byte GetBlueAt(int x, int y)
    {
        return _img.Bits[y * _img.Width * _img.BytesPerPixel + x * _img.BytesPerPixel + 0];
    }

    // The player index occupies the lowest 3 bits of the first byte of a
    // DepthAndPlayerIndex pixel.
    public int GetPlayerAt(int x, int y)
    {
        return _img.Bits[y * _img.Width * _img.BytesPerPixel + x * _img.BytesPerPixel] & 0x07;
    }

    // Depth is stored in two bytes (low byte first); when the player index is present,
    // the depth value starts at bit 3, so the low byte has to be shifted accordingly.
    public int GetDepthAt(int x, int y, bool hasPlayerData)
    {
        try
        {
            int BaseByte = y * _img.Width * _img.BytesPerPixel + x * _img.BytesPerPixel;
            if (hasPlayerData)
            {
                return (_img.Bits[BaseByte + 1] << 5) | (_img.Bits[BaseByte] >> 3);
            }
            else
            {
                return (_img.Bits[BaseByte + 1] << 8) | (_img.Bits[BaseByte]);
            }
        }
        catch
        {
            // Out-of-range coordinates simply return 0 ("unknown depth").
            return 0;
        }
    }
}
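A typical (illustrative) use of the helper inside a depth frame handler might look like this; the handler name and the choice of the centre pixel are mine:

// Sketch only: read the depth and player index of the centre pixel of each depth frame.
private void OnDepthFrameReady(object sender, ImageFrameReadyEventArgs e)
{
    PlanarImageHelper helper = new PlanarImageHelper(e.ImageFrame.Image);
    int cx = e.ImageFrame.Image.Width / 2;
    int cy = e.ImageFrame.Image.Height / 2;

    int depth = helper.GetDepthAt(cx, cy, true);   // stream opened with DepthAndPlayerIndex
    int player = helper.GetPlayerAt(cx, cy);

    Console.WriteLine("Centre pixel: depth = {0}, player = {1}", depth, player);
}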
ImageStreamTest
In the attached source code, you will find the ImageStreamTest application. It is a simple illustration of ImageStream usage and depth data utilisation.
On the left side of the window, you can choose the effect applied to the image based on depth data:
- None - just the video frames as captured
- Depth - each pixel of the video frame is compared with the depth data for the same point and replaced with white if it does not fall in the range set by the sliders (see the sketch after this list)
- Player - all pixels that do not contain a player index are replaced with white
- Background - a not entirely successful attempt to show the background only: pixels without a player index are copied to a background image, and pixels with a player index are replaced with the remembered background
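A rough sketch of the "Depth" effect described above; the method name, the minDepth/maxDepth parameters (the slider values), and the assumption that the video and depth frames share the same resolution are mine, not taken from the attached sources:

// Sketch only: whiten video pixels whose depth lies outside [minDepth, maxDepth].
// Assumes video and depth frames have the same resolution, which is a simplification.
void ApplyDepthThreshold(PlanarImageHelper video, PlanarImageHelper depth,
                         int minDepth, int maxDepth)
{
    for (int y = 0; y < video.Image.Height; y++)
    {
        for (int x = 0; x < video.Image.Width; x++)
        {
            int d = depth.GetDepthAt(x, y, true);
            if (d < minDepth || d > maxDepth)
            {
                int baseByte = (y * video.Image.Width + x) * video.Image.BytesPerPixel;
                video.Image.Bits[baseByte + 0] = 255;   // B
                video.Image.Bits[baseByte + 1] = 255;   // G
                video.Image.Bits[baseByte + 2] = 255;   // R
            }
        }
    }
}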
How to process the images
It depends on your needs. As you can see, in my example I chose the "iterative" method, because it is very simple to write and very clear to read.
On the other hand, it has very poor performance.
As the depth frame can be treated as a grayscale image, you can achieve the same effects as in my example using filters easily found in all good
image processing libraries - threshold and mask.
First, you have to decide what you really need. If you are building an augmented reality application, you will need high quality video and fast
image blending. If you only analyse part of the image from time to time (face recognition, for example), you still need hi-res images,
but not a high fps, which means you can skip processing every frame in the event handler and get frames on demand.
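A minimal sketch of the on-demand approach, using GetNextFrame instead of the events (the method name and the 100 ms timeout are arbitrary choices of mine):

// Sketch only: poll for a frame only when one is actually needed,
// e.g. from a timer that fires once per second.
void AnalyseCurrentFrame(Runtime nui)
{
    // Wait up to 100 ms for the next available video frame.
    ImageFrame frame = nui.VideoStream.GetNextFrame(100);
    PlanarImage image = frame.Image;
    // ... run face recognition or other analysis on image.Bits ...
}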
As you can see from the previous sections, the Kinect SDK provides images in a very raw format. This means they can easily be converted to anything you need.
Most graphics libraries are able to take this raw array of bytes and create an internal image representation in the most efficient way.
Points of interest
If your needs are mostly image processing with the aid of a depth map, you should stop here and look for an image processing library.
But if you really want to get the most out of the Kinect NUI, go on to the next big thing - the skeleton tracking engine.
History
- 2012-01-06: Initial submission.