Introduction
After I started using the Windows 8 desktop (Win8-Desktop), I found that some older technologies no longer work well, DirectShow in particular. For instance, capturing live video from a web camera with DirectShow works perfectly on WinXP, Vista, and Win7 and lets me request a specific resolution. From a Microsoft LifeCam Studio web camera, for example, I can get 1080p video. On Win8-Desktop, however, I can get only 640x480 video. The reason is that the function call that sets the resolution, which returns the HRESULT S_OK on Win7, returns a failure code (FAILED) on Win8-Desktop. After reading the documentation on MSDN, I got the impression that Microsoft intends to stop supporting DirectShow and to promote another technology, Media Foundation. I found some information about capturing video from a web camera with Media Foundation using the needed parameters, but it is scattered across MSDN. I thought it would be useful to have a single C++ class that performs all the initialization steps but hides them behind a simple interface. I wrote one and present it in this tip.
Background
I lead an augmented reality project and need simple support for capturing video from a web camera. I used the simple library videoInput from the website http://muonics.net/school/spring05/videoInput/, which uses DirectShow for this purpose. However, it did not work well on Win8-Desktop: there was a problem with setting the capture resolution. I solved this problem with Media Foundation, but because my project already used videoInput, I decided to create a new library with the same interface as videoInput that uses Media Foundation instead. That met my needs, and I think the new library will be useful to others who face the same problem while developing image recognition programs.
Using the Code
The library videoInput was written in Visual Studio 2012 (source: videoInputVS2012.zip; static library: videoInput-staticlib-VS2012x86.zip) and includes nine classes:
videoInput - the interface class. To use the library, it is enough to add videoInput.h and videoInput.lib to your project. This class is a singleton, which makes resource management easy.
Media_Foundation - a singleton class that manages the allocation and release of Media Foundation resources.
videoDevices - a singleton class that manages the allocation and release of video devices and gives access to each individual video device.
videoDevice - the class for working with one video device: capturing, getting raw data, checking for a new frame, getting the supported resolutions, setting the needed resolution, and closing the device.
ImageGrabberThread - the class that manages the thread in which images are grabbed.
ImageGrabber - the class that initializes grabbing of images from the video device, controls the grabbing process, and finishes it.
RawImage - a temporary buffer class for writing and reading one frame.
FormatReading - the class that reads information about a supported resolution into the user-facing MediaType structure.
DebugPrintOut - the class for printing text to the console.
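The singleton arrangement mentioned in the list above can be sketched roughly as follows. This is only an illustration, assuming a C++11 compiler; the class CaptureManager and its members are hypothetical and are not taken from the library's source.

```cpp
#include <cassert>

// Hypothetical stand-in for a singleton such as videoInput; not the library's code.
class CaptureManager
{
public:
    // Returns the single shared instance. The function-local static is
    // constructed on first use; C++11 makes that initialization thread-safe.
    static CaptureManager& getInstance()
    {
        static CaptureManager instance;
        return instance;
    }

    // Copying is disabled so that a second instance cannot be created.
    CaptureManager(const CaptureManager&) = delete;
    CaptureManager& operator=(const CaptureManager&) = delete;

    // Bookkeeping that every caller sees through the same instance.
    void openDevice() { ++openDevices_; }

    void closeDevice()
    {
        assert(openDevices_ > 0);  // closing without a prior open is a caller bug
        --openDevices_;
    }

    unsigned int openDeviceCount() const { return openDevices_; }

private:
    // Private constructor: only getInstance() can create the object.
    CaptureManager() : openDevices_(0) {}

    unsigned int openDevices_;
};
```

Because every caller receives a reference to the same object, state such as the number of open devices lives in exactly one place, which is presumably what makes resource management simpler here.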
The file videoInput.h is all you need as the interface of the library. Its listing is presented below:
#pragma once
#include <guiddef.h>

struct IMFMediaSource;

// Description of one media type (format) supported by a video device.
struct MediaType
{
    unsigned int MF_MT_FRAME_SIZE;
    unsigned int height;
    unsigned int width;
    unsigned int MF_MT_YUV_MATRIX;
    unsigned int MF_MT_VIDEO_LIGHTING;
    unsigned int MF_MT_DEFAULT_STRIDE;
    unsigned int MF_MT_VIDEO_CHROMA_SITING;
    GUID MF_MT_AM_FORMAT_TYPE;
    wchar_t *pMF_MT_AM_FORMAT_TYPEName;
    unsigned int MF_MT_FIXED_SIZE_SAMPLES;
    unsigned int MF_MT_VIDEO_NOMINAL_RANGE;
    unsigned int MF_MT_FRAME_RATE;
    unsigned int MF_MT_FRAME_RATE_low;
    unsigned int MF_MT_PIXEL_ASPECT_RATIO;
    unsigned int MF_MT_PIXEL_ASPECT_RATIO_low;
    unsigned int MF_MT_ALL_SAMPLES_INDEPENDENT;
    unsigned int MF_MT_FRAME_RATE_RANGE_MIN;
    unsigned int MF_MT_FRAME_RATE_RANGE_MIN_low;
    unsigned int MF_MT_SAMPLE_SIZE;
    unsigned int MF_MT_VIDEO_PRIMARIES;
    unsigned int MF_MT_INTERLACE_MODE;
    unsigned int MF_MT_FRAME_RATE_RANGE_MAX;
    unsigned int MF_MT_FRAME_RATE_RANGE_MAX_low;
    GUID MF_MT_MAJOR_TYPE;
    wchar_t *pMF_MT_MAJOR_TYPEName;
    GUID MF_MT_SUBTYPE;
    wchar_t *pMF_MT_SUBTYPEName;

    MediaType();
    ~MediaType();
    void Clear();
};

// One camera control setting.
struct Parametr
{
    long CurrentValue;
    long Min;
    long Max;
    long Step;
    long Default;
    long Flag;

    Parametr();
};

// The full set of camera control settings.
struct CamParametrs
{
    Parametr Brightness;
    Parametr Contrast;
    Parametr Hue;
    Parametr Saturation;
    Parametr Sharpness;
    Parametr Gamma;
    Parametr ColorEnable;
    Parametr WhiteBalance;
    Parametr BacklightCompensation;
    Parametr Gain;
    Parametr Pan;
    Parametr Tilt;
    Parametr Roll;
    Parametr Zoom;
    Parametr Exposure;
    Parametr Iris;
    Parametr Focus;
};

class videoInput
{
public:
    virtual ~videoInput(void);

    // Access to the single instance of the class.
    static videoInput& getInstance();

    void closeDevice(unsigned int deviceID);
    void setEmergencyStopEvent(unsigned int deviceID, void *userData, void(*func)(int, void *));
    void closeAllDevices();

    CamParametrs getParametrs(unsigned int deviceID);
    void setParametrs(unsigned int deviceID, CamParametrs parametrs);

    unsigned int listDevices(bool silent = false);
    unsigned int getCountFormats(unsigned int deviceID);
    unsigned int getWidth(unsigned int deviceID);
    unsigned int getHeight(unsigned int deviceID);
    wchar_t *getNameVideoDevice(unsigned int deviceID);
    IMFMediaSource *getMediaSource(unsigned int deviceID);
    MediaType getFormat(unsigned int deviceID, unsigned int id);

    bool isDevicesAcceable();
    bool isDeviceSetup(unsigned int deviceID);
    bool isDeviceMediaSource(unsigned int deviceID);
    bool isDeviceRawDataSource(unsigned int deviceID);

    void setVerbose(bool state);

    bool setupDevice(unsigned int deviceID, unsigned int id = 0);
    bool setupDevice(unsigned int deviceID, unsigned int w,
                     unsigned int h, unsigned int idealFramerate = 30);

    bool isFrameNew(unsigned int deviceID);
    bool getPixels(unsigned int deviceID, unsigned char * pixels,
                   bool flipRedAndBlue = false, bool flipImage = false);

private:
    bool accessToDevices;

    videoInput(void);

    void processPixels(unsigned char * src, unsigned char * dst, unsigned int width,
                       unsigned int height, unsigned int bpp, bool bRGB, bool bFlip);
    void updateListOfDevices();
};
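Several fields of MediaType come in pairs, such as MF_MT_FRAME_RATE and MF_MT_FRAME_RATE_low. In Media Foundation, attributes like MF_MT_FRAME_SIZE and MF_MT_FRAME_RATE are 64-bit values that pack two 32-bit numbers: width and height, or the numerator and denominator of a ratio. The sketch below shows that packing in portable C++. It assumes that the paired fields hold the upper and lower halves; the names PackedPair, unpackPair, packPair, and framesPerSecond are hypothetical and do not appear in the library.

```cpp
#include <cassert>
#include <cstdint>

// Two 32-bit values taken from one 64-bit attribute.
struct PackedPair
{
    std::uint32_t high;  // upper half: width, or frame-rate numerator
    std::uint32_t low;   // lower half: height, or frame-rate denominator
};

// Splits a 64-bit value into its upper and lower 32-bit halves.
inline PackedPair unpackPair(std::uint64_t packed)
{
    PackedPair p;
    p.high = static_cast<std::uint32_t>(packed >> 32);
    p.low  = static_cast<std::uint32_t>(packed & 0xFFFFFFFFu);
    return p;
}

// Inverse of unpackPair: combines two 32-bit halves into one 64-bit value.
inline std::uint64_t packPair(std::uint32_t high, std::uint32_t low)
{
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// Converts a frame-rate ratio to frames per second.
// Returns 0.0 when the denominator is zero.
inline double framesPerSecond(const PackedPair& rate)
{
    return rate.low == 0 ? 0.0 : static_cast<double>(rate.high) / rate.low;
}
```

For example, a frame rate stored as 30000/1001 corresponds to roughly 29.97 frames per second, which is why a ratio is kept instead of a single integer.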
This class can be used in one of two modes: RawData grabbing and MediaSource. If you use only the first mode, you do not need to include the Media Foundation headers or link its libraries. For that reason, the interface IMFMediaSource is only forward-declared in videoInput.h, and in this mode the method IMFMediaSource *getMediaSource(unsigned int deviceID) returns NULL. In the second mode, you can call this method and use the result in your application as a normal source of media data from the web camera.
The next listing shows how to use videoInput to get the raw data of a frame. The example uses the OpenCV framework to display live video (code: TestVideoInputVS2012x86.zip, TestVideoInputVS2012x86-noexe.zip). OpenCV has its own function for capturing from a web camera, but it is based on DirectShow and therefore has the problem described above on Win8-Desktop. The example is presented in the next listing:
#include "stdafx.h"
#include "videoInput.h"
#include "highgui.h"

#pragma comment(lib, "lib\\opencv\\Release\\opencv_highgui242.lib")
#pragma comment(lib, "lib\\opencv\\Release\\opencv_core242.lib")
#pragma comment(lib, "videoInput.lib")

// Callback for an unexpected stop of the device.
void StopEvent(int deviceID, void *userData)
{
    videoInput *VI = &videoInput::getInstance();

    VI->closeDevice(deviceID);
}

int _tmain(int argc, _TCHAR* argv[])
{
    videoInput *VI = &videoInput::getInstance();

    // The device with the index i - 1 is used below.
    int i = VI->listDevices();

    // First pass: capture at 640x480.
    if(i > 0)
    {
        if(VI->setupDevice(i-1, 640, 480, 60))
        {
            VI->setEmergencyStopEvent(i - 1, NULL, StopEvent);

            // The first call of isFrameNew starts the grabbing.
            if(VI->isFrameNew(i-1))
            {
                int countLeftFrames = 0;

                cvNamedWindow("VideoTest", CV_WINDOW_AUTOSIZE);
                CvSize size = cvSize(VI->getWidth(i-1), VI->getHeight(i-1));
                IplImage* frame;
                frame = cvCreateImage(size, 8,3);

                while(1)
                {
                    if(VI->isFrameNew(i-1))
                    {
                        VI->getPixels(i - 1, (unsigned char *)frame->imageData);
                        cvShowImage("VideoTest", frame);
                        countLeftFrames = 0;
                    }
                    else
                        countLeftFrames++;

                    char c = cvWaitKey(33);

                    // Esc: leave the loop.
                    if(c == 27)
                        break;

                    // Key '1': change the brightness of the camera.
                    if(c == 49)
                    {
                        CamParametrs CP = VI->getParametrs(i-1);
                        CP.Brightness.CurrentValue = 128;
                        CP.Brightness.Flag = 1;
                        VI->setParametrs(i - 1, CP);
                    }

                    if(!VI->isDeviceSetup(i - 1))
                    {
                        break;
                    }

                    // Give up after more than 60 polls in a row without a new frame.
                    if(countLeftFrames > 60)
                        break;
                }

                VI->closeDevice(i - 1);
                cvReleaseImage(&frame);
                cvDestroyWindow("VideoTest");
            }
        }
    }

    // Second pass: reuse the same device at 1920x1080.
    // The check i > 0 keeps the index i - 1 valid when no device was found.
    if(i > 0 && VI->setupDevice(i-1, 1920, 1080, 60))
    {
        if(VI->isFrameNew(i-1))
        {
            int countLeftFrames = 0;

            cvNamedWindow("VideoTest1", CV_WINDOW_AUTOSIZE);
            CvSize size = cvSize(VI->getWidth(i-1), VI->getHeight(i-1));
            IplImage* frame;
            frame = cvCreateImage(size, 8,3);

            while(1)
            {
                if(VI->isFrameNew(i-1))
                {
                    VI->getPixels(i - 1, (unsigned char *)frame->imageData,false);
                    cvShowImage("VideoTest1", frame);
                    countLeftFrames = 0;
                }
                else
                    countLeftFrames++;

                char c = cvWaitKey(33);

                if(c == 27)
                    break;

                if(!VI->isDeviceSetup(i - 1))
                {
                    break;
                }

                if(countLeftFrames > 60)
                    break;
            }

            VI->closeDevice(i - 1);
            cvReleaseImage(&frame);
            cvDestroyWindow("VideoTest1");
        }
    }

    return 0;
}
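The getPixels call in the listing accepts the optional flags flipRedAndBlue and flipImage. What such flags typically do to a packed 24-bit frame can be sketched as below. This is not the library's processPixels; the function copyPixels is hypothetical, and it assumes 3 bytes per pixel, rows stored without padding, and source and destination buffers that do not overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies a packed 24-bit image from src to dst. Optionally swaps the first and
// third channel of every pixel (RGB <-> BGR) and/or flips the image vertically.
void copyPixels(const unsigned char* src, unsigned char* dst,
                unsigned int width, unsigned int height,
                bool swapRedAndBlue, bool flipVertically)
{
    assert(src != nullptr && dst != nullptr);

    const std::size_t bytesPerPixel = 3;
    const std::size_t rowBytes = width * bytesPerPixel;

    for (unsigned int y = 0; y < height; ++y)
    {
        // With a vertical flip, output row y is read from the mirrored source row.
        const unsigned int srcRow = flipVertically ? (height - 1 - y) : y;
        const unsigned char* in = src + srcRow * rowBytes;
        unsigned char* out = dst + y * rowBytes;

        if (!swapRedAndBlue)
        {
            // No channel swap: the whole row can be copied at once.
            std::memcpy(out, in, rowBytes);
            continue;
        }

        for (unsigned int x = 0; x < width; ++x)
        {
            out[0] = in[2];
            out[1] = in[1];
            out[2] = in[0];
            in += bytesPerPixel;
            out += bytesPerPixel;
        }
    }
}
```

A channel swap is useful when the grabbed frame and the display code disagree on the channel order, and a vertical flip when one side stores rows bottom-up.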
In this code, a pointer to the videoInput class is obtained by calling the method videoInput::getInstance(). Before using a camera, you need to get the list of available devices by calling VI->listDevices(). The device is initialized by calling VI->setupDevice(i-1, 640, 480, 60). There are two overloads of setupDevice: one sets the desired resolution and frame rate, and the other selects an output type by its index. The first overload finds an existing MediaType with the requested parameters or, failing that, uses the default type with index 0.
Grabbing images from the MediaSource starts with the first call to VI->isFrameNew(i-1). After this method has been called, the raw data can be read with VI->getPixels(i - 1, (unsigned char *)frame->imageData,false). The parameters of the video camera can be read with VI->getParametrs(i-1), and new values can be set with VI->setParametrs(i - 1, CP). The method VI->closeDevice(i - 1) stops the grabbing thread and releases the context of the video device. The example shows quick use, stopping, and reuse of the same video device.
The global function StopEvent(int deviceID, void *userData) is registered as a callback with VI->setEmergencyStopEvent(i - 1, NULL, StopEvent). It is called when the device stops unexpectedly, e.g., when the web camera is unplugged from the USB socket.
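The lookup done by the resolution-based setupDevice overload, which searches for a matching media type and falls back to index 0, could look roughly like the sketch below. The struct FormatInfo and the function findFormat are hypothetical, and choosing the closest frame rate among the formats with the right resolution is my assumption about a reasonable policy, not a description of the library's exact rule.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, reduced description of one capture format.
struct FormatInfo
{
    unsigned int width;
    unsigned int height;
    unsigned int frameRate;  // frames per second
};

// Returns the index of the format with the requested resolution whose frame
// rate is closest to idealFrameRate. Returns 0 (the default format) when no
// format has the requested resolution.
std::size_t findFormat(const std::vector<FormatInfo>& formats,
                       unsigned int width, unsigned int height,
                       unsigned int idealFrameRate)
{
    std::size_t best = 0;
    unsigned int bestDiff = 0;
    bool found = false;

    for (std::size_t i = 0; i < formats.size(); ++i)
    {
        const FormatInfo& f = formats[i];

        if (f.width != width || f.height != height)
            continue;

        // Absolute difference between the offered and the ideal frame rate.
        const unsigned int diff = f.frameRate > idealFrameRate
            ? f.frameRate - idealFrameRate
            : idealFrameRate - f.frameRate;

        if (!found || diff < bestDiff)
        {
            best = i;
            bestDiff = diff;
            found = true;
        }
    }

    return best;
}
```

With a fallback like this, a request for an unsupported resolution still opens the device, so it is worth checking getWidth and getHeight afterwards, as the example does before creating the image.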
The second example is based on the SimpleCapture example from the Windows SDK (code: SimpleCaptureVS2012.zip; application: SimpleCapture-exe.zip). It is too big to list here, but I can describe how it differs from the original. First, I removed all the original code for connecting to the web camera and used the videoInput library instead.
Second, I added a second dialog for choosing a suitable resolution from the list of supported media types. It is important to note that the IMFMediaSource interface must not be stopped manually; it is released by calling the function closeDevice(unsigned int deviceID).
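Since the device has to be released through closeDevice and not by hand, one way to make sure that this happens exactly once, even on an early return, is a small scope guard. The sketch below is only an illustration, assuming a C++11 compiler; the class ScopedDevice is hypothetical and is not part of the library.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Calls the supplied close function once, when the guard goes out of scope.
class ScopedDevice
{
public:
    ScopedDevice(unsigned int deviceID, std::function<void(unsigned int)> closer)
        : deviceID_(deviceID), closer_(std::move(closer)) {}

    ~ScopedDevice()
    {
        // An empty std::function means there is nothing to call.
        if (closer_)
            closer_(deviceID_);
    }

    // Non-copyable, so the close function cannot run twice for one device.
    ScopedDevice(const ScopedDevice&) = delete;
    ScopedDevice& operator=(const ScopedDevice&) = delete;

    unsigned int id() const { return deviceID_; }

private:
    unsigned int deviceID_;
    std::function<void(unsigned int)> closer_;
};
```

With the library, the close function could be a lambda that forwards to videoInput::getInstance().closeDevice(id), so that the media source is released on every path out of the scope.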
Points of Interest
I spent a lot of time searching the Microsoft developer website for suitable information, and I did not get help from the experts on that site. I was not alone in searching for a solution to this problem. I was surprised that using a web camera with Media Foundation had not been covered, and I hope that my tip will be a useful contribution to this site.