Basic Video Capture and VMR9

WajihUllahBaig

9 Apr 2009, CPOL

Capturing video from webcam and VMR9 windowless rendering with DirectShow.

Introduction

Put simply, this small program is a starting point for capturing video from a webcam and rendering it in windowless mode with VMR9. Note that the program reads the filters available on the system; the user must select the appropriate filters for the program to run correctly. The DirectShow help on MSDN tends to stop abruptly, so I hope this small collection of code gives you a feel for how things work in DirectShow. I have added screenshots to make the program easier to use.

Background

Apart from MFC, you need the basics of STL, COM programming, and DirectShow. If you have no idea how filters are read and used, please refer to my other articles, which explain how filters can be used and also cover the BSTR_Compare(..) method. The link to the first part of that three-part tutorial is given below:

Audio Capture with DirectShow - Part 1

Using the code

To use the code, make sure that you have adjusted the include directory paths for your machine. In particular, the paths to the Windows SDK must be configured properly.

The classes

The program has been developed using classes. The following class diagram should give an idea of the contents. Not all methods are explained, as some of them are simple enough for developers with a basic knowledge of DirectShow.

Explanation of classes

CMainGraph

This is the main class. It holds the graph builder reference, the capture graph builder reference, and the methods that control the graph. Filters such as the camera and VMR9 are added to the graph held by this class.

C++

class CMainGraph
{
public:
    CMainGraph(void);
    ~CMainGraph(void);
protected:
    // Main graph pointer
    IGraphBuilder* pGraph;
    // Main capture graph
    ICaptureGraphBuilder2* pCaptureGraph2;
    // System device enumerator
    ICreateDevEnum* pFilterEnum;
    // Device moniker
    IMoniker *pFilterMonik;
    // pointer to media playback control interface
    IMediaControl* pControl;
public:
    // Main COM initialization function
    void Init_COM(void);
    // Displays a message box when a COM error occurs
    void HR_Failed(HRESULT hr);
    // Run graph
    void Run_Graph();
    // Stop the main graph
    void Stop_Graph(void);
    // Get Device/Filter Enumerator
    ICreateDevEnum* Get_Enumerator(void);
    // Return the main graph pointer
    IGraphBuilder* Get_MainGraph_Ptr(void);
    // Get the main capture graph pointer
    ICaptureGraphBuilder2* Get_CaptureGraph_Ptr(void);
};

void Init_COM(void);: This method initializes COM. It sets up the references to the other interfaces, such as ICaptureGraphBuilder2 and IMediaControl, and queries the interfaces required to run and stop the graph.
ICreateDevEnum* Get_Enumerator(void);: This method returns a reference to ICreateDevEnum. The returned pointer is then passed on to the code that enumerates the two kinds of filters, camera and VMR9, so that the camera device and the video renderer, respectively, can be instantiated.
IGraphBuilder* Get_MainGraph_Ptr(void);: This method returns pGraph. Objects of classes that inherit from this class refer to the main graph pointer through this method.
ICaptureGraphBuilder2* Get_CaptureGraph_Ptr(void);: This method returns the ICaptureGraphBuilder2 reference. This interface is essential for the camera capture functionality.

CFilter

The CFilter class inherits from CMainGraph. The class holds fields that represent the filter name, a pointer to IBaseFilter, and a pointer to IMoniker. This class is the parent class of the camera class and the VMR9 class.

C++

class CFilter: 
    public CMainGraph
{
public:
    CFilter(void);
    CFilter(IMoniker*);
    ~CFilter(void);
protected:
    // FriendlyName as seen in graphedit.exe
    BSTR bstrFilterName;
    // Pointer to filter interface
    IBaseFilter* pFilter;
    // pointer to filter moniker 
    IMoniker* pFilterMoniker;
public:
    // Displays a message box when a COM error occurs
    void HR_Failed(HRESULT hr);
    //Compares two filter names - true if filter found on system
    bool BSTR_Compare(BSTR bstrFilterName, BSTR bstrDeviceName);
    // Find pin by name
    IPin* Find_Pin(BSTR bstrPinName);
    // Find Pin
    IPin* Find_Pin(PIN_DIRECTION PinDir,IPin *pFilterPin);
    // Find a required pin
    IPin* Find_Pin(PIN_DIRECTION PIN_DIR, GUID PIN_CAT, GUID MEDIA_TYPE);
    // Filter initiating function
    IBaseFilter *Filter_Init(IMoniker*);
    //Function that connects two filter pins
    void Filter_Connect(IPin* pPinOut , IPin* pPinIn);
    // Function to add filter to main graph
    void Filter_Addto_Graph(IBaseFilter* pFilter,BSTR bstrName);
    // Set the main graph pointer
    void Set_MainGraph_Ptr(IGraphBuilder* pGraph);
    // Set main capture graph pointer
    void Set_CaptureGraph_Ptr(ICaptureGraphBuilder2* pCG);
    
};

IPin* Find_Pin(BSTR bstrPinName);: This is an overloaded method. When looking for a pin, we can use the name of the pin as shown in GraphEdit. Once found, the pin is returned, and it is then used to connect the filter to either the camera or the video renderer, depending on the call.
IPin* Find_Pin(PIN_DIRECTION PIN_DIR,GUID PIN_CAT,GUID MEDIA_TYPE);: One of the most important methods among the overloads. Its purpose is to find a pin according to the direction of the pin, i.e., is it an input pin or an output pin? If it is used with the camera filter and we are looking for the "Capture" pin, then PIN_DIR is PINDIR_OUTPUT and PIN_CAT is PIN_CATEGORY_CAPTURE. The last argument specifies whether we want audio only, video only, or mixed; since we are using video only, MEDIA_TYPE is MEDIATYPE_Video in this case.
IPin* Find_Pin(PIN_DIRECTION PinDir,IPin *pFilterPin);: The last overload, which can be used to find a pin according to its direction; remember that the pin passed in here is one returned after a successful instantiation.
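To make the three-criteria lookup concrete, here is a minimal portable sketch. It does not use the DirectShow headers: PinDir, PinCategory, MediaType, PinDesc, FindPinSketch, and SampleCameraPins are hypothetical stand-ins for PIN_DIRECTION, the pin category GUID, the major media type GUID, and a real pin, and the sample pin names are invented. It only illustrates the matching logic, not how the article's Find_Pin queries a real filter.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-ins for PIN_DIRECTION, the pin category GUID
// and the major media type GUID.
enum class PinDir { Input, Output };
enum class PinCategory { Capture, Preview, Still };
enum class MediaType { Video, Audio };

// Hypothetical description of one pin on a filter.
struct PinDesc {
    std::wstring name;
    PinDir dir;
    PinCategory category;
    MediaType media;
};

// Returns the first pin that matches all three criteria, or nullptr if none does.
const PinDesc* FindPinSketch(const std::vector<PinDesc>& pins,
                             PinDir dir, PinCategory category, MediaType media) {
    auto it = std::find_if(pins.begin(), pins.end(), [&](const PinDesc& p) {
        return p.dir == dir && p.category == category && p.media == media;
    });
    return it != pins.end() ? &*it : nullptr;
}

// Invented pin set, loosely modelled on what a webcam filter might expose.
std::vector<PinDesc> SampleCameraPins() {
    return {
        {L"Still",   PinDir::Output, PinCategory::Still,   MediaType::Video},
        {L"Capture", PinDir::Output, PinCategory::Capture, MediaType::Video},
    };
}
```

With the sample pins, asking for an output pin of the capture category carrying video yields the "Capture" entry, while asking for an input pin yields nullptr.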

All three methods use almost the same code to find the pins, i.e., by looping through the pins of the filter. Remember that these methods can only be called after the filter has been successfully initialized. The code inside the methods looks like this:

C++

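IEnumPins *pEPin = NULL; // Pin enumerator
IPin *pPin = NULL;       // Current pin
if (SUCCEEDED(this->pFilter->EnumPins(&pEPin)))
{
    // Loop through the pins of the filter
    while (pEPin->Next(1, &pPin, NULL) == S_OK)
    {
        PIN_INFO pinInfo;
        if (SUCCEEDED(pPin->QueryPinInfo(&pinInfo)))
        {
            // QueryPinInfo adds a reference to the owning filter; release it
            if (pinInfo.pFilter != NULL)
                pinInfo.pFilter->Release();
            // Compare with the pin name as seen in GraphEdit (wcscmp needs <wchar.h>)
            if (wcscmp(pinInfo.achName, bstrPinName) == 0)
            {
                pEPin->Release();
                return pPin; // the caller must Release() the pin
            }
        }
        pPin->Release(); // not the pin we want
    }
    pEPin->Release();
}
return NULL;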
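The loop above compares the result of Next against S_OK rather than using SUCCEEDED. That matters because an enumerator reports S_FALSE, which is also a success code, once it has no more items to return. The following portable sketch shows the difference. HResult, kS_OK, kS_FALSE, kE_FAIL, Succeeded, and CountWithS_OK are stand-ins defined here so that the sketch builds without the Windows SDK; the numeric values mirror the SDK's S_OK, S_FALSE, and E_FAIL.

```cpp
#include <cstdint>

// Stand-ins for the Windows SDK definitions, so the sketch builds anywhere.
using HResult = std::int32_t;
constexpr HResult kS_OK    = 0;
constexpr HResult kS_FALSE = 1;
constexpr HResult kE_FAIL  = static_cast<HResult>(0x80004005u);

// Same rule as the SDK's SUCCEEDED macro: any non-negative value is a success.
constexpr bool Succeeded(HResult hr) { return hr >= 0; }

// Drives a fake enumerator that hands out 'available' items and then
// reports kS_FALSE. Returns how many items the S_OK-based loop visits.
int CountWithS_OK(int available) {
    int fetched = 0;
    int seen = 0;
    auto next = [&]() { return fetched++ < available ? kS_OK : kS_FALSE; };
    while (next() == kS_OK) {
        ++seen;
    }
    return seen;
}
```

Because Succeeded(kS_FALSE) is true, a loop written as while (Succeeded(next())) would never terminate with this fake enumerator, whereas the S_OK comparison stops after the last item.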

IBaseFilter *Filter_Init(IMoniker*);: The filter is initialized by this method.
void Filter_Connect(IPin* pPinOut , IPin* pPinIn);: Connects two filters by joining the given output pin to the given input pin.
void Filter_Addto_Graph(IBaseFilter* pFilter,BSTR bstrName);: Adds the filter to the main graph, i.e., pGraph.
void Set_MainGraph_Ptr(IGraphBuilder* pGraph);: Since this class also inherits the pGraph pointer, the pGraph of the main CMainGraph object is assigned to the pGraph of this class. Make sure that both the camera and the VMR9 filter refer to the same pGraph pointer.
void Set_CaptureGraph_Ptr(ICaptureGraphBuilder2* pCG);: Same as Set_MainGraph_Ptr(..) explained above, except that it sets the capture graph pointer instead of the pGraph pointer.

CFilterList

The CFilterList class keeps STL lists of the filters. When you run the program, you will see two combo boxes, which hold the separate lists of cameras and other filters. So, why did I use STL lists? Because I need to hold different types of information about the filters. But first, here is the listing of the class.

C++

class CFilterList
{
public:
    CFilterList(void);
    ~CFilterList(void);
public:
    // STL List to hold filters/device friendly names
    list<BSTR> listCamFilters;
    list<BSTR>::iterator iterCam;
    list<BSTR> listVRFilters;
    list<BSTR>::iterator iterVR;
    // STL list to hold monikers
    list<IMoniker*> pListCamFilterMoniker;
    list<IMoniker*>::iterator itermCam;
    list<IMoniker*> pListVRFilterMoniker;
    list<IMoniker*>::iterator itermVR;
    // Filter/Device reader
    void Filter_Read(GUID FILTER_CLSID,ICreateDevEnum* pFilterEnum);
    // Displays a message box when a COM error occurs
    void HR_Failed(HRESULT hr);
    //Compares two filter names - true if filter found on system
    bool BSTR_Compare(BSTR bstrFilterName, BSTR bstrDeviceName);
};

The class keeps two kinds of STL lists. The first kind holds BSTR values: listCamFilters stores the friendly names of the camera filters, and listVRFilters stores the friendly names of the video renderer filters. The second kind, pListCamFilterMoniker and pListVRFilterMoniker, stores the monikers of the cameras and the video renderers. Each list has its own iterator, which is used to walk through it.

So, why use STL here? Once the devices have been enumerated, I simply keep everything that was found in these lists (the filters themselves are instantiated later by another method). The BSTR lists supply the friendly names shown in the combo boxes, while the IMoniker lists keep the moniker corresponding to each friendly name. In the running program, when an entry in the list of video renderers is selected, its friendly name is picked up and a search is started. While searching the name list, the program steps through the moniker list at the same time. A 'true' from BSTR_Compare(...) means the filter was found, and the moniker at the current position is passed to the filter instantiation method. If this sounds complicated, look at the code below, which handles the video renderer filters; it should make the rest of the code easier to understand.

C++

void CVideoCaptureDlg::OnCbnSelchangeVrList()
{
    // Combo box holding the video renderer names
    CComboBox *pComboVRFilter = static_cast<CComboBox*>(this->GetDlgItem(IDC_VR_LIST));
    int selectedIndex = pComboVRFilter->GetCurSel();
    if(selectedIndex == CB_ERR)
        return; // nothing is selected
    CString strFilterName;
    pComboVRFilter->GetLBText(selectedIndex,strFilterName);
    //Find the required filter moniker
    FLObject.itermVR = FLObject.pListVRFilterMoniker.begin();
    BSTR temp = SysAllocString(strFilterName);
    for(
        FLObject.iterVR = FLObject.listVRFilters.begin();
        FLObject.iterVR != FLObject.listVRFilters.end();
        FLObject.iterVR++
        )
        {
            //check if there is a filter on the Video Renderers list for Video Render
            if((FLObject.BSTR_Compare(temp,*FLObject.iterVR)) == true)
            {
                //Initiate Filter
                VMR9Object.pVMR9 = VMR9Object.Filter_Init((*FLObject.itermVR));
                if(VMR9Object.pVMR9!=NULL)
                {
                    //Add to main graph
                    VMR9Object.Filter_Addto_Graph(VMR9Object.pVMR9,temp);
                    break;
                }
            }
            FLObject.itermVR++;
        }
    //Enable the connect filters button
    this->GetDlgItem(IDC_CONNECT_FILTERS)->EnableWindow(1);
}
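The handler above relies on the name list and the moniker list staying aligned, so that position N in one list corresponds to position N in the other. Stripped of MFC and COM, that lockstep walk can be sketched as follows. MonikerHandle is a hypothetical stand-in for IMoniker*, and FindMonikerByName is an illustrative helper that is not part of the article's code.

```cpp
#include <list>
#include <string>

// Hypothetical stand-in for IMoniker*; a plain integer handle keeps the sketch portable.
using MonikerHandle = int;

// Walks the name list and the handle list together and returns the handle
// stored at the same position as 'wanted', or -1 if the name is not present.
MonikerHandle FindMonikerByName(const std::list<std::wstring>& names,
                                const std::list<MonikerHandle>& monikers,
                                const std::wstring& wanted) {
    auto itName = names.begin();
    auto itMon = monikers.begin();
    // Stop at the end of either list so a length mismatch cannot overrun.
    for (; itName != names.end() && itMon != monikers.end(); ++itName, ++itMon) {
        if (*itName == wanted) {
            return *itMon;
        }
    }
    return -1;
}
```

The lookup is only correct if both lists are always filled in the same order, for example with a push_front on each list for every enumerated device. A single list of name and moniker pairs would remove that requirement, at the cost of changing the class layout shown above.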

void CFilterList::Filter_Read(GUID FILTER_CLSID,ICreateDevEnum* pFilterEnum): One of the most important methods; it reads in the filters whenever the 'Find Filters' button is pressed. The method fills the STL lists, and the part of the code that does so is shown below.

C++

if(SUCCEEDED(hr)) 
{

    //check device category
    if(FILTER_CLSID == CLSID_VideoInputDeviceCategory)
    {
        //store the moniker in the camera STL list
        listCamFilters.push_front(SysAllocString(varName.bstrVal));
    
        pListCamFilterMoniker.push_front(pDeviceMonik);
    }
    else
    {
        //store the moniker in the video renderer STL list
        listVRFilters.push_front(SysAllocString(varName.bstrVal));
        pListVRFilterMoniker.push_front(pDeviceMonik);
    
    }
}
else HR_Failed(hr);

CCameraFilter

The CCameraFilter is the smallest amongst the classes and is shown below:

C++

class CCameraFilter :
    public CFilter
{
public:
    CCameraFilter(void);
    ~CCameraFilter(void);
    // Cam Filter
    IBaseFilter* pCamFilter;
};

It has a pointer member of type IBaseFilter, which holds the reference to the camera. Since most of the functionality is defined in the CFilter class, nothing else happens here.

CVMR9Filter

The CVMR9Filter class is the trickiest of all. Its listing follows.

C++

class CVMR9Filter :
    public CFilter
{
public:
    CVMR9Filter(void);
    ~CVMR9Filter(void);
    // VMR9 Filter
    IBaseFilter* pVMR9;
    // Set the VMR9 windowless mode
    void Set_Windowess_Mode(HWND hwndApp,LPRECT DrawRect);
    // Render stream for filters. i.e. connect
    void Filter_RenderStream(GUID PIN_TYPE,GUID MEDIA_TYPE,IBaseFilter*);
};

I should have named the class CVRFilter, but my initial intention was to use only VMR9 filters. I ended up with a more general class and did not have the courage to rename all the variables. So, please remember that it is a more general class, with one big exception: the Set_Windowless_Mode(...) method. Windowless rendering is the beauty of VMR9, which is why I still claim that the class name is appropriate. The captured video is rendered in windowless mode at a position defined by the coordinates of the group box I called IDC_VIDEO_FRAME; the position of this group box determines the rendering coordinates.

The other interesting method is Filter_RenderStream(...). In general, filters can be connected pin to pin, and the Filter_Connect() method does exactly that: it takes two pins and connects them. In the case of VMR9, however, we can use the built-in method RenderStream(NULL,NULL,src,NULL,dest), which is exposed by the ICaptureGraphBuilder2 interface. The method connects a camera filter 'src' to a video renderer filter 'dest'. I have used this method to connect the camera to the VMR9 filter. The code is shown below:

C++

// Render stream for filters. i.e. connect
void CVMR9Filter::Filter_RenderStream(GUID PIN_TYPE,GUID MEDIA_TYPE,IBaseFilter *pSrcFilter)
{
    // Let the capture graph builder connect the source filter to this renderer
    HRESULT hr = this->pCaptureGraph2->RenderStream(NULL,NULL,pSrcFilter,NULL,this->pFilter);
    if(FAILED(hr))
        HR_Failed(hr);
}

The program uses the Filter_RenderStream(...) method; connecting the pins with Filter_Connect() should work equally well.

Where it all happens

Since the program is dialog based, most of the method calls are made from VideoCaptureDlg.cpp; this file contains the logic that calls the methods to instantiate the filters, connect them, and start the video rendering. The main methods of the CVideoCaptureDlg class are as follows:

C++

// Find filters
afx_msg void OnBnClickedFindFilters();
// Connect filters
afx_msg void OnBnClickedConnectFilters();
// Camera list selected
afx_msg void OnCbnSelchangeCamList();
// Video Renderer list selected
afx_msg void OnCbnSelchangeVrList();
// Play/Stop
afx_msg void OnBnClickedPlaystopButton();

Although it is a very lengthy file, the most interesting part is the event handler OnBnClickedConnectFilters(). Here is what it does, step by step:

This method starts by declaring pins that are used as inputs and outputs.
Camera is searched for a 'capture' pin.
VMR9 is searched for a 'VMR9 Input0' pin.
Coordinates of the group box are retrieved.
Render coordinates are set.
VMR9's windowless mode is set.
Filters are connected.
Finally, the graph is kick started.
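The "coordinates of the group box" and "render coordinates" steps hide one detail that is easy to get wrong: a rectangle obtained in screen coordinates has to be translated into the client coordinates of the window that hosts the video before it can serve as a destination rectangle. The sketch below shows only that arithmetic. It assumes the group box rectangle is obtained in screen coordinates, which the article does not state; RectI, PointI, and ScreenToClientRect are hypothetical stand-ins for RECT, POINT, and the corresponding Win32 and MFC helpers.

```cpp
// Hypothetical stand-ins for the Win32 RECT and POINT structures.
struct RectI { long left, top, right, bottom; };
struct PointI { long x, y; };

// Translates a rectangle given in screen coordinates into the client
// coordinates of a window whose client-area origin is located at
// 'clientOrigin' on the screen.
RectI ScreenToClientRect(RectI screen, PointI clientOrigin) {
    return { screen.left - clientOrigin.x,
             screen.top - clientOrigin.y,
             screen.right - clientOrigin.x,
             screen.bottom - clientOrigin.y };
}
```

For example, a group box at screen rectangle (110, 220, 430, 460) inside a dialog whose client area starts at (100, 200) maps to the client rectangle (10, 20, 330, 260); the width and height are unchanged.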

Remember the overloads of Find_Pin(...)? This is where you can use them, and I have already placed them in the code with comments.

Steps to run the program

Make sure you have the correct paths to the Windows SDK, then build and run the program. Once the GUI appears, follow these steps:

Step 1. Press "Find Filters".
Step 2. Select the correct cam and then the VMR9 Filter. Be careful at this step.
Step 3. Click "Connect".
Step 4. You should now see the video.
Step 5. Try using the "Stop/Play" button.

Screenshots

A screenshot of the video of my dual displays.

Points of interest

The most interesting point? If the program fails to obtain a reference to the VMR9 filter, the pointer to VMR9 stays NULL. If you then call RenderStream(...) with a valid reference to the camera, the video is still rendered! RenderStream(...) treats a NULL renderer argument as a request for a default renderer, so the video does not appear in the group box; in my case, "third party" software was called instead. It happened to me, and it took me a week to understand the reason and fix it!
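One way to avoid the pitfall described above is to check both filter pointers before asking the capture graph builder to connect anything. The sketch below shows such a guard in portable form. FilterStub, ConnectCheck, and CheckBeforeRenderStream are hypothetical names that are not part of the article's code; in the real program, the check would sit in front of the RenderStream(...) call.

```cpp
// Hypothetical stand-in for IBaseFilter; only the pointer value matters here.
struct FilterStub {};

enum class ConnectCheck { ReadyToConnect, MissingSource, MissingRenderer };

// Reports whether it is safe to connect, instead of letting a null renderer
// silently select a default renderer.
ConnectCheck CheckBeforeRenderStream(const FilterStub* source,
                                     const FilterStub* renderer) {
    if (source == nullptr) {
        return ConnectCheck::MissingSource;
    }
    if (renderer == nullptr) {
        return ConnectCheck::MissingRenderer;
    }
    return ConnectCheck::ReadyToConnect;
}
```

A caller could show a message and return early for anything other than ReadyToConnect, which turns a week of puzzling over an unexpected video window into an immediate, visible error.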

Code bugs

You will find bugs! I did not take care of exceptions, so please keep that in mind!

The code is built with the following

Windows Server 2008
Microsoft Visual Studio 2008
DirectX 10.1
Microsoft Windows SDK 6.1

History

First post - 23/03/2009.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)