DirectX 11 Compute Shaders

22 Feb 2013 · CPOL · 2 min read
HPC via Compute Shaders (GPGPU).

Introduction

This article introduces GPGPU via DirectX11 Compute Shaders.

GPGPU (General-Purpose Computing on Graphics Processing Units) means using a graphics processing unit to perform repeated calculations, taking advantage of the large number of processing elements available on the GPU.

This article will demonstrate a very simple trigonometric calculation executed on the GPU.

The attached code also shows a classic use of GPGPU: squaring a square matrix (multiplying it by itself) by spawning nrow*nrow GPU threads. This example was chosen because each output element can be calculated independently of the others.
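To make the independence argument concrete, here is a minimal CPU-side sketch in plain C++ (not the attached GPU code). The names SquareElement and SquareMatrix are hypothetical, and the matrix is assumed to be stored row-major. Each call to SquareElement plays the role of one GPU thread: it reads only the input matrix and writes exactly one output element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU stand-in for one GPU thread: computes output element (row, col) of A*A.
// It reads only the input matrix, so no "thread" depends on another's result.
float SquareElement(const std::vector<float>& a, std::size_t nrow,
                    std::size_t row, std::size_t col)
{
    float sum = 0.0f;
    for (std::size_t k = 0; k < nrow; ++k)
        sum += a[row * nrow + k] * a[k * nrow + col];
    return sum;
}

// Emulates spawning nrow*nrow threads by visiting every (row, col) pair.
std::vector<float> SquareMatrix(const std::vector<float>& a, std::size_t nrow)
{
    assert(a.size() == nrow * nrow); // input must be a square, row-major matrix
    std::vector<float> out(nrow * nrow);
    for (std::size_t row = 0; row < nrow; ++row)
        for (std::size_t col = 0; col < nrow; ++col)
            out[row * nrow + col] = SquareElement(a, nrow, row, col);
    return out;
}
```

Because the two loops in SquareMatrix can run in any order, the same per-element work maps naturally onto one GPU thread per output element.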

Background

GPGPU predates DirectX 11: NVIDIA introduced CUDA, AMD introduced Close to Metal and AMD Stream, and many enthusiasts used DirectX 9 pixel shaders to achieve GPGPU.

Using the code

The attached code was compiled with VS2010 Beta 1 using libraries from the DirectX SDK (August 2009) on Windows 7 RC. It will not run on Windows XP, since DirectX 11 is not available for Windows XP. Some parts of the source code are taken from the DirectX SDK August 2009 samples and adapted to suit the program.

The code's starting point is Start(void*). The program is divided into the following parts:

Creation of a device (the easiest part)

Use D3D_DRIVER_TYPE_REFERENCE for emulation, and D3D_DRIVER_TYPE_HARDWARE to run code on GPU (you will require hardware support for this).

C++
D3D11CreateDevice( NULL,D3D_DRIVER_TYPE_REFERENCE/*D3D_DRIVER_TYPE_HARDWARE*/, 
  NULL, D3D11_CREATE_DEVICE_SINGLETHREADED|D3D11_CREATE_DEVICE_DEBUG, 
  NULL, 0,D3D11_SDK_VERSION, &pDeviceOut, &flOut, &pContextOut );

Load the GPU

The hard part is that the programmer must explicitly load the buffers onto the GPU for processing. The attached source code sheds more light on this:

C++
// Creates a structured buffer on the GPU, bindable as both a shader resource and an unordered access view

HRESULT CreateStructuredBufferOnGPU( ID3D11Device* pDevice, 
        UINT uElementSize, UINT uCount, VOID* pInitData, 
        ID3D11Buffer** ppBufOut )
{

    *ppBufOut = NULL;
    D3D11_BUFFER_DESC desc;
    ZeroMemory( &desc, sizeof(desc) );

    desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
    desc.ByteWidth = uElementSize * uCount;
    desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
    desc.StructureByteStride = uElementSize;

    if ( pInitData )
    {
        D3D11_SUBRESOURCE_DATA InitData;
        InitData.pSysMem = pInitData;
        return pDevice->CreateBuffer( &desc, &InitData, ppBufOut );
    }
    else
        return pDevice->CreateBuffer( &desc, NULL, ppBufOut );
}

//for input buffer: creates a shader resource view (SRV)
HRESULT CreateBufferSRV( ID3D11Device* pDevice, ID3D11Buffer* pBuffer, 
                         ID3D11ShaderResourceView** ppSRVOut )
{

    D3D11_BUFFER_DESC descBuf;
    ZeroMemory( &descBuf, sizeof(descBuf) );
    pBuffer->GetDesc( &descBuf );
    D3D11_SHADER_RESOURCE_VIEW_DESC desc;
    ZeroMemory( &desc, sizeof(desc) );
    desc.ViewDimension = D3D11_SRV_DIMENSION_BUFFEREX;
    desc.BufferEx.FirstElement = 0;

    if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS )
    {
        // This is a Raw Buffer
        desc.Format = DXGI_FORMAT_R32_TYPELESS;
        desc.BufferEx.Flags = D3D11_BUFFEREX_SRV_FLAG_RAW;
        desc.BufferEx.NumElements = descBuf.ByteWidth / 4;
    }
    else
        if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_STRUCTURED )
        {

            // This is a Structured Buffer
            desc.Format = DXGI_FORMAT_UNKNOWN;
            desc.BufferEx.NumElements = 
               descBuf.ByteWidth / descBuf.StructureByteStride;
        }
        else
        {
            return E_INVALIDARG;
        }
    return pDevice->CreateShaderResourceView( pBuffer, &desc, ppSRVOut );
}

//for output buffer: creates an unordered access view (UAV)
HRESULT CreateBufferUAV( ID3D11Device* pDevice, ID3D11Buffer* pBuffer, 
                         ID3D11UnorderedAccessView** ppUAVOut )
{
    D3D11_BUFFER_DESC descBuf;
    ZeroMemory( &descBuf, sizeof(descBuf) );
    pBuffer->GetDesc( &descBuf );

    D3D11_UNORDERED_ACCESS_VIEW_DESC desc;
    ZeroMemory( &desc, sizeof(desc) );
    desc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
    desc.Buffer.FirstElement = 0;

    if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS )
    {
        // This is a Raw Buffer
        desc.Format = DXGI_FORMAT_R32_TYPELESS;
        // Format must be DXGI_FORMAT_R32_TYPELESS,
        // when creating Raw Unordered Access View

        desc.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_RAW;
        desc.Buffer.NumElements = descBuf.ByteWidth / 4; 
    }
    else
        if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_STRUCTURED )
        {
            // This is a Structured Buffer
            desc.Format = DXGI_FORMAT_UNKNOWN;
            // Format must be DXGI_FORMAT_UNKNOWN,
            // when creating a View of a Structured Buffer

            desc.Buffer.NumElements = 
                 descBuf.ByteWidth / descBuf.StructureByteStride; 
        }
        else
        {
            return E_INVALIDARG;
        }
    return pDevice->CreateUnorderedAccessView( pBuffer, &desc, ppUAVOut );
}
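The size arithmetic in the helpers above can be checked on the CPU without any DirectX headers. The sketch below is only an illustration: the field layout of BufType is an assumption (the article does not show it), and the function names BufferByteWidth, StructuredNumElements, and RawNumElements are hypothetical. They mirror the expressions assigned to ByteWidth and NumElements in the listings.

```cpp
#include <cassert>

// Hypothetical element type; the field layout is assumed for illustration.
struct BufType
{
    int   i;
    float f;
};
static_assert(sizeof(BufType) == 8, "expected two tightly packed 4-byte fields");

// Total buffer size in bytes, as assigned to desc.ByteWidth.
unsigned int BufferByteWidth(unsigned int elementSize, unsigned int count)
{
    return elementSize * count;
}

// Element count for a view of a structured buffer: one element per stride.
unsigned int StructuredNumElements(unsigned int byteWidth, unsigned int stride)
{
    assert(stride > 0 && byteWidth % stride == 0);
    return byteWidth / stride;
}

// Element count for a view of a raw buffer: one element per 32-bit word.
unsigned int RawNumElements(unsigned int byteWidth)
{
    return byteWidth / 4;
}
```

With this assumed 8-byte element, a buffer of 1024 elements occupies 8192 bytes, which a structured view sees as 1024 elements and a raw view sees as 2048 32-bit words.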

Run

Dispatch launches the compute shader on the processing elements available to the GPU; X, Y, and Z are the numbers of thread groups in each dimension. Its performance depends directly on the hardware and driver support (for a device created using D3D_DRIVER_TYPE_HARDWARE).

C++
pd3dImmediateContext->Dispatch( X, Y, Z );
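The arguments to Dispatch count thread groups, not individual threads, so the group count usually has to be derived from the number of elements. The following is a minimal sketch, assuming a one-dimensional workload; the helper name GroupCount and the group size of 64 threads used in the example are hypothetical and not taken from the attached code.

```cpp
#include <cassert>

// Smallest number of thread groups such that
// groups * threadsPerGroup >= elementCount (rounds up, no overflow).
unsigned int GroupCount(unsigned int elementCount, unsigned int threadsPerGroup)
{
    assert(threadsPerGroup > 0);
    return elementCount / threadsPerGroup
         + (elementCount % threadsPerGroup != 0 ? 1u : 0u);
}
```

For example, 1000 elements with 64 threads per group need GroupCount(1000, 64) groups, which rounds up to 16. The last group then contains threads with no element to process, so a shader dispatched this way would need a bounds check on its thread index.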

Read output buffer

With DirectX 9, this was the most painful part; with DirectX 11 Compute Shaders, it has become much easier.

First, create a temporary read-back buffer with D3D11_USAGE_STAGING usage and the CPU access flag set to D3D11_CPU_ACCESS_READ. Then copy the output buffer into it and map it to a pointer, as shown below:

C++
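D3D11_MAPPED_SUBRESOURCE MappedResource;
pd3dImmediateContext->CopyResource( debugbuf, pBuffer ); // GPU output -> CPU-readable buffer
pd3dImmediateContext->Map( debugbuf, 0, D3D11_MAP_READ, 0, &MappedResource );
BufType *p = (BufType*)MappedResource.pData; // p points to the output data while mapped
// ... read the results through p here ...
pd3dImmediateContext->Unmap( debugbuf, 0 ); // release the mapping when done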
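Once the output is visible through a CPU pointer, a common sanity check is to compare it with the same calculation done on the CPU. The sketch below is a hedged illustration only: the article does not spell out the trigonometric formula, so CpuReference assumes a plain sine, and both function names are hypothetical. A tolerance is used because GPU and CPU floating-point results may differ slightly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: assumes the GPU computes sin(x) per element.
std::vector<float> CpuReference(const std::vector<float>& input)
{
    std::vector<float> out(input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
        out[i] = std::sin(input[i]);
    return out;
}

// Counts elements whose GPU and CPU values differ by more than the tolerance.
std::size_t CountMismatches(const float* gpu, const float* cpu,
                            std::size_t count, float tolerance)
{
    assert(gpu != nullptr && cpu != nullptr);
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < count; ++i)
        if (std::fabs(gpu[i] - cpu[i]) > tolerance)
            ++mismatches;
    return mismatches;
}
```

A result of zero mismatches suggests the buffers, views, and dispatch were set up consistently; the comparison has to happen while the read-back buffer is still mapped, or on a copy of its contents.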

Points of interest

With Compute Shaders, we can implement physics-based simulations involving liquids (probably my next project).

I have also implemented a compute shader using Vulkan: https://bitbucket.org/asif_bahrainwala/matrix-multiply/src/master/

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).