On March 20th, 2014, Microsoft announced DirectX* 12 at the Game Developers Conference. By reducing resource overhead, DirectX 12 will help applications run more efficiently, decreasing energy consumption and allowing gamers to play longer on mobile devices.
At SIGGRAPH 2014, Intel measured CPU power consumption while running a simple asteroids demo on a Microsoft Surface* Pro 3 tablet (https://software.intel.com/en-us/blogs/2014/08/11/siggraph-2014-directx-12-on-intel). The demo draws a large number of asteroids in space at a locked frame rate and can be switched from the DirectX 11 API to the DirectX 12 API by tapping a button. It consumes less than half of the CPU power when driven by the DirectX 12 API compared to DirectX 11**, resulting in a cooler device with longer battery life. In a typical game scenario, any CPU power saved can be invested in better physics, AI, pathfinding, or other CPU-intensive tasks, making the game more feature-rich or more energy-efficient.
Tools of the Trade
To develop games with DirectX 12, you need the following tools:
- Windows* 10 Technical Preview
- DirectX 12 SDK
- Visual Studio* 2013
- DirectX 12-capable GPU drivers
If you are a game developer, check out Microsoft’s DirectX Early Access Program at https://onedrive.live.com/survey?resid=A4B88088C01D9E9A!107&authkey=!AFgbVA2sYbeoepQ.
Setup instructions for installing the SDK and the GPU drivers are provided after you are accepted into the DirectX Early Access Program.
Overview
From a high-level point of view and compared to DirectX 10 and 11, the architecture of DirectX 12 differs in the areas of state management and the way resources are tracked and managed in memory.
DirectX 10 introduced state objects to set a group of states during run time. DirectX 12 introduces pipeline state objects (PSOs) used to set an even larger group of states along with shaders. This article focuses on the changes in dealing with resources and leaves the description of how states are grouped in PSOs to future articles.
In DirectX 11, the system was responsible for predicting or tracking resource usage patterns, which limited application design when using DirectX 11 on a broad scale. Basically, in DirectX 12, the programmer, not the system or driver, is responsible for handling the following three usage patterns:
- Binding of resources
DirectX 10 and 11 tracked the binding of resources to the graphics pipeline so that resources the application had already released stayed alive for as long as outstanding GPU work still referenced them. DirectX 12 does not keep track of resource binding. The application, or in other words the programmer, must handle object lifetime management.
- Inspection of resource bindings
DirectX 12 does not inspect resource bindings to know if or when a resource transition might have occurred. For example, an application might write into a render target via a render target view (RTV) and then read this render target as a texture via a shader resource view (SRV). With the DirectX 11 API, the GPU driver was expected to know when such a resource transition was happening to avoid memory read-modify-write hazards. In DirectX 12 you have to identify and track any resource transitions via dedicated API calls.
- Synchronization of mapped memory
In DirectX 11, the driver handled synchronization of mapped memory between the CPU and GPU. The system inspected the resource bindings to understand whether rendering needed to be delayed because a resource that was mapped for CPU access had not been unmapped yet. In DirectX 12, the application needs to handle synchronization of CPU and GPU access to resources. One mechanism to synchronize memory access is requesting an event that wakes up a thread when the GPU has finished processing.
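One common way to take on the lifetime responsibility described above is to tag every release request with the fence value of the last GPU submission that used the resource, and to destroy the resource only once the GPU reports that value as completed. The following is a minimal sketch of that bookkeeping in standard C++, not DirectX code: DeferredReleaseQueue, retire, and collect are hypothetical names, resources are represented by plain strings, and the completed fence value is passed in by hand where a real application would query its fence object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical bookkeeping for application-managed object lifetime.
// A "resource" is just a name here; a real renderer would hold a COM pointer.
class DeferredReleaseQueue {
public:
    // Called when the application is done with a resource. fenceValue is the
    // value the GPU signals after the last submission that references it.
    void retire(std::string resource, std::uint64_t fenceValue) {
        // Assumption: submissions are numbered in increasing order,
        // so the queue stays sorted by fence value.
        assert(pending_.empty() || pending_.back().second <= fenceValue);
        pending_.emplace_back(std::move(resource), fenceValue);
    }

    // Called periodically with the last fence value the GPU has completed.
    // Returns the resources that are now safe to destroy.
    std::vector<std::string> collect(std::uint64_t completedFenceValue) {
        std::vector<std::string> released;
        while (!pending_.empty() &&
               pending_.front().second <= completedFenceValue) {
            released.push_back(std::move(pending_.front().first));
            pending_.pop_front();
        }
        return released;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    // Oldest outstanding reference is always at the front.
    std::deque<std::pair<std::string, std::uint64_t>> pending_;
};
```

Retiring a texture with fence value 1 and a buffer with fence value 3, then calling collect(2), would hand back only the texture; the buffer stays queued until the GPU has passed value 3.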
Moving these resource usage patterns into the realm of the application required a new set of programming interfaces that can deal with a wide range of different GPU architectures.
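To illustrate what tracking resource transitions in the application can look like, here is a small sketch in standard C++. It only models the bookkeeping: the application remembers the last known usage of each resource and is told when a transition has to be recorded before the next use. StateTracker, ResourceState, and Transition are hypothetical names, and the two states are a deliberately reduced set chosen to match the render target and shader resource example above.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical, reduced set of usage states for illustration only.
enum class ResourceState { RenderTarget, ShaderResource };

// Describes one transition the application would have to record.
struct Transition {
    std::string resource;
    ResourceState before;
    ResourceState after;
};

class StateTracker {
public:
    void registerResource(const std::string& name, ResourceState initial) {
        states_[name] = initial;
    }

    // Returns true and fills 'out' if the resource must be transitioned
    // before it can be used in the 'wanted' state; returns false if it is
    // already in that state. Throws std::out_of_range for unknown names.
    bool require(const std::string& name, ResourceState wanted, Transition& out) {
        ResourceState& current = states_.at(name);
        if (current == wanted) return false;
        out.resource = name;
        out.before = current;
        out.after = wanted;
        current = wanted;  // remember the new state for the next query
        return true;
    }

private:
    std::unordered_map<std::string, ResourceState> states_;
};
```

A renderer built along these lines would call require before every use of a resource and translate each reported Transition into the corresponding API call on its command list.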
The rest of this paper describes the new resource binding mechanisms, the first building block being descriptors.
Descriptors
Descriptors describe resources stored in memory. A descriptor is a block of data that describes an object to the GPU, in a GPU-specific opaque format. A simple way of thinking about descriptors is as a replacement for the old “view” system in DirectX 11. In addition to the view types known from DirectX 11, such as the Shader Resource View (SRV) and Unordered Access View (UAV), DirectX 12 has other types of descriptors, such as samplers and Constant Buffer Views (CBVs).
For example, an SRV selects which underlying resource to use, what set of mipmaps / array slices to use, and the format to interpret the memory. An SRV descriptor must contain the GPU virtual address of the Direct3D* resource, which might be a texture. The application must ensure that the underlying resource is not already destroyed or inaccessible because it is nonresident.
Figure 1 shows a descriptor that represents a “view” into a texture:
Figure 1. A shader resource view described in a descriptor [Used with permission, Copyright © Microsoft]
To create a shader resource view in DirectX 12, use the following structure and Direct3D device method:
typedef struct D3D12_SHADER_RESOURCE_VIEW_DESC
{
DXGI_FORMAT Format;
D3D12_SRV_DIMENSION ViewDimension;
union
{
D3D12_BUFFER_SRV Buffer;
D3D12_TEX1D_SRV Texture1D;
D3D12_TEX1D_ARRAY_SRV Texture1DArray;
D3D12_TEX2D_SRV Texture2D;
D3D12_TEX2D_ARRAY_SRV Texture2DArray;
D3D12_TEX2DMS_SRV Texture2DMS;
D3D12_TEX2DMS_ARRAY_SRV Texture2DMSArray;
D3D12_TEX3D_SRV Texture3D;
D3D12_TEXCUBE_SRV TextureCube;
D3D12_TEXCUBE_ARRAY_SRV TextureCubeArray;
D3D12_BUFFEREX_SRV BufferEx;
};
} D3D12_SHADER_RESOURCE_VIEW_DESC;
interface ID3D12Device
{
...
void CreateShaderResourceView (
_In_opt_ ID3D12Resource* pResource,
_In_opt_ const D3D12_SHADER_RESOURCE_VIEW_DESC* pDesc,
_In_ D3D12_CPU_DESCRIPTOR_HANDLE DestDescriptor);
};
Example code for an SRV might look like this:
D3D12_SHADER_RESOURCE_VIEW_DESC srvDesc;
ZeroMemory(&srvDesc, sizeof(D3D12_SHADER_RESOURCE_VIEW_DESC));
srvDesc.Format = mTexture->Format;
srvDesc.ViewDimension = D3D12_SRV_DIMENSION_TEXTURE2D;
srvDesc.Texture2D.MipLevels = 1;
mDevice->CreateShaderResourceView(mTexture.Get(), &srvDesc, mCbvSrvDescriptorHeap->GetCPUDescriptorHandleForHeapStart());
This code creates an SRV for a 2D texture and specifies its format; the GPU virtual address is taken from the resource passed as the first argument. The last argument to CreateShaderResourceView is a CPU handle to a slot in a descriptor heap that was allocated before calling this method (here, the first slot of the heap). Descriptors are generally stored in descriptor heaps, detailed in the next section.
Note: It is also possible to pass some types of descriptors to the GPU through driver-versioned memory called root parameters. More on this later.
Descriptor Heaps
A descriptor heap can be thought of as one memory allocation for a number of descriptors. Different types of descriptor heaps can contain one or several types of descriptors. Here are the types currently supported:
typedef enum D3D12_DESCRIPTOR_HEAP_TYPE
{
D3D12_CBV_SRV_UAV_DESCRIPTOR_HEAP = 0,
D3D12_SAMPLER_DESCRIPTOR_HEAP = (D3D12_CBV_SRV_UAV_DESCRIPTOR_HEAP + 1) ,
D3D12_RTV_DESCRIPTOR_HEAP = ( D3D12_SAMPLER_DESCRIPTOR_HEAP + 1 ) ,
D3D12_DSV_DESCRIPTOR_HEAP = ( D3D12_RTV_DESCRIPTOR_HEAP + 1 ) ,
D3D12_NUM_DESCRIPTOR_HEAP_TYPES = ( D3D12_DSV_DESCRIPTOR_HEAP + 1 )
} D3D12_DESCRIPTOR_HEAP_TYPE;
There is one descriptor heap type shared by CBVs, SRVs, and UAVs, and a separate one for samplers. There are also types for render target views (RTVs) and depth stencil views (DSVs).
The following code creates a descriptor heap for nine descriptors—each one can be a CBV, SRV, or UAV:
D3D12_DESCRIPTOR_HEAP_DESC descHeapCbvSrv = {};
descHeapCbvSrv.NumDescriptors = 9;
descHeapCbvSrv.Type = D3D12_CBV_SRV_UAV_DESCRIPTOR_HEAP;
descHeapCbvSrv.Flags = D3D12_DESCRIPTOR_HEAP_SHADER_VISIBLE;
ThrowIfFailed(mDevice->CreateDescriptorHeap(&descHeapCbvSrv, __uuidof(ID3D12DescriptorHeap), (void**)&mCbvSrvDescriptorHeap));
The first two entries in the descriptor heap description are the number of descriptors and the type of descriptors that are allowed in this descriptor heap. The third entry, the flag D3D12_DESCRIPTOR_HEAP_SHADER_VISIBLE, marks this descriptor heap as visible to shaders. Descriptor heaps that are not visible to a shader can be used, for example, for staging descriptors on the CPU or for RTVs, which are not selectable from within shaders.
Although this code sets the flag that makes the descriptor heap visible to a shader, there is one more level of indirection. A shader can “see” a descriptor heap through a descriptor table (there are also root descriptors that do not use tables; more on this later).
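Because a descriptor heap is one allocation holding a number of equally sized descriptors, a handle to the n-th descriptor can be thought of as the heap start plus n times the descriptor size. The sketch below models this arithmetic in standard C++ without any DirectX headers. DescriptorHeapModel and CpuHandleModel are hypothetical stand-ins, and the descriptor size of 32 bytes used in the usage note is an arbitrary assumption, since the real size is GPU specific.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a CPU descriptor handle: just an address.
struct CpuHandleModel { std::uintptr_t ptr; };

// Models a heap as one allocation of 'capacity' fixed-size descriptor slots.
class DescriptorHeapModel {
public:
    DescriptorHeapModel(std::uintptr_t start, std::size_t capacity,
                        std::size_t descriptorSize)
        : start_(start), capacity_(capacity),
          descriptorSize_(descriptorSize), next_(0) {}

    // Handle to slot 'index': heap start plus index times descriptor size.
    CpuHandleModel handleAt(std::size_t index) const {
        if (index >= capacity_)
            throw std::out_of_range("descriptor index outside heap");
        CpuHandleModel handle;
        handle.ptr = start_ + static_cast<std::uintptr_t>(index * descriptorSize_);
        return handle;
    }

    // Simple linear allocation of 'count' consecutive slots; returns the
    // index of the first slot.
    std::size_t allocate(std::size_t count) {
        if (count > capacity_ - next_)
            throw std::length_error("descriptor heap exhausted");
        const std::size_t first = next_;
        next_ += count;
        return first;
    }

private:
    std::uintptr_t start_;
    std::size_t capacity_;
    std::size_t descriptorSize_;
    std::size_t next_;
};
```

With a model heap of nine slots starting at address 0x1000 and 32-byte descriptors, handleAt(3) yields 0x1000 + 3 * 32, and a second allocate call simply continues where the first one ended.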
Descriptor Tables
The primary goal with a descriptor heap is to allocate as much memory as necessary to store all the descriptors for as much rendering as possible, perhaps a frame or more.
Note: Switching descriptor heaps might—depending on the underlying hardware—result in flushing the GPU pipeline. Therefore switching descriptor heaps should be minimized or paired with other operations that would flush the graphics pipeline anyway.
A descriptor table offsets into the descriptor heap. Instead of forcing the graphics pipeline to always view the entire heap, switching descriptor tables is an inexpensive way to change a set of resources a given shader uses. This way the shader does not have to understand where to find resources in heap space.
In other words, an application can utilize several descriptor tables that index the same descriptor heap for different shaders as shown in Figure 2:
Figure 2. Different shaders index into the descriptor heap with different descriptor tables
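The relationship between tables and heap can be sketched in a few lines of standard C++. This is a conceptual model only: the heap is a vector of descriptor names, DescriptorTableModel is a hypothetical struct holding an offset and a count, and lookup resolves a table-relative slot to an entry of the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of a descriptor table: a window into a heap.
struct DescriptorTableModel {
    std::size_t offset;  // index of the first descriptor in the heap
    std::size_t count;   // number of descriptors visible through the table
};

// Resolves a table-relative slot to the descriptor stored in the heap.
const std::string& lookup(const std::vector<std::string>& heap,
                          const DescriptorTableModel& table,
                          std::size_t slot) {
    if (slot >= table.count || table.offset + slot >= heap.size())
        throw std::out_of_range("slot outside descriptor table");
    return heap[table.offset + slot];
}
```

Two tables with offsets 0 and 2 over the same four-entry heap give two shaders different views of that heap, and switching between them changes only a small offset rather than the heap itself.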
Descriptor tables for an SRV and a sampler are created in the following code snippet with visibility for a pixel shader.
D3D12_DESCRIPTOR_RANGE descRange[2];
descRange[0].Init(D3D12_DESCRIPTOR_RANGE_SRV, 1, 0);
descRange[1].Init(D3D12_DESCRIPTOR_RANGE_SAMPLER, 1, 0);
D3D12_ROOT_PARAMETER rootParameters[2];
rootParameters[0].InitAsDescriptorTable(1, &descRange[0], D3D12_SHADER_VISIBILITY_PIXEL);
rootParameters[1].InitAsDescriptorTable(1, &descRange[1], D3D12_SHADER_VISIBILITY_PIXEL);
The visibility of the descriptor table is restricted to the pixel shader by providing the D3D12_SHADER_VISIBILITY_PIXEL flag. The following enum defines different levels of visibility of a descriptor table:
typedef enum D3D12_SHADER_VISIBILITY
{
D3D12_SHADER_VISIBILITY_ALL = 0,
D3D12_SHADER_VISIBILITY_VERTEX = 1,
D3D12_SHADER_VISIBILITY_HULL = 2,
D3D12_SHADER_VISIBILITY_DOMAIN = 3,
D3D12_SHADER_VISIBILITY_GEOMETRY = 4,
D3D12_SHADER_VISIBILITY_PIXEL = 5
} D3D12_SHADER_VISIBILITY;
Providing a flag that sets the visibility to all will broadcast the arguments to all shader stages, although it is only set once.
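The visibility rule can be summarized as a small predicate. The sketch below is plain C++ written for illustration: Visibility mirrors the numeric values of the enum above, while Stage and isVisibleTo are hypothetical names that do not exist in the DirectX headers.

```cpp
#include <cassert>

// Shader stages of the graphics pipeline (hypothetical helper enum).
enum class Stage { Vertex, Hull, Domain, Geometry, Pixel };

// Mirrors the numeric values of D3D12_SHADER_VISIBILITY shown above.
enum class Visibility {
    All = 0, Vertex = 1, Hull = 2, Domain = 3, Geometry = 4, Pixel = 5
};

// True if a root parameter with the given visibility reaches the stage.
bool isVisibleTo(Visibility visibility, Stage stage) {
    switch (visibility) {
        case Visibility::All:      return true;  // broadcast to every stage
        case Visibility::Vertex:   return stage == Stage::Vertex;
        case Visibility::Hull:     return stage == Stage::Hull;
        case Visibility::Domain:   return stage == Stage::Domain;
        case Visibility::Geometry: return stage == Stage::Geometry;
        case Visibility::Pixel:    return stage == Stage::Pixel;
    }
    return false;
}
```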
A shader can locate a resource through descriptor tables, but the descriptor tables need to be made known to this shader first as a root parameter in a root signature.
Root Signature and Parameters
A root signature stores root parameters that are used by shaders to locate the resources they need access to. These parameters exist as a binding space on a command list for the collection of resources the application needs to make available to shaders.
The root arguments can be:
- Descriptor tables: As described above, they hold an offset into the descriptor heap plus the number of descriptors.
- Root descriptors: Only a small number of descriptors can be stored directly in a root parameter. This saves the application the effort of putting those descriptors into a descriptor heap and removes an indirection.
- Root constants: These are constants provided directly to the shaders without having to go through root descriptors or descriptor tables.
To achieve optimal performance, applications should generally sort the layout of the root parameters in decreasing order of change frequency.
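As a small illustration of this ordering rule, the following standard C++ sketch sorts a list of planned root parameters so that the most frequently changing ones come first. RootParameterPlan and sortByChangeFrequency are hypothetical names, and the change counts in the usage note are invented numbers of the kind a profiler might report.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical planning record for one root parameter.
struct RootParameterPlan {
    std::string name;
    unsigned changesPerFrame;  // assumed to come from profiling
};

// Orders parameters in decreasing order of change frequency. stable_sort
// keeps the original order of parameters that change equally often.
void sortByChangeFrequency(std::vector<RootParameterPlan>& params) {
    std::stable_sort(params.begin(), params.end(),
        [](const RootParameterPlan& a, const RootParameterPlan& b) {
            return a.changesPerFrame > b.changesPerFrame;
        });
}
```

Given per-frame constants that change once, a material table that changes 150 times, and per-draw constants that change 2000 times per frame, the sorted layout would be per-draw constants, material table, per-frame constants.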
All root parameters, such as descriptor tables, root descriptors, and root constants, are baked into a command list, and the driver versions them on behalf of the application. In other words, whenever any of the root parameters change between draw or dispatch calls, the hardware will update the version number of the root signature. Every draw or dispatch call gets a unique full set of root parameter states when any argument changes.
Root descriptors and root constants decrease the level of GPU indirection when accessed, while descriptor tables allow access to a larger amount of data at the cost of an additional level of indirection. Because of this extra indirection, the application can keep initializing the contents of descriptor tables right up until it submits the command list for execution. Additionally, shader model 5.1, which is supported by all DirectX 12 hardware, allows shaders to dynamically index into any given descriptor table, so a shader can select which descriptor it wants out of a descriptor table at shader execution time. An application could just create one large descriptor table and always use indexing (via something like a material ID) to get the desired descriptor.
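The indexing scheme behind such a large table is simple arithmetic. The sketch below shows one possible layout in standard C++, assuming every material owns the same number of consecutive descriptors; MaterialTableLayout and descriptorIndex are hypothetical names, and in a real application the same computation would run in the shader.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical layout: each material owns a fixed-size run of descriptors
// inside one large descriptor table.
struct MaterialTableLayout {
    std::size_t descriptorsPerMaterial;
    std::size_t materialCount;
};

// Index into the large table for texture 'slot' of material 'materialId'.
std::size_t descriptorIndex(const MaterialTableLayout& layout,
                            std::size_t materialId, std::size_t slot) {
    if (materialId >= layout.materialCount ||
        slot >= layout.descriptorsPerMaterial)
        throw std::out_of_range("material or slot outside table");
    return materialId * layout.descriptorsPerMaterial + slot;
}
```

With three descriptors per material, slot 2 of material 7 lives at index 7 * 3 + 2 = 23 of the table.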
Different hardware architectures will show different performance tradeoffs between using large sets of root constants and root descriptors versus using descriptor tables. Therefore it will be necessary to tune the ratio between root parameters and descriptor tables depending on the hardware target platforms.
A perfectly reasonable outcome for an application might be a combination of all types of bindings: root constants, root descriptors, descriptor tables for descriptors gathered on-the-fly as draw calls are issued, and dynamic indexing of large descriptor tables.
The following code stores the two descriptor tables mentioned above as root parameters in a root signature.
D3D12_DESCRIPTOR_RANGE descRange[2];
descRange[0].Init(D3D12_DESCRIPTOR_RANGE_SRV, 1, 0);
descRange[1].Init(D3D12_DESCRIPTOR_RANGE_SAMPLER, 1, 0);
D3D12_ROOT_PARAMETER rootParameters[2];
rootParameters[0].InitAsDescriptorTable(1, &descRange[0], D3D12_SHADER_VISIBILITY_PIXEL);
rootParameters[1].InitAsDescriptorTable(1, &descRange[1], D3D12_SHADER_VISIBILITY_PIXEL);
D3D12_ROOT_SIGNATURE descRootSignature;
descRootSignature.Init(2, rootParameters, 0);
ComPtr<ID3DBlob> pOutBlob;
ComPtr<ID3DBlob> pErrorBlob;
ThrowIfFailed(D3D12SerializeRootSignature(&descRootSignature,
D3D_ROOT_SIGNATURE_V1, pOutBlob.GetAddressOf(),
pErrorBlob.GetAddressOf()));
ThrowIfFailed(mDevice->CreateRootSignature(pOutBlob->GetBufferPointer(),
pOutBlob->GetBufferSize(), __uuidof(ID3D12RootSignature),
(void**)&mRootSignature));
All shaders in a PSO need to be compatible with the root signature specified with this PSO; otherwise, the PSO won’t be created.
A root signature needs to be set on a command list or bundle. This is done by calling:
commandList->SetGraphicsRootSignature(mRootSignature);
After setting the root signature, the set of bindings needs to be defined. In the example above this would be done with the following code:
commandList->SetGraphicsRootDescriptorTable(0,
mCbvSrvDescriptorHeap->GetGPUDescriptorHandleForHeapStart());
commandList->SetGraphicsRootDescriptorTable(1,
mSamplerDescriptorHeap->GetGPUDescriptorHandleForHeapStart());
The application must set the appropriate parameters in each of the two slots in the root signature before issuing a draw call or a dispatch call. In this example, the first slot now holds a descriptor table that points into the CBV/SRV/UAV descriptor heap at an SRV descriptor, and the second slot holds a descriptor table that points into the sampler descriptor heap at a sampler descriptor.
An application can change, for example, the binding on the second slot between draw calls. That means it only has to bind the second slot for the second draw call.
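An application that wants to avoid rebinding slots that have not changed can keep a small shadow copy of what is currently bound. The following standard C++ sketch shows the idea; RootBindingCache, needsBind, and invalidate are hypothetical names, GPU descriptor handles are modeled as plain 64-bit integers, and the decision when to call invalidate (for example after switching to a different root signature) is an assumption left to the application.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical shadow copy of the root arguments bound on a command list.
class RootBindingCache {
public:
    explicit RootBindingCache(std::size_t slotCount)
        : handles_(slotCount, 0), valid_(slotCount, false) {}

    // Returns true if the caller has to issue the bind call for this slot,
    // false if the same handle is already bound there.
    bool needsBind(std::size_t slot, std::uint64_t gpuHandle) {
        if (valid_.at(slot) && handles_.at(slot) == gpuHandle) return false;
        handles_.at(slot) = gpuHandle;
        valid_.at(slot) = true;
        return true;
    }

    // Forget everything, so that every slot is bound again on next use.
    void invalidate() { valid_.assign(valid_.size(), false); }

private:
    std::vector<std::uint64_t> handles_;
    std::vector<bool> valid_;
};
```

In the two-slot example, a second draw call that reuses the SRV table but switches samplers would get false for slot 0 and true for slot 1, so only the second slot is bound again.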
Putting It All Together
The large source code snippet below shows all the mechanisms used to bind resources. This application only uses one texture, and this code provides a sampler and an SRV for this texture:
D3D12_DESCRIPTOR_RANGE descRange[2];
descRange[0].Init(D3D12_DESCRIPTOR_RANGE_SRV, 1, 0);
descRange[1].Init(D3D12_DESCRIPTOR_RANGE_SAMPLER, 1, 0);
D3D12_ROOT_PARAMETER rootParameters[2];
rootParameters[0].InitAsDescriptorTable(1, &descRange[0], D3D12_SHADER_VISIBILITY_PIXEL);
rootParameters[1].InitAsDescriptorTable(1, &descRange[1], D3D12_SHADER_VISIBILITY_PIXEL);
D3D12_ROOT_SIGNATURE descRootSignature;
descRootSignature.Init(2, rootParameters, 0);
ComPtr<ID3DBlob> pOutBlob;
ComPtr<ID3DBlob> pErrorBlob;
ThrowIfFailed(D3D12SerializeRootSignature(&descRootSignature,
D3D_ROOT_SIGNATURE_V1, pOutBlob.GetAddressOf(),
pErrorBlob.GetAddressOf()));
ThrowIfFailed(mDevice->CreateRootSignature(pOutBlob->GetBufferPointer(),
pOutBlob->GetBufferSize(), __uuidof(ID3D12RootSignature),
(void**)&mRootSignature));
D3D12_DESCRIPTOR_HEAP_DESC descHeapCbvSrv = {};
descHeapCbvSrv.NumDescriptors = 1;
descHeapCbvSrv.Type = D3D12_CBV_SRV_UAV_DESCRIPTOR_HEAP;
descHeapCbvSrv.Flags = D3D12_DESCRIPTOR_HEAP_SHADER_VISIBLE;
ThrowIfFailed(mDevice->CreateDescriptorHeap(&descHeapCbvSrv, __uuidof(ID3D12DescriptorHeap), (void**)&mCbvSrvDescriptorHeap));
D3D12_DESCRIPTOR_HEAP_DESC descHeapSampler = {};
descHeapSampler.NumDescriptors = 1;
descHeapSampler.Type = D3D12_SAMPLER_DESCRIPTOR_HEAP;
descHeapSampler.Flags = D3D12_DESCRIPTOR_HEAP_SHADER_VISIBLE;
ThrowIfFailed(mDevice->CreateDescriptorHeap(&descHeapSampler, __uuidof(ID3D12DescriptorHeap), (void**)&mSamplerDescriptorHeap));
D3D12_SAMPLER_DESC samplerDesc;
ZeroMemory(&samplerDesc, sizeof(D3D12_SAMPLER_DESC));
samplerDesc.Filter = D3D12_FILTER_MIN_MAG_MIP_LINEAR;
samplerDesc.AddressU = D3D12_TEXTURE_ADDRESS_WRAP;
samplerDesc.AddressV = D3D12_TEXTURE_ADDRESS_WRAP;
samplerDesc.AddressW = D3D12_TEXTURE_ADDRESS_WRAP;
samplerDesc.MinLOD = 0;
samplerDesc.MaxLOD = D3D12_FLOAT32_MAX;
samplerDesc.MipLODBias = 0.0f;
samplerDesc.MaxAnisotropy = 1;
samplerDesc.ComparisonFunc = D3D12_COMPARISON_ALWAYS;
mDevice->CreateSampler(&samplerDesc,
mSamplerDescriptorHeap->GetCPUDescriptorHandleForHeapStart());
D3D12_SHADER_RESOURCE_VIEW_DESC srvDesc;
ZeroMemory(&srvDesc, sizeof(D3D12_SHADER_RESOURCE_VIEW_DESC));
srvDesc.Format = SampleAssets::Textures->Format;
srvDesc.ViewDimension = D3D12_SRV_DIMENSION_TEXTURE2D;
srvDesc.Texture2D.MipLevels = 1;
mDevice->CreateShaderResourceView(mTexture.Get(), &srvDesc,
mCbvSrvDescriptorHeap->GetCPUDescriptorHandleForHeapStart());
commandList->SetGraphicsRootSignature(mRootSignature);
commandList->SetGraphicsRootDescriptorTable(0,
mCbvSrvDescriptorHeap->GetGPUDescriptorHandleForHeapStart());
commandList->SetGraphicsRootDescriptorTable(1,
mSamplerDescriptorHeap->GetGPUDescriptorHandleForHeapStart());
Static Samplers
You have now seen how to create a sampler using a descriptor heap and a descriptor table. There is another way to use samplers in applications: because many applications only need a fixed set of samplers, it is possible to define static samplers directly in the root signature.
Currently, the root signature looks like this:
typedef struct D3D12_ROOT_SIGNATURE
{
UINT NumParameters;
const D3D12_ROOT_PARAMETER* pParameters;
UINT NumStaticSamplers;
const D3D12_STATIC_SAMPLER* pStaticSamplers;
D3D12_ROOT_SIGNATURE_FLAGS Flags;
void Init(
UINT numParameters,
const D3D12_ROOT_PARAMETER* _pParameters,
UINT numStaticSamplers = 0,
const D3D12_STATIC_SAMPLER* _pStaticSamplers = NULL,
D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_NONE)
{
NumParameters = numParameters;
pParameters = _pParameters;
NumStaticSamplers = numStaticSamplers;
pStaticSamplers = _pStaticSamplers;
Flags = flags;
}
D3D12_ROOT_SIGNATURE() { Init(0,NULL,0,NULL,D3D12_ROOT_SIGNATURE_NONE);}
D3D12_ROOT_SIGNATURE(
UINT numParameters,
const D3D12_ROOT_PARAMETER* _pParameters,
UINT numStaticSamplers = 0,
const D3D12_STATIC_SAMPLER* _pStaticSamplers = NULL,
D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_NONE)
{
Init(numParameters, _pParameters, numStaticSamplers, _pStaticSamplers, flags);
}
} D3D12_ROOT_SIGNATURE;
A set of static samplers can be defined independently of the root parameters in a root signature. As mentioned above, root parameters define a binding space where arguments can be provided at run time, whereas static samplers are by definition unchanging.
Since root signatures can be authored in HLSL, static samplers can be authored there as well. For now, an application can have at most 2032 unique static samplers. This is slightly less than the next power of two (2048) and allows drivers to use some of the slots for internal use.
The static samplers defined in a root signature are independent of samplers an application chooses to put in a descriptor heap, so both mechanisms can be used at the same time.
If the selection of samplers is truly dynamic and unknown at shader compile time, an application should manage samplers in a descriptor heap.
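A simple way to decide between the two mechanisms is to collect the sampler configurations an application requests, remove duplicates, and compare the result with the limit quoted above. The sketch below does this in standard C++; SamplerKey is a hypothetical, heavily reduced description of a sampler, and countUniqueSamplers and fitsInStaticSamplers are illustrative helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical, reduced sampler description used only for de-duplication.
struct SamplerKey {
    int filter;
    int addressMode;
    int maxAnisotropy;
};

// Strict ordering so that SamplerKey can be stored in a std::set.
bool operator<(const SamplerKey& a, const SamplerKey& b) {
    return std::tie(a.filter, a.addressMode, a.maxAnisotropy) <
           std::tie(b.filter, b.addressMode, b.maxAnisotropy);
}

// Limit on unique static samplers as stated in the text.
const std::size_t kMaxStaticSamplers = 2032;

// Number of distinct sampler configurations among the requested ones.
std::size_t countUniqueSamplers(const std::vector<SamplerKey>& requested) {
    return std::set<SamplerKey>(requested.begin(), requested.end()).size();
}

// True if all requested samplers could be declared as static samplers.
bool fitsInStaticSamplers(const std::vector<SamplerKey>& requested) {
    return countUniqueSamplers(requested) <= kMaxStaticSamplers;
}
```

Most applications request only a handful of distinct configurations, in which case the check passes trivially and the remaining question is whether the selection is known at shader compile time.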
Conclusion
DirectX 12 offers full control over resource usage patterns. The application developer is responsible for allocating memory in descriptor heaps, describing the resources in descriptors, and letting the shader “index” into descriptor heaps via descriptor tables that are made “known” to the shader via root signatures.
Furthermore, root signatures can be used to define a custom parameter space for shaders using any combination of four options:
- root constants
- static samplers
- root descriptors
- descriptor tables
In the end, the challenge is to pick the most desirable form of binding for the types of resources and their frequency of update.
About the Author
Wolfgang is the CEO of Confetti, a think tank for advanced real-time graphics research and a service provider for the video game and movie industries. Before co-founding Confetti, Wolfgang worked as the lead graphics programmer in Rockstar's core technology group RAGE for more than four years. He is the founder and editor of the ShaderX and GPU Pro book series, a Microsoft MVP, the author of several books and articles on real-time rendering, and a regular contributor to websites and the GDC. One of the books he edited, ShaderX4, won the Game Developer Front Line Award in 2006. Wolfgang serves on many advisory boards throughout the industry; one of them is Microsoft’s Graphics Advisory Board for DirectX 12. He is an active contributor to several future standards that drive the game industry. You can find him on Twitter at wolfgangengel. Confetti's website is www.conffx.com.
Acknowledgement
I would like to thank Chas Boyd, Amar Patel, and David Reinig for their proofreading and feedback.
** Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.