Disclaimer: This article is a repost of material originally published on this page on Diligent Engine web site.
Background
This article is not an introduction to descriptor heaps in D3D12. Though we will give a brief description of what descriptor heaps are, it is assumed that the reader has an understanding of basic D3D12 concepts. The system described below uses Simple Variable-Size Memory Block Allocator and is related to the resource binding model presented in this post.
Introduction
Resource descriptors and descriptor heaps are key concepts of the new resource binding model introduced in Direct3D12. A descriptor is a small block of data that fully describes an object to the GPU in a GPU-specific opaque format. A descriptor heap is essentially an array of descriptors. Every pipeline state incorporates a root signature that defines how shader registers are mapped to the descriptors in the bound descriptor heaps. Resource binding is a two-stage process: a shader register is first mapped to a descriptor in a descriptor heap as defined by the root signature. The descriptor (which may be an SRV, UAV, CBV or sampler) then references the resource in GPU memory. The picture below illustrates a simplified view of the D3D12 resource binding model.
There are four types of descriptor heaps in D3D12:
- Constant Buffer/Shader Resource/Unordered Access view (D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV)
- Sampler (D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER)
- Render Target View (D3D12_DESCRIPTOR_HEAP_TYPE_RTV)
- Depth Stencil View (D3D12_DESCRIPTOR_HEAP_TYPE_DSV)
For the GPU to be able to access descriptors in a heap, the heap must be shader-visible. Only the first two heap types (CBV_SRV_UAV and SAMPLER) can be shader-visible. RTV and DSV heaps are CPU-visible only. The size of a CPU-only descriptor heap is limited only by the available CPU memory. The size of a shader-visible descriptor heap is more strictly limited. While a CBV_SRV_UAV heap can hold as many as 1,000,000 descriptors or more, the maximum number of samplers in a shader-visible descriptor heap is only 2048 (see D3D12 Hardware Tiers on MSDN). As a result, not all descriptor handles can be stored in a shader-visible descriptor heap, and it is the responsibility of the D3D12 application to make sure that all descriptor handles required for rendering are in GPU-visible heaps. This article describes the descriptor heap management system implemented in Diligent Engine 2.0.
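The visibility and size rules above can be captured in a small validation helper. This is a minimal sketch and not part of Diligent Engine: `HeapType`, `IsValidHeapRequest` and both limit constants are hypothetical names, and the 1,000,000 figure is used as an assumed conservative limit (the text notes that the real limit may be higher depending on the hardware tier).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the four D3D12 descriptor heap types.
enum class HeapType { CbvSrvUav, Sampler, Rtv, Dsv };

// Assumed limits, taken from the figures quoted in the text above.
const std::uint32_t kMaxShaderVisibleCbvSrvUav = 1000000;
const std::uint32_t kMaxShaderVisibleSamplers  = 2048;

// Only CBV_SRV_UAV and SAMPLER heaps can be shader-visible.
inline bool CanBeShaderVisible(HeapType type)
{
    return type == HeapType::CbvSrvUav || type == HeapType::Sampler;
}

// Returns true if a heap with the given parameters respects the rules above.
inline bool IsValidHeapRequest(HeapType type, bool shaderVisible, std::uint32_t numDescriptors)
{
    if (numDescriptors == 0)
        return false;
    if (!shaderVisible)
        return true; // CPU-only heaps are limited only by available memory
    if (!CanBeShaderVisible(type))
        return false; // RTV and DSV heaps are CPU-visible only
    const std::uint32_t limit = (type == HeapType::Sampler) ? kMaxShaderVisibleSamplers
                                                            : kMaxShaderVisibleCbvSrvUav;
    return numDescriptors <= limit;
}

// Debug-time guard that a heap wrapper might call before creating the real heap.
inline void RequireValidHeapRequest(HeapType type, bool shaderVisible, std::uint32_t numDescriptors)
{
    assert(IsValidHeapRequest(type, shaderVisible, numDescriptors));
    (void)type; (void)shaderVisible; (void)numDescriptors; // silence warnings when NDEBUG is set
}
```

With these assumptions, a shader-visible sampler heap of 2048 descriptors is accepted while one of 2049 is rejected, and any shader-visible RTV or DSV request is rejected regardless of size.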
Overview
The descriptor heap management system in Diligent Engine consists of five main classes:
- DescriptorHeapAllocation is a helper class that represents a descriptor heap allocation, which is simply a range of descriptors
- DescriptorHeapAllocationManager is the main workhorse class that manages allocations in a D3D12 descriptor heap using the variable-size GPU allocations manager
- CPUDescriptorHeap implements a CPU-only descriptor heap that is used as storage for resource view descriptor handles
- GPUDescriptorHeap implements a shader-visible descriptor heap that holds descriptor handles used by GPU commands
- DynamicSuballocationsManager is responsible for allocating short-lived dynamic descriptor handles used in the current frame only
Each class, as well as their interactions, is described in detail below.
Descriptor Heap Allocation
DescriptorHeapAllocation, the first class used by the Diligent Engine descriptor heap management system, represents a descriptor heap allocation. It can be initialized as a single descriptor or as a contiguous range of descriptors in the specified heap.
Note that the descriptor heap allocation only references a range in the heap. It contains the first CPU handle in CPU virtual address space and, if the heap is shader-visible, the first GPU handle in GPU virtual address space. The class prohibits copies and only allows transfer of ownership through move semantics. The class is defined as shown below:
class DescriptorHeapAllocation
{
public:
    // Creates a null allocation
    DescriptorHeapAllocation();

    // Creates an allocation that references NHandles descriptors in pHeap
    DescriptorHeapAllocation( IDescriptorAllocator *pAllocator,
                              ID3D12DescriptorHeap *pHeap,
                              D3D12_CPU_DESCRIPTOR_HANDLE CpuHandle,
                              D3D12_GPU_DESCRIPTOR_HANDLE GpuHandle,
                              Uint32 NHandles,
                              Uint16 AllocationManagerId );

    // Ownership can only be transferred by moving
    DescriptorHeapAllocation(DescriptorHeapAllocation &&Allocation);
    DescriptorHeapAllocation& operator = (DescriptorHeapAllocation &&Allocation);

    // Returns the range to the allocator that created it
    ~DescriptorHeapAllocation()
    {
        if(!IsNull() && m_pAllocator)
            m_pAllocator->Free(std::move(*this));
    }

    // Returns the CPU handle of the descriptor at the given offset in the range
    D3D12_CPU_DESCRIPTOR_HANDLE GetCpuHandle(Uint32 Offset = 0) const
    {
        D3D12_CPU_DESCRIPTOR_HANDLE CPUHandle = m_FirstCpuHandle;
        if (Offset != 0)
            CPUHandle.ptr += m_DescriptorSize * Offset;
        return CPUHandle;
    }

    // Returns the GPU handle of the descriptor at the given offset in the range
    D3D12_GPU_DESCRIPTOR_HANDLE GetGpuHandle(Uint32 Offset = 0) const
    {
        D3D12_GPU_DESCRIPTOR_HANDLE GPUHandle = m_FirstGpuHandle;
        if (Offset != 0)
            GPUHandle.ptr += m_DescriptorSize * Offset;
        return GPUHandle;
    }

    ID3D12DescriptorHeap *GetDescriptorHeap(){return m_pDescriptorHeap;}
    size_t GetNumHandles()const{return m_NumHandles;}
    bool IsNull() const { return m_FirstCpuHandle.ptr == 0; }
    bool IsShaderVisible() const { return m_FirstGpuHandle.ptr != 0; }
    size_t GetAllocationManagerId(){return m_AllocationManagerId;}
    UINT GetDescriptorSize()const{return m_DescriptorSize;}

private:
    // Copies are prohibited
    DescriptorHeapAllocation(const DescriptorHeapAllocation&) = delete;
    DescriptorHeapAllocation& operator= (const DescriptorHeapAllocation&) = delete;

    D3D12_CPU_DESCRIPTOR_HANDLE m_FirstCpuHandle = {0};
    D3D12_GPU_DESCRIPTOR_HANDLE m_FirstGpuHandle = {0};
    IDescriptorAllocator* m_pAllocator = nullptr;
    ID3D12DescriptorHeap* m_pDescriptorHeap = nullptr;
    Uint32 m_NumHandles = 0;
    Uint16 m_AllocationManagerId = static_cast<Uint16>(-1);
    Uint16 m_DescriptorSize = 0;
};
One field that requires some clarification is m_AllocationManagerId. As we will discuss later, a descriptor heap object may contain several allocation managers. This field is used to identify the manager within the descriptor heap that was used to create this allocation.
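The ownership rules of the class (no copies, move-only transfer, release in the destructor) can be tried out without D3D12. The following is a minimal sketch under stated assumptions: `IRangeAllocator` and `RangeHandle` are hypothetical names, and plain 64-bit integers stand in for the CPU/GPU descriptor handles.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical allocator interface: receives the range back when the handle dies.
struct IRangeAllocator
{
    virtual void Free(std::uint64_t firstHandle, std::uint32_t count, std::uint16_t managerId) = 0;
    virtual ~IRangeAllocator() = default;
};

// Move-only handle to a range of descriptors (a stand-in for DescriptorHeapAllocation).
class RangeHandle
{
public:
    RangeHandle() = default;
    RangeHandle(IRangeAllocator* allocator, std::uint64_t firstHandle, std::uint32_t count,
                std::uint32_t descriptorSize, std::uint16_t managerId)
        : m_allocator(allocator), m_first(firstHandle), m_count(count),
          m_descriptorSize(descriptorSize), m_managerId(managerId)
    {}

    RangeHandle(RangeHandle&& rhs) noexcept { MoveFrom(rhs); }
    RangeHandle& operator=(RangeHandle&& rhs) noexcept
    {
        if (this != &rhs) { Release(); MoveFrom(rhs); }
        return *this;
    }
    RangeHandle(const RangeHandle&) = delete;
    RangeHandle& operator=(const RangeHandle&) = delete;

    ~RangeHandle() { Release(); }

    bool IsNull() const { return m_first == 0; }

    // Handle of the descriptor at the given offset within the range.
    std::uint64_t GetHandle(std::uint32_t offset = 0) const
    {
        assert(offset == 0 || offset < m_count);
        return m_first + static_cast<std::uint64_t>(m_descriptorSize) * offset;
    }

private:
    void MoveFrom(RangeHandle& rhs)
    {
        m_allocator      = rhs.m_allocator;
        m_first          = rhs.m_first;
        m_count          = rhs.m_count;
        m_descriptorSize = rhs.m_descriptorSize;
        m_managerId      = rhs.m_managerId;
        rhs.Reset(); // the moved-from handle becomes null and will not free anything
    }
    void Reset()
    {
        m_allocator = nullptr; m_first = 0; m_count = 0; m_descriptorSize = 0; m_managerId = 0xFFFF;
    }
    void Release()
    {
        if (!IsNull() && m_allocator != nullptr)
            m_allocator->Free(m_first, m_count, m_managerId);
        Reset();
    }

    IRangeAllocator* m_allocator      = nullptr;
    std::uint64_t    m_first          = 0;
    std::uint32_t    m_count          = 0;
    std::uint32_t    m_descriptorSize = 0;
    std::uint16_t    m_managerId      = 0xFFFF;
};
```

Because the moved-from handle is reset to null, the range is handed back to its allocator exactly once, no matter how many times ownership is transferred.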
Descriptor Heap Allocation Manager
The second class of the descriptor heap management system is DescriptorHeapAllocationManager. This class uses the variable-size GPU allocations manager to handle allocations within the descriptor heap.
Every allocation that the class creates is represented by an instance of the DescriptorHeapAllocation class. The list of free descriptors is managed by the m_FreeBlockManager member. The class declaration is given in the listing below:
class DescriptorHeapAllocationManager
{
public:
    // Creates a new D3D12 descriptor heap
    DescriptorHeapAllocationManager(IMemoryAllocator &Allocator,
                                    RenderDeviceD3D12Impl *pDeviceD3D12Impl,
                                    IDescriptorAllocator *pParentAllocator,
                                    size_t ThisManagerId,
                                    const D3D12_DESCRIPTOR_HEAP_DESC &HeapDesc);

    // Uses a subrange of descriptors in an existing D3D12 descriptor heap
    DescriptorHeapAllocationManager(IMemoryAllocator &Allocator,
                                    RenderDeviceD3D12Impl *pDeviceD3D12Impl,
                                    IDescriptorAllocator *pParentAllocator,
                                    size_t ThisManagerId,
                                    ID3D12DescriptorHeap *pd3d12DescriptorHeap,
                                    Uint32 FirstDescriptor,
                                    Uint32 NumDescriptors);

    DescriptorHeapAllocationManager(DescriptorHeapAllocationManager&& rhs);
    DescriptorHeapAllocationManager& operator = (DescriptorHeapAllocationManager&& rhs) = delete;
    DescriptorHeapAllocationManager(const DescriptorHeapAllocationManager&) = delete;
    DescriptorHeapAllocationManager& operator = (const DescriptorHeapAllocationManager&) = delete;

    ~DescriptorHeapAllocationManager();

    DescriptorHeapAllocation Allocate( uint32_t Count );
    void Free(DescriptorHeapAllocation&& Allocation);
    void ReleaseStaleAllocations(Uint64 NumCompletedFrames);
    size_t GetNumAvailableDescriptors()const{return m_FreeBlockManager.GetFreeSize();}

private:
    VariableSizeGPUAllocationsManager m_FreeBlockManager;
    D3D12_DESCRIPTOR_HEAP_DESC m_HeapDesc;
    CComPtr<ID3D12DescriptorHeap> m_pd3d12DescriptorHeap;
    D3D12_CPU_DESCRIPTOR_HANDLE m_FirstCPUHandle = {0};
    D3D12_GPU_DESCRIPTOR_HANDLE m_FirstGPUHandle = {0};
    UINT m_DescriptorSize = 0;
    Uint32 m_NumDescriptorsInAllocation = 0;
    std::mutex m_AllocationMutex;
    RenderDeviceD3D12Impl *m_pDeviceD3D12Impl = nullptr;
    IDescriptorAllocator *m_pParentAllocator = nullptr;
    size_t m_ThisManagerId = static_cast<size_t>(-1);
};
The class provides two constructors. The first constructor creates a new D3D12 descriptor heap and addresses the entire available space. The second constructor uses a subrange of descriptors in an existing D3D12 heap. This allows a number of allocation managers to share the same D3D12 descriptor heap, which is essential for GPU-visible heaps.
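The allocation routine, DescriptorHeapAllocationManager::Allocate(), asks the free-block manager for the requested number of descriptors in the heap and returns a DescriptorHeapAllocation object representing the allocation. If the free-block manager cannot satisfy the request, a null allocation is returned.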
DescriptorHeapAllocation DescriptorHeapAllocationManager::Allocate(uint32_t Count)
{
    std::lock_guard<std::mutex> LockGuard(m_AllocationMutex);

    // Find a free block of the requested size
    auto DescriptorHandleOffset = m_FreeBlockManager.Allocate(Count);
    if (DescriptorHandleOffset == VariableSizeGPUAllocationsManager::InvalidOffset)
        return DescriptorHeapAllocation();

    // Compute the first CPU and GPU handles of the range
    auto CPUHandle = m_FirstCPUHandle;
    CPUHandle.ptr += DescriptorHandleOffset * m_DescriptorSize;

    auto GPUHandle = m_FirstGPUHandle;
    if(m_HeapDesc.Flags & D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE)
        GPUHandle.ptr += DescriptorHandleOffset * m_DescriptorSize;

    return DescriptorHeapAllocation( m_pParentAllocator, m_pd3d12DescriptorHeap,
                                     CPUHandle, GPUHandle, Count,
                                     static_cast<Uint16>(m_ThisManagerId) );
}
Similarly, the deallocation routine, DescriptorHeapAllocationManager::Free(), takes a DescriptorHeapAllocation object and releases the range it references. Note that since GPU commands are executed asynchronously, the allocation cannot be released immediately. Instead, the manager adds it to a queue along with the current frame number and releases all stale allocations later, when the frame has been completed by the GPU (which is detected by a signaled fence).
void DescriptorHeapAllocationManager::Free(DescriptorHeapAllocation&& Allocation)
{
    std::lock_guard<std::mutex> LockGuard(m_AllocationMutex);

    // Convert the first CPU handle back to a descriptor offset in the heap
    auto DescriptorOffset = (Allocation.GetCpuHandle().ptr - m_FirstCPUHandle.ptr) / m_DescriptorSize;

    // The range is queued with the current frame number rather than released immediately
    m_FreeBlockManager.Free(DescriptorOffset, Allocation.GetNumHandles(),
                            m_pDeviceD3D12Impl->GetCurrentFrame());

    // Reset the allocation to null
    Allocation = DescriptorHeapAllocation();
}
The ReleaseStaleAllocations() method must be called at the end of every frame to actually release all stale allocations from previous frames:
void DescriptorHeapAllocationManager::ReleaseStaleAllocations(Uint64 NumCompletedFrames)
{
    std::lock_guard<std::mutex> LockGuard(m_AllocationMutex);
    m_FreeBlockManager.ReleaseCompletedFrames(NumCompletedFrames);
}
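The deferred release performed by Free() and ReleaseStaleAllocations() can be illustrated with a small frame-tagged queue. This is only a sketch of the idea, not the engine's VariableSizeGPUAllocationsManager: `DeferredReleaseQueue` and `Range` are hypothetical names, and the rule that a range becomes reusable once its frame number is below the number of completed frames is an assumption about how the frame counter is interpreted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// A freed range of descriptors: offset of the first descriptor and the range size.
struct Range
{
    std::uint32_t offset;
    std::uint32_t size;
};

// Holds freed ranges until the GPU has finished the frame that may still reference them.
class DeferredReleaseQueue
{
public:
    // Queue a freed range, tagged with the frame during which it was freed.
    void Free(Range range, std::uint64_t frameNumber)
    {
        // Frames are assumed to be submitted in order, so the queue stays sorted.
        assert(m_stale.empty() || m_stale.back().frame <= frameNumber);
        m_stale.push_back(Entry{range, frameNumber});
    }

    // Returns every range whose frame is complete (assumption: frame < numCompletedFrames).
    std::vector<Range> ReleaseCompletedFrames(std::uint64_t numCompletedFrames)
    {
        std::vector<Range> released;
        while (!m_stale.empty() && m_stale.front().frame < numCompletedFrames)
        {
            released.push_back(m_stale.front().range);
            m_stale.pop_front();
        }
        return released;
    }

    std::size_t GetNumPending() const { return m_stale.size(); }

private:
    struct Entry
    {
        Range         range;
        std::uint64_t frame;
    };
    std::deque<Entry> m_stale;
};
```

In a real manager the returned ranges would be merged back into the free-block list. The sketch returns them to the caller to keep the queueing logic visible.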
CPU Descriptor Heap
The next part of the descriptor heap management system is the CPU descriptor heap. CPU descriptor heaps are used by the engine to store resource views when a new resource is created. Since there are four descriptor heap types in total, the system maintains four CPUDescriptorHeap instances (the heaps are part of the render device). Every CPU descriptor heap keeps a pool of descriptor heap allocation managers and a list of managers that have unused descriptors:
std::vector<DescriptorHeapAllocationManager> m_HeapPool;
std::set<size_t> m_AvailableHeaps;
The following figure gives an example of the contents of the CPU descriptor heap object:
When allocating a new descriptor, the CPUDescriptorHeap class goes through the list of managers that have available descriptors and tries to process the request using each manager in turn. If there are no available managers or no manager was able to handle the request, the function creates a new descriptor heap manager and lets it handle the request. The source code of the allocation function is given in the listing below:
DescriptorHeapAllocation CPUDescriptorHeap::Allocate( uint32_t Count )
{
    std::lock_guard<std::mutex> LockGuard(m_AllocationMutex);
    DescriptorHeapAllocation Allocation;

    // Try every manager that still has free descriptors
    auto AvailableHeapIt = m_AvailableHeaps.begin();
    while (AvailableHeapIt != m_AvailableHeaps.end())
    {
        auto& Manager = m_HeapPool[*AvailableHeapIt];
        Allocation = Manager.Allocate(Count);

        // Remove the manager from the list if it has no free descriptors left.
        // erase() invalidates the erased iterator, so continue from the one it returns
        if(Manager.GetNumAvailableDescriptors() == 0)
            AvailableHeapIt = m_AvailableHeaps.erase(AvailableHeapIt);
        else
            ++AvailableHeapIt;

        if(Allocation.GetCpuHandle().ptr != 0)
            break;
    }

    // No manager could handle the request: create a new one
    if(Allocation.GetCpuHandle().ptr == 0)
    {
        // Make sure the new heap is large enough for the request
        m_HeapDesc.NumDescriptors = std::max(m_HeapDesc.NumDescriptors, static_cast<UINT>(Count));
        m_HeapPool.emplace_back( m_MemAllocator, m_pDeviceD3D12Impl, this,
                                 m_HeapPool.size(), m_HeapDesc );
        auto NewHeapIt = m_AvailableHeaps.insert(m_HeapPool.size()-1);
        Allocation = m_HeapPool[*NewHeapIt.first].Allocate(Count);
    }

    m_CurrentSize += (Allocation.GetCpuHandle().ptr != 0) ? Count : 0;
    m_MaxHeapSize = std::max(m_MaxHeapSize, m_CurrentSize);

    return Allocation;
}
For instance, if we request a new allocation with five descriptors, the function will first ask manager [1] to handle this request, but it will fail as that manager has at most two consecutive free descriptors. The function will then ask manager [2], which will be able to handle the request:
If, after that, we ask to allocate three descriptors, no manager will be able to handle this request, and the function will add a new manager to the pool and use it to handle the request:
The deallocation routine calls the Free() method of the appropriate allocation manager. Recall that the method is called from the destructor of DescriptorHeapAllocation. Note that the function uses GetAllocationManagerId() to retrieve the index of the manager that created this allocation:
void CPUDescriptorHeap::Free(DescriptorHeapAllocation&& Allocation)
{
    std::lock_guard<std::mutex> LockGuard(m_AllocationMutex);
    auto ManagerId = Allocation.GetAllocationManagerId();
    m_CurrentSize -= static_cast<Uint32>(Allocation.GetNumHandles());
    m_HeapPool[ManagerId].Free(std::move(Allocation));
}
Finally, there is the usual method that must be called at the end of the frame to release all stale allocations when it is safe to do so. Note that it is this method that returns the manager to the list of available managers. This is safe only after the descriptors have actually been released.
void CPUDescriptorHeap::ReleaseStaleAllocations(Uint64 NumCompletedFrames)
{
    std::lock_guard<std::mutex> LockGuard(m_AllocationMutex);
    for (size_t HeapManagerInd = 0; HeapManagerInd < m_HeapPool.size(); ++HeapManagerInd)
    {
        m_HeapPool[HeapManagerInd].ReleaseStaleAllocations(NumCompletedFrames);
        // Return the manager to the pool of available managers
        if(m_HeapPool[HeapManagerInd].GetNumAvailableDescriptors() > 0)
            m_AvailableHeaps.insert(HeapManagerInd);
    }
}
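The pool-of-managers allocation strategy used by the CPU descriptor heap can be exercised in isolation. The sketch below makes several simplifying assumptions: `BumpManager` is a hypothetical stand-in for DescriptorHeapAllocationManager that only allocates linearly and never frees, `ManagerPool` and `PoolAllocation` are invented names, and no locking is shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

const std::uint32_t kInvalidOffset = 0xFFFFFFFFu;

// Hypothetical stand-in for one allocation manager: a linear allocator with fixed capacity.
class BumpManager
{
public:
    explicit BumpManager(std::uint32_t capacity) : m_capacity(capacity) {}

    std::uint32_t Allocate(std::uint32_t count)
    {
        if (count > m_capacity - m_used)
            return kInvalidOffset;
        const std::uint32_t offset = m_used;
        m_used += count;
        return offset;
    }
    std::uint32_t GetNumAvailable() const { return m_capacity - m_used; }

private:
    std::uint32_t m_capacity;
    std::uint32_t m_used = 0;
};

struct PoolAllocation
{
    std::size_t   managerId; // index of the manager that served the request
    std::uint32_t offset;    // first descriptor within that manager
    std::uint32_t count;
};

// Pool of managers plus the set of managers that still have free descriptors.
class ManagerPool
{
public:
    explicit ManagerPool(std::uint32_t managerCapacity) : m_managerCapacity(managerCapacity)
    {
        assert(managerCapacity > 0);
    }

    PoolAllocation Allocate(std::uint32_t count)
    {
        assert(count > 0);
        for (auto it = m_available.begin(); it != m_available.end();)
        {
            const std::size_t   id     = *it;
            const std::uint32_t offset = m_pool[id].Allocate(count);
            // std::set::erase returns the next valid iterator, so the loop stays well-defined.
            if (m_pool[id].GetNumAvailable() == 0)
                it = m_available.erase(it);
            else
                ++it;
            if (offset != kInvalidOffset)
                return PoolAllocation{id, offset, count};
        }
        // No existing manager could serve the request: grow the pool.
        m_pool.emplace_back(std::max(m_managerCapacity, count));
        const std::size_t   id     = m_pool.size() - 1;
        const std::uint32_t offset = m_pool[id].Allocate(count);
        if (m_pool[id].GetNumAvailable() > 0)
            m_available.insert(id);
        return PoolAllocation{id, offset, count};
    }

    std::size_t GetNumManagers() const { return m_pool.size(); }

private:
    std::uint32_t            m_managerCapacity;
    std::vector<BumpManager> m_pool;
    std::set<std::size_t>    m_available;
};
```

With a manager capacity of four, requests of sizes 3, 2 and 1 end up in managers 0, 1 and 0 respectively: the second request does not fit into the single descriptor left in manager 0 and forces a new manager, while the third one still fits into manager 0.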
GPU Descriptor Heap
The main goal of the CPU descriptor heap is to provide storage for the resource view descriptors. For the GPU to be able to access the descriptors, they must reside in a shader-visible descriptor heap. Only one CBV_SRV_UAV and one SAMPLER heap can be bound to the GPU at the same time. Source descriptors may be scattered across several CPU-only descriptor heaps, but must be consolidated in the same CBV_SRV_UAV or SAMPLER heap before a draw command can be executed. As a result, the GPUDescriptorHeap object contains only a single D3D12 descriptor heap.
The space is broken into two parts: the first part is intended to keep rarely changing descriptor handles (corresponding to static and mutable variables). The second part is used to hold dynamic descriptor handles, i.e., temporary handles that live during the current frame only. While the first part is shared between all threads, it would be very inefficient to organize the second part the same way. Dynamic descriptor handle allocation can potentially be a very frequent operation, and if several threads record commands simultaneously, allocating dynamic descriptor handles from the same pool would be a bottleneck. To avoid this problem, dynamic descriptor handle allocation is a two-stage process. In the first stage, every command context recording commands allocates a chunk of descriptors from the shared dynamic part of the GPU descriptor heap. This operation requires exclusive access to the GPU heap, but happens infrequently. The second stage is suballocation from that chunk. This part is lock-free and can be done in parallel by every thread. The structure of the GPU heap can then be depicted as shown below:
There are two classes that implement the strategy described above. The GPUDescriptorHeap class manages the two parts of the heap, and DynamicSuballocationsManager handles suballocations within the dynamic part. As discussed above, the GPUDescriptorHeap class contains two descriptor heap allocation managers, one for static allocations and one for dynamic allocations:
DescriptorHeapAllocationManager m_HeapAllocationManager;
DescriptorHeapAllocationManager m_DynamicAllocationsManager;
Note that both these allocation managers are initialized to perform suballocations from the same D3D12 descriptor heap. Also, the first manager is assigned id 0, the second one is assigned id 1. The class provides two methods to allocate from static and dynamic parts of the heap:
DescriptorHeapAllocation GPUDescriptorHeap::Allocate(uint32_t Count)
{
    std::lock_guard<std::mutex> LockGuard(m_AllocMutex);
    DescriptorHeapAllocation Allocation = m_HeapAllocationManager.Allocate(Count);
    return Allocation;
}

DescriptorHeapAllocation GPUDescriptorHeap::AllocateDynamic(uint32_t Count)
{
    std::lock_guard<std::mutex> LockGuard(m_DynAllocMutex);
    DescriptorHeapAllocation Allocation = m_DynamicAllocationsManager.Allocate(Count);
    return Allocation;
}
There is only one Free() method, as the manager id can be used to determine whether the allocation belongs to the static or the dynamic part:
void GPUDescriptorHeap::Free(DescriptorHeapAllocation&& Allocation)
{
    auto MgrId = Allocation.GetAllocationManagerId();
    if(MgrId == 0)
    {
        // Manager 0 handles the static/mutable part
        std::lock_guard<std::mutex> LockGuard(m_AllocMutex);
        m_HeapAllocationManager.Free(std::move(Allocation));
    }
    else
    {
        // Manager 1 handles the dynamic part
        std::lock_guard<std::mutex> LockGuard(m_DynAllocMutex);
        m_DynamicAllocationsManager.Free(std::move(Allocation));
    }
}
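The split of one shader-visible heap into a static and a dynamic range, each served by its own manager, can be sketched as plain index arithmetic. This is an illustration under assumptions rather than the engine's code: `SubRange`, `HeapPartition`, `PartitionHeap` and `HandleAt` are hypothetical names, and the static part is assumed to come first in the heap, as the description above suggests.

```cpp
#include <cassert>
#include <cstdint>

// A contiguous part of the heap owned by one allocation manager.
struct SubRange
{
    std::uint16_t managerId;       // 0 = static/mutable part, 1 = dynamic part
    std::uint32_t firstDescriptor; // index of the first descriptor in the D3D12 heap
    std::uint32_t numDescriptors;
};

struct HeapPartition
{
    SubRange staticPart;
    SubRange dynamicPart;
};

// Splits a heap of totalDescriptors so that the last dynamicDescriptors form the dynamic part.
inline HeapPartition PartitionHeap(std::uint32_t totalDescriptors, std::uint32_t dynamicDescriptors)
{
    assert(dynamicDescriptors < totalDescriptors);
    const std::uint32_t numStatic = totalDescriptors - dynamicDescriptors;
    return HeapPartition{SubRange{0, 0, numStatic}, SubRange{1, numStatic, dynamicDescriptors}};
}

// Address of the index-th descriptor of a sub-range, given the heap start and descriptor size.
inline std::uint64_t HandleAt(std::uint64_t heapStart, std::uint32_t descriptorSize,
                              const SubRange& range, std::uint32_t index)
{
    assert(index < range.numDescriptors);
    return heapStart + static_cast<std::uint64_t>(descriptorSize) * (range.firstDescriptor + index);
}

// Free() can route an allocation by manager id alone.
inline bool BelongsToDynamicPart(std::uint16_t managerId) { return managerId != 0; }
```

Because both sub-ranges index into the same heap, handles from either part remain valid for the same bound heap, which is what allows two managers to share it.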
Note that all methods lock mutexes to acquire exclusive access to the allocation managers. The AllocateDynamic() method is used solely by the DynamicSuballocationsManager class to allocate a chunk of the heap to perform suballocations from. The class maintains a list of chunks allocated from the main GPU descriptor heap as well as the offset within the current chunk:
std::vector<DescriptorHeapAllocation> m_Suballocations;
Uint32 m_CurrentSuballocationOffset = 0;
During every frame, allocations are performed in a linear fashion. The allocation method first checks if there is enough space for the requested number of descriptors in the current chunk. If there is not, the method requests a new chunk from the main GPU descriptor heap. The suballocation then happens from the new chunk:
DescriptorHeapAllocation DynamicSuballocationsManager::Allocate(Uint32 Count)
{
    // Request a new chunk if there are no chunks yet or the current one is too small
    if( m_Suballocations.empty() ||
        m_CurrentSuballocationOffset + Count > m_Suballocations.back().GetNumHandles() )
    {
        // The chunk must be large enough to hold the requested number of descriptors
        auto SuballocationSize = std::max(m_DynamicChunkSize, Count);
        auto NewDynamicSubAllocation = m_ParentGPUHeap.AllocateDynamic(SuballocationSize);
        m_Suballocations.emplace_back(std::move(NewDynamicSubAllocation));
        m_CurrentSuballocationOffset = 0;
    }

    // Suballocate from the last chunk
    auto &CurrentSuballocation = m_Suballocations.back();
    auto ManagerId = CurrentSuballocation.GetAllocationManagerId();

    DescriptorHeapAllocation Allocation(
        this,
        CurrentSuballocation.GetDescriptorHeap(),
        CurrentSuballocation.GetCpuHandle(m_CurrentSuballocationOffset),
        CurrentSuballocation.GetGpuHandle(m_CurrentSuballocationOffset),
        Count,
        static_cast<Uint16>(ManagerId) );
    m_CurrentSuballocationOffset += Count;

    return Allocation;
}
Note that this method is lock-free, as every context has its own suballocations manager. The thread may only be blocked when a new chunk is requested from the main GPU descriptor heap, but this is an infrequent situation.
Suballocations are not released individually, so the DynamicSuballocationsManager::Free() method does nothing. Instead, all allocations are discarded once the command list from this context has been recorded and executed by the render device:
void DynamicSuballocationsManager::DiscardAllocations(Uint64 FrameNumber)
{
    // Destroying the chunks returns them to the parent GPU descriptor heap
    m_Suballocations.clear();
}
Clearing the vector destroys all descriptor heap allocation objects it holds, which invokes their destructors. The destructors call the GPUDescriptorHeap::Free() method of the parent GPU descriptor heap, which adds the allocation to the release queue. The allocations are actually released a few frames later.
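The two-stage dynamic allocation scheme (an infrequent, locked chunk request followed by lock-free linear suballocation) can be tried out with a compact model. In this sketch, `SharedDynamicHeap` and `LinearSuballocator` are hypothetical names, descriptors are identified by plain indices, and the shared heap is assumed to be a simple linear allocator with no exhaustion handling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// A chunk of descriptors: index of the first descriptor in the heap and the chunk size.
struct Chunk
{
    std::uint32_t first;
    std::uint32_t size;
};

// Stage one: the shared dynamic part of the heap. Access is serialized by a mutex.
class SharedDynamicHeap
{
public:
    explicit SharedDynamicHeap(std::uint32_t capacity) : m_capacity(capacity) {}

    Chunk AllocateChunk(std::uint32_t size)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        assert(size <= m_capacity - m_next); // sketch only: exhaustion is not handled
        const Chunk chunk{m_next, size};
        m_next += size;
        return chunk;
    }

private:
    std::mutex    m_mutex;
    std::uint32_t m_capacity;
    std::uint32_t m_next = 0;
};

// Stage two: per-context linear suballocation. No locking is needed because
// every context owns its own instance.
class LinearSuballocator
{
public:
    LinearSuballocator(SharedDynamicHeap& parent, std::uint32_t chunkSize)
        : m_parent(parent), m_chunkSize(chunkSize)
    {}

    // Returns the heap index of the first of 'count' consecutive descriptors.
    std::uint32_t Allocate(std::uint32_t count)
    {
        assert(count > 0);
        if (m_chunks.empty() || m_offset + count > m_chunks.back().size)
        {
            // The remaining tail of the previous chunk, if any, is simply left unused.
            m_chunks.push_back(m_parent.AllocateChunk(std::max(m_chunkSize, count)));
            m_offset = 0;
        }
        const std::uint32_t first = m_chunks.back().first + m_offset;
        m_offset += count;
        return first;
    }

    // Called once the command list has been submitted; hands all chunks back to the caller.
    std::vector<Chunk> DiscardAllocations()
    {
        std::vector<Chunk> discarded;
        discarded.swap(m_chunks);
        m_offset = 0;
        return discarded;
    }

    std::size_t GetNumChunks() const { return m_chunks.size(); }

private:
    SharedDynamicHeap& m_parent;
    std::uint32_t      m_chunkSize;
    std::vector<Chunk> m_chunks;
    std::uint32_t      m_offset = 0;
};
```

In the engine the discarded chunks go to the release queue of the parent heap. Here they are returned to the caller so that the deferred release can be handled separately.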
The Big Picture
Now that we have presented every individual component, we can describe how they interact with each other and the rest of the system. There are four shared CPU-only descriptor heaps (CBV_SRV_UAV, SAMPLER, RTV and DSV) implemented by the CPUDescriptorHeap class, and two shader-visible (GPU) descriptor heaps (CBV_SRV_UAV and SAMPLER) implemented by the GPUDescriptorHeap class. Every device context that is used for recording commands contains two dynamic suballocation managers (corresponding to the two shader-visible descriptor heap types) represented by the DynamicSuballocationsManager class. CPU descriptor heaps are used when a new resource view is created. GPU descriptor heaps are used by the shader resource binding system to allocate storage for shader-visible descriptors. They are also used for allocation of dynamic descriptors.
Usage Scenarios
Let's now talk about a few scenarios where descriptor heaps are involved.
Creating Resource View
Let's first consider how resource views are created using the example of creating a shader resource view (SRV) of a texture. The process proceeds as follows:
- An allocation containing a single descriptor handle is requested from the CBV_SRV_UAV CPU-only descriptor heap. Descriptor heap allocation goes as discussed above through the following steps:
  - The CPUDescriptorHeap::Allocate() method acquires exclusive access to the CPU descriptor heap object
  - The method iterates over descriptor heap managers that have available descriptor handles and requests a one-descriptor allocation
    - Since only one descriptor handle is requested, the very first manager will be able to handle the request
  - If there are no available managers, a new manager (and a new D3D12 descriptor heap) is created to handle the request
- The D3D12 render device is used to initialize the shader resource view in the allocated descriptor (see ID3D12Device::CreateShaderResourceView on MSDN)
- The Descriptor Heap Allocation object is kept as part of the resource view object and is destroyed when the resource view object is released. At this point:
  - The destructor of the Descriptor Heap Allocation object calls CPUDescriptorHeap::Free(), which locks the heap and calls the DescriptorHeapAllocationManager::Free() method of the allocation manager that created the allocation
  - The manager inserts the allocation attributes (offset and size) along with the frame number into the deletion queue
  - A few frames later, when the frame completion fence is signaled, the allocation is actually released by the CPUDescriptorHeap::ReleaseStaleAllocations() method
Creating all types of texture views (SRV, RTV, DSV and UAV) as well as all types of buffer views is done in the same way.
Allocating Dynamic Descriptor
Let's now recap how dynamic descriptors are allocated:
- The context that needs a dynamic descriptor uses one of its two dynamic suballocation managers (CBV_SRV_UAV or SAMPLER) to request the desired type of descriptor handle
- The suballocation manager checks if the last chunk contains enough space to satisfy the allocation request. In most situations, that will be the case and the descriptor handles will be suballocated from this chunk
- If there is not enough space, the suballocation manager requests the main GPU descriptor heap to allocate a new chunk of descriptor handles. The handles are then suballocated from the new chunk
- At the end of the frame, the suballocation manager disposes of all chunks, which go back to the GPU descriptor heap
- The GPU descriptor heap inserts all chunks along with the frame number into the release queue
- A few frames later, when the frame completion fence is signaled, the chunks are actually released and the space becomes available for new allocations
Shader Resource Binding
Diligent Engine uses a shader resource binding model that includes three types of shader resources based on the frequency of change (static, mutable and dynamic) as well as a shader resource binding object. When a new shader resource binding object is created, it allocates space in the GPU descriptor heap for its mutable and static resources. The allocation is kept by the shader resource binding object and is released when the owning object is destroyed. This topic will be discussed in detail in a separate post.
Multithreading and GPU-Safety Concerns
The descriptor heap management system is designed to be correct, safe and efficient in a multithreaded environment. All three types of allocations (CPU descriptor, static/mutable GPU descriptor and dynamic GPU descriptor) proceed through thread-safe paths. CPU and static/mutable descriptor allocation functions (CPUDescriptorHeap::Allocate(), GPUDescriptorHeap::Allocate()) acquire exclusive access to descriptor heap objects and may potentially block other threads. However, descriptor allocation is fast and constitutes only a tiny portion of the work associated with resource creation, so this is not a problem. Dynamic descriptor heap allocation (DynamicSuballocationsManager::Allocate()) is free-threaded, so it can be called in parallel by many threads with no performance cost (the same context should not be used by different threads simultaneously). The only blocking function is GPUDescriptorHeap::AllocateDynamic(), but it is only called occasionally.
Deallocation is more complicated because, besides CPU-side safety, the system must also make sure that descriptors are not used by the GPU. CPU-side safety is achieved by protecting the deallocation methods (CPUDescriptorHeap::Free() and GPUDescriptorHeap::Free()) with mutexes. GPU-side safety is assured by recording the command list number when the allocation is destroyed. For CPU and static/mutable GPU descriptors, it does not matter which thread releases the allocation. As long as there are no more references, the allocation can never be used again in any new GPU command, but it may be referenced by the commands pending execution by the GPU. So at the moment when the allocation is released, it is added by the deleting thread into the deletion queue along with the current command list number. Deletion queues are purged once at the end of each frame by the render device. The device knows how many command lists have actually been completed by the GPU and can release all allocations that are referenced by completed commands.
For dynamic descriptors, deallocation happens when command list from the context is closed and executed. It does not matter which thread recorded the list. As long as it has been sent to the command queue for execution (from any thread), all dynamic descriptors are stale and can be discarded. So the context returns all chunks back to the GPU descriptor heap object, which adds them to the release queue. For a deferred context that means that until it is executed, all dynamic descriptors are unavailable for use by other contexts.
Discussion
In the current implementation, the same CPU descriptor heap objects are used to allocate resource view descriptor handles on all threads. We did not notice this to be a problem, as descriptor heap allocation/deallocation is very fast unless a new CPU descriptor heap needs to be created. This, however, should not be a problem either, as the descriptor heap manager size can be specified at initialization time to meet the application's demands. The system provides methods to query the maximum size that every heap reached during the application run time.
A careful reader may have noticed that the GPUDescriptorHeap class uses the generic DescriptorHeapAllocationManager to allocate dynamic chunks of equal sizes. The only situation when the chunk size may be different is when the number of requested descriptors is larger than the default chunk size. This, however, is a very atypical situation, so a more efficient fixed-size block allocator may be used instead of the variable-size allocations manager.
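A fixed-size block allocator of the kind suggested above could look roughly as follows. This is a sketch of the general technique and not something Diligent Engine is stated to implement: `FixedSizeBlockAllocator` and `kInvalidBlock` are hypothetical names, oversized requests are assumed to be handled elsewhere, and the deferred (frame-tagged) release discussed earlier is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::uint32_t kInvalidBlock = 0xFFFFFFFFu;

// Hands out equally sized blocks of descriptors. Both operations are O(1)
// because free blocks are kept on a simple stack.
class FixedSizeBlockAllocator
{
public:
    FixedSizeBlockAllocator(std::uint32_t numBlocks, std::uint32_t blockSize)
        : m_blockSize(blockSize)
    {
        assert(blockSize > 0);
        m_freeBlocks.reserve(numBlocks);
        // Push in reverse order so that block 0 is handed out first.
        for (std::uint32_t i = numBlocks; i > 0; --i)
            m_freeBlocks.push_back(i - 1);
    }

    // Returns the index of the first descriptor of a free block, or kInvalidBlock.
    std::uint32_t Allocate()
    {
        if (m_freeBlocks.empty())
            return kInvalidBlock;
        const std::uint32_t block = m_freeBlocks.back();
        m_freeBlocks.pop_back();
        return block * m_blockSize;
    }

    // Returns a block identified by the index of its first descriptor.
    void Free(std::uint32_t firstDescriptor)
    {
        assert(firstDescriptor % m_blockSize == 0);
        m_freeBlocks.push_back(firstDescriptor / m_blockSize);
    }

    std::size_t GetNumFreeBlocks() const { return m_freeBlocks.size(); }

private:
    std::uint32_t              m_blockSize;
    std::vector<std::uint32_t> m_freeBlocks;
};
```

Compared with a variable-size manager, no search for a fitting free range and no merging of neighbouring ranges is needed, which is where the efficiency gain would come from.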
Diligent Engine currently supports only a single GPU descriptor heap of each type (CBV_SRV_UAV and SAMPLER). While the first heap can contain a large number of descriptor handles (1,000,000+), the sampler heap size is limited to 2048 descriptors, which can potentially lead to heap exhaustion. However, in most cases, the type of the sampler in the shader is known in advance and never changes. D3D12 introduced the concept of static samplers to handle such cases, which is also exposed by Diligent Engine. Static samplers should be used whenever possible, and the number of static samplers is unlimited. So the sampler descriptor heap will be used only to keep descriptor handles of samplers that change at run time, which is a less typical situation.