The Task Scheduler schedules and coordinates tasks at run time. A task is a unit of work that performs a specific job. The Task Scheduler manages the details that are related to efficiently scheduling tasks on computers that have multiple computing resources.
The Windows OS provides a preemptive kernel-mode scheduler: a round-robin, priority-based mechanism that gives every task exclusive access to a computing resource for a given time period and then switches to another task. Although this mechanism provides fairness (every thread makes forward progress), it comes at some cost in efficiency. For example, many computation-intensive algorithms do not require fairness; instead, it is important that related tasks finish in the least overall time. Cooperative scheduling enables an application to schedule work more efficiently.
Cooperative scheduling is a mechanism that gives every task exclusive access to a computing resource until the task finishes or until the task yields its access to the resource.
The user-mode cooperative scheduler enables application code to make its own scheduling decisions. Because cooperative scheduling enables many scheduling decisions to be made by the application, it reduces much of the overhead that is associated with kernel-mode synchronization.
The Concurrency Runtime uses cooperative scheduling together with the preemptive scheduler of the operating system to achieve maximum usage of processing resources.
In this article, we will discuss the Task Scheduler design and examine how it works internally. To do so, CppDepend is used to analyze the source code of the Concurrency Runtime.
Scheduler Design
The Concurrency Runtime provides the Scheduler interface to implement a specific scheduler adapted to application needs. Let's discover the classes implementing this interface:
The Concurrency Runtime provides two implementations of the scheduler, ThreadScheduler and UMSThreadScheduler, and as shown in the dependency graph, the SchedulerBase class contains all the behavior common to these two classes.
Is the Scheduler flexible? A good indicator of flexibility is to search for all the abstract classes used by the scheduler. As shown in the dependency graph, the Scheduler uses many abstract classes; this enforces low coupling and makes the scheduler more flexible, so adapting it to other needs will be easy.
To explain the role of each abstract class used by the scheduler, we will discuss its responsibilities.
There are three major responsibilities assigned to the Task Scheduler:
1. Getting Resources (Processors, Cores, Memory)
When the scheduler is created, it asks the runtime's Resource Manager for resources, as explained in this article.
The scheduler communicates with the Resource Manager using the IResourceManager, ISchedulerProxy, and IScheduler interfaces.
The Resource Manager uses the scheduler policy to decide which resources to allocate to the scheduler.
The policy, as shown in this dependency graph, is assigned when the scheduler is created.
If no scheduler exists, the Concurrency Runtime creates a default scheduler with a default policy when the GetDefaultScheduler method is invoked.
The Task Scheduler enables applications to use one or more scheduler instances to schedule work, and an application can invoke Scheduler::Create to add another scheduler that uses a specific policy.
The Concurrency::PolicyElementKey enumeration defines the policy keys that are associated with the Task Scheduler. Here's an article explaining the utility of each policy key and its default value.
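To make the policy mechanism concrete, here is a minimal, portable sketch of the idea behind SchedulerPolicy and PolicyElementKey: a set of key/value pairs, with defaults, that is consulted when a scheduler is created. The class, the key names, and the default values below are simplified stand-ins for illustration, not the real ConcRT types.

```cpp
#include <cassert>
#include <map>

// Hypothetical, simplified model of a scheduler policy. The real
// Concurrency::SchedulerPolicy works on the same principle, with keys
// such as MinConcurrency, MaxConcurrency, and SchedulingProtocol.
enum PolicyKey { MinConcurrency, MaxConcurrency, SchedulingProtocol };

class SchedulerPolicySketch {
    std::map<PolicyKey, unsigned> values_;
public:
    // Illustrative defaults (the real defaults are documented by ConcRT):
    SchedulerPolicySketch() {
        values_[MinConcurrency] = 1;      // at least one virtual processor
        values_[MaxConcurrency] = 4;      // stand-in for "number of cores"
        values_[SchedulingProtocol] = 0;  // 0 = cache-local, 1 = fair (illustrative)
    }
    void SetPolicyValue(PolicyKey key, unsigned value) { values_[key] = value; }
    unsigned GetPolicyValue(PolicyKey key) const { return values_.at(key); }
};
```

In the real API, such a policy would be passed to Scheduler::Create; a scheduler that was never given an explicit value for a key simply falls back to the default.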
The following collaborations between the scheduler and the Resource Manager show the role of each interface involved in the allocation.
Ask for resource allocation:
Getting resources from resource manager:
2. Managing Task Queues
When the Scheduler is created, tasks can be assigned to it for execution; the Scheduler stores these tasks in queues. To enforce class cohesion, the queues are not managed directly by the ThreadScheduler class but by the ScheduleGroupBase class.
A schedule group affinitizes, or groups, related tasks together. Every scheduler has one or more schedule groups. Use schedule groups when you require a high degree of locality among tasks, for example, when a group of related tasks benefit from executing on the same processor node.
As shown in the following graph, the runtime provides two kinds of ScheduleGroup: FairScheduleGroup and CacheLocalScheduleGroup. Choosing between these two groups, as will be explained later, impacts the algorithm the scheduler uses to choose the next task to execute.
Every scheduler has a default schedule group for every scheduling node. The runtime creates one scheduling node for every processor package or Non-Uniform Memory Architecture (NUMA) node. If you do not explicitly associate a task with a schedule group, the scheduler chooses which group to add the task to.
As shown in the following dependency graph, the SchedulingRing class is responsible for managing schedule groups; it contains a list of groups and creates them.
The schedule group contains three kinds of queues:
1. FIFO Queue
This queue contains lightweight tasks. A lightweight task resembles the function that you provide to the Windows API CreateThread function. Therefore, lightweight tasks are useful when you adapt existing code to use the scheduling functionality of the Concurrency Runtime.
A lightweight task is represented by the RealizedChore class, and the FIFO queue of the schedule group is represented by the m_realizedChores field.
Let's search for all the methods that use this queue directly:
So we can add a lightweight task to the group by invoking ScheduleGroupBase::ScheduleTask or Scheduler::ScheduleTask.
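As a rough illustration of this FIFO discipline, here is a portable sketch (not ConcRT's code) of a schedule group that stores lightweight tasks, with a CreateThread-like callback shape, in a queue playing the role of m_realizedChores. The class and field names are borrowed from the article for readability only.

```cpp
#include <cassert>
#include <deque>

// A lightweight task takes a single void* argument, mirroring the
// shape of a CreateThread callback.
using TaskProc = void (*)(void*);

struct RealizedChore {   // name borrowed from the article; simplified
    TaskProc proc;
    void*    data;
};

class ScheduleGroupSketch {
    std::deque<RealizedChore> fifo_;   // plays the role of m_realizedChores
public:
    // Enqueue a lightweight task at the back of the FIFO queue.
    void ScheduleTask(TaskProc proc, void* data) {
        fifo_.push_back({proc, data});
    }
    // Dispatch one chore in FIFO order; returns false if the queue is empty.
    bool RunOne() {
        if (fifo_.empty()) return false;
        RealizedChore c = fifo_.front();
        fifo_.pop_front();
        c.proc(c.data);
        return true;
    }
};

// Tiny helper used in the usage example below.
inline void Increment(void* p) { ++*static_cast<int*>(p); }
```

Tasks enqueued first run first, which is exactly the fairness property a plain FIFO queue gives lightweight tasks.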
Here's an interesting article about lightweight tasks.
2. Work Stealing Queue
There's only one FIFO queue associated with the schedule group, but the schedule group references a list of work-stealing queues: for each worker thread, there's a local queue associated with it.
A thread that is attached to a scheduler is known as an execution context, or just context, so this local queue is actually associated with the Context class.
The Context class provides a programming abstraction for an execution context and offers the ability to cooperatively block, unblock, and yield the current context.
And to make sure that only Context creates this kind of queue, let's search for the methods that access the m_workQueues field directly.
So the context is responsible for creating this queue, and for each context there's a local work-stealing queue associated with it.
To illustrate the behavior of the work-stealing algorithm, let's imagine that we have two worker threads allocated to the scheduler.
As explained before, each worker thread has a local queue associated with it.
Imagine that we have three tasks in the queue of worker thread 1: tasks 3 and 4 are waiting to be executed while task 5 is running.
The Dispatch method of the other worker thread finds that there is nothing in its own queue, so task 3 is moved, or "stolen", from its original queue and distributed to the available worker thread.
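The owner/thief discipline described above can be sketched with a simple double-ended queue: the owning context pushes and pops at one end (newest first, which is cache friendly), while an idle worker steals from the other end, taking the oldest task (task 3 in the figure). This is a simplified, single-threaded illustration; the real ConcRT queues are concurrent data structures.

```cpp
#include <cassert>
#include <deque>

// Minimal sketch of a work-stealing queue holding task ids.
class WorkStealingQueueSketch {
    std::deque<int> tasks_;
public:
    void Push(int task) { tasks_.push_back(task); }

    // Owner pops its own newest task; returns -1 when empty.
    int PopLocal() {
        if (tasks_.empty()) return -1;
        int t = tasks_.back();
        tasks_.pop_back();
        return t;
    }

    // A thief steals the oldest task from the other end; -1 when empty.
    int Steal() {
        if (tasks_.empty()) return -1;
        int t = tasks_.front();
        tasks_.pop_front();
        return t;
    }
};
```

Stealing from the opposite end of the deque is what lets the owner and the thief mostly avoid contending for the same tasks.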
How can we create a task managed by a work-stealing queue?
To answer this question, let's search for the methods that indirectly invoke the CreateWorkQueue method.
As shown in this dependency graph, this kind of task can be created by using the task_group class.
Using task_group to add a new task is better than using Scheduler::ScheduleTask to create a lightweight task; indeed, the work-stealing algorithm makes better use of the virtual processors allocated to the scheduler.
However, ScheduleTask can be the better choice when migrating easily from existing code that uses the CreateThread API.
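To show the usage pattern that task_group enables (run several related tasks, then wait for all of them), here is a rough, portable approximation. It spawns a thread per task, which is exactly what the real Concurrency::task_group avoids by handing tasks to the work-stealing queues described above; the sketch only mimics the interface.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for Concurrency::task_group: run() schedules a
// task, wait() blocks until every scheduled task has finished.
class TaskGroupSketch {
    std::vector<std::thread> threads_;
public:
    void run(std::function<void()> f) {
        // The real task_group enqueues into a work-stealing queue here;
        // this sketch simply starts a thread per task.
        threads_.emplace_back(std::move(f));
    }
    void wait() {
        for (std::thread& t : threads_) t.join();
        threads_.clear();
    }
};
```

The structured run/wait pair is what lets the scheduler treat the tasks as related work, which is the property the cache-local algorithm exploits.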
3. Unblocked Context Queue
The Context class lets you block or yield the current execution context. Blocking or yielding is useful when the current context cannot continue because a resource is not available.
The Context::Block method blocks the current context. A context that is blocked yields its processing resources so that the runtime can perform other tasks. The Context::Unblock method unblocks a blocked context.
When a context is unblocked and available to be executed, it is added to the runnable context queue, which is represented by the m_runnableContexts field.
Here's a dependency graph showing some cases where a context is added to the runnable queue:
So a context is added to the queue when it is unblocked or when a virtual processor is retired from the scheduler.
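The blocking/unblocking life cycle can be modeled with a small sketch: blocking parks a context, and unblocking moves it to a runnable queue playing the role of m_runnableContexts, from which the scheduler later picks it up. This is a simplified, single-threaded model, not the runtime's implementation.

```cpp
#include <cassert>
#include <deque>
#include <set>

// Contexts are represented by integer ids for illustration.
class RunnableQueueSketch {
    std::set<int>   blocked_;   // contexts currently blocked
    std::deque<int> runnable_;  // plays the role of m_runnableContexts
public:
    void Block(int ctx) { blocked_.insert(ctx); }

    // Only a context that is actually blocked becomes runnable again.
    void Unblock(int ctx) {
        if (blocked_.erase(ctx))
            runnable_.push_back(ctx);
    }

    bool HasRunnable() const { return !runnable_.empty(); }

    // FIFO here, as in the fair algorithm; the cache-local algorithm
    // prefers LIFO per virtual processor.
    int NextRunnable() {
        int ctx = runnable_.front();
        runnable_.pop_front();
        return ctx;
    }
};
```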
3. Dispatching Tasks
The scheduler searches for work to execute; a unit of work can be:
- An unblocked context
- A lightweight task
- A task in a work-stealing queue
And as explained before, all these kinds of work are stored in queues managed by schedule groups, and each group is managed by a scheduling ring.
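The search can be pictured with a toy model in which the dispatcher prefers an unblocked context, then a lightweight task, then a task stolen from a work-stealing queue. The exact preference order in ConcRT depends on the scheduling algorithm discussed below; strings stand in for work items here, and the class is an illustration, not the runtime's code.

```cpp
#include <cassert>
#include <deque>
#include <string>

struct WorkQueuesSketch {
    std::deque<std::string> runnableContexts;  // unblocked contexts
    std::deque<std::string> lightweightTasks;  // FIFO chores
    std::deque<std::string> stealableTasks;    // work-stealing queues

    // Check the queues in an assumed preference order and return the
    // first work item found; an empty string means "go idle".
    std::string FindWork() {
        for (std::deque<std::string>* q : { &runnableContexts,
                                            &lightweightTasks,
                                            &stealableTasks }) {
            if (!q->empty()) {
                std::string w = q->front();
                q->pop_front();
                return w;
            }
        }
        return "";
    }
};
```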
When a virtual processor is allocated to the scheduler, a ThreadProxy object is created and associated with this processor, and after its creation, the Dispatch method of the ThreadProxy is invoked.
As shown in the following dependency graph, and as explained before, the Concurrency Runtime uses abstract classes to enforce low coupling; the actual Dispatch implementation invoked depends on the implementation chosen by the runtime, a choice driven by the scheduler policy.
The concrete implementation of Dispatch invokes the Dispatch method of the execution context.
Here's a dependency graph showing the methods invoked by the concrete implementation of the Context::Dispatch method:
So the algorithm for finding the next work item to execute is implemented by the WorkSearchContext class.
Let's discover all the classes used directly by WorkSearchContext to achieve its responsibility:
The responsibility of WorkSearchContext is to give us a WorkItem to execute; it can be an InternalContextBase, a RealizedChore, or an _UnrealizedChore.
To better understand the collaboration between these classes, let's search for the methods used directly by WorkSearchContext:
So WorkSearchContext iterates over the SchedulingRing and ScheduleGroup classes by using SchedulerBase methods.
And for each schedule group, it searches for a runnable context, a RealizedChore, or an _UnrealizedChore.
The WorkSearchContext class is created by the VirtualProcessor class, and as shown in the following dependency graph, the algorithm used is specified when the VirtualProcessor is initialized; to that end, it asks the scheduler for the SchedulingProtocol, which describes the scheduling algorithm that the scheduler will use.
The WorkSearchContext is told which algorithm to use by passing it a value from the Algorithm enum.
So this class implements two algorithms to find work:
Cache Local Algorithm
This algorithm looks for runnable contexts within the current schedule group, then for realized chores and unrealized chores. If there's no more work in the current schedule group, it looks at the next group in the same scheduling ring, and when it has exhausted all the work in the current scheduling ring, it looks in the next scheduling ring.
So the scheduler prefers to continue working on tasks within the current schedule group before moving to another schedule group.
This algorithm is implemented by the WorkSearchContext::SearchCacheLocal method, and as shown by this dependency graph, this method invokes other methods to search for runnable contexts, RealizedChores, or _UnrealizedChores.
Another specificity of this algorithm is that unblocked contexts are cached per virtual processor and are typically scheduled in last-in, first-out (LIFO) order by the virtual processor that unblocked them.
And to verify this behavior, here's a dependency graph of the methods invoked when searching for a runnable context:
This algorithm is the default chosen by the scheduler if none is specified.
Fair Algorithm
In this case, the scheduler prefers to round-robin through schedule groups after executing each task. Unblocked contexts are typically scheduled in a first-in-first-out (FIFO) fashion. Virtual processors do not cache unblocked contexts.
This algorithm is implemented by the WorkSearchContext::SearchFair method, and as shown by this dependency graph, this method invokes other methods to search for runnable contexts, RealizedChores, or _UnrealizedChores.
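The difference between the two protocols can be reduced to how the next schedule group is picked: the cache-local search stays on the current group while it still has work, whereas the fair search moves to the next group after every task. Here is a toy model of that choice (an illustration, not ConcRT code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct GroupPickerSketch {
    std::vector<int> pending;   // pending[i] = tasks left in group i
    std::size_t current = 0;    // the schedule group worked on last
    bool fair = false;          // false = cache-local, true = fair

    // Runs one task and returns the index of the group it came from,
    // or -1 when no group has work.
    int RunOneTask() {
        std::size_t n = pending.size();
        if (fair)                                   // round-robin: always
            current = (current + 1) % n;            // move to the next group
        for (std::size_t i = 0; i < n; ++i) {       // skip empty groups
            std::size_t g = (current + i) % n;
            if (pending[g] > 0) {
                --pending[g];
                current = g;
                return static_cast<int>(g);
            }
        }
        return -1;
    }
};
```

With two groups of two tasks each, the cache-local picker drains group 0 before touching group 1, while the fair picker alternates between them.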
After this analysis, we can say that the Concurrency Runtime task scheduler is well designed: it is characterized by high cohesion and very low coupling, it is very flexible, and it provides by default an optimized algorithm that makes better use of hardware resources.
Filed under: CodeProject