Intro
In the previous article, you’ve seen a conceptual implementation of the async/await State Machine. Although I used some simplifications, it was very close to the actual one.
This means you already have a solid understanding of the whole async/await machinery.
In this post, I will present the actual implementation, focusing on the performance optimizations I found most intriguing.
This is the kind of knowledge you don’t need on a daily basis, but it can truly take your skills to the next level.
The details you’ll see here will make you appreciate all the efforts and reasoning behind a real-world implementation used by millions of users.
Here are some of the points you will explore:
- How and why is the State Machine kept on the execution stack when all the awaiters are already completed?
- How is the State Machine boxed onto the heap before the first pause in order to preserve its state?
- The role of AsyncTaskMethodBuilder<T> as a coordinator between the State Machine and the async infrastructure.
- What is the ExecutionContext, and how does it flow across continuations?
Let’s dive in!
This article’s content is influenced by the async/await material in the book C# in Depth by Jon Skeet and the blog post Dissecting the async methods in C# by Sergey Tepliakov.
For additional in-depth materials on async/await in C#, you can check this Pluralsight course.
Starter Code
Not surprisingly, I’ll use the same sample code as in the last two posts:
```csharp
using System;
using System.Threading.Tasks;

public static class MyClass
{
    public static async Task<int> MyAsyncMethod(int firstDelay, int secondDelay)
    {
        Console.WriteLine("Before first await.");
        await Task.Delay(firstDelay);
        Console.WriteLine("Before second await.");
        await Task.Delay(secondDelay);
        Console.WriteLine("Done.");
        return 42;
    }
}
```
The Workflow Diagram
At the risk of being annoying, here’s also the workflow diagram you saw in Part 2, so you can use it for reference.
Implementation
I’ll start by giving you the full source code first and move to some concrete explanations afterward.
Please spend a minute comparing the code below to the conceptual implementation from the last article. If you’ve been following along, almost everything should make sense.
Some of the more mystical bits may be the usage of AsyncTaskMethodBuilder<int> and the SetStateMachine method. I will explore those two in the upcoming sections.
Please note that the State Machine (named <MyAsyncMethod>d__0) is a struct, not a class (at least in Release builds). This is also true for AsyncTaskMethodBuilder<T>.
This is important as it allows for the memory optimizations you’ll see later.
AsyncTaskMethodBuilder<TResult> – Overview
One of the main differences from the Conceptual Implementation is the use of the AsyncTaskMethodBuilder<int> struct:
```csharp
private struct <MyAsyncMethod>d__0 : IAsyncStateMachine
{
    // …
    public AsyncTaskMethodBuilder<int> <>t__builder;
    // …
}
```
Recall that in the previous post, we used a TaskCompletionSource instance to build the resulting task.
That was an oversimplification compared to the real implementation. AsyncTaskMethodBuilder does a lot more housekeeping and coordinates the communication between the State Machine and the async infrastructure.
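To make the comparison concrete, here is a small sketch that produces an already-completed Task<int> both ways. The TaskProducers class is hypothetical. AsyncTaskMethodBuilder<T> is a public type in System.Runtime.CompilerServices, but it is intended for compiler-generated code, so driving it by hand like this is only for illustration.

```csharp
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical helper comparing the two ways of producing the resulting task.
public static class TaskProducers
{
    // Conceptual version from the previous post: a TaskCompletionSource produces the task.
    public static Task<int> WithCompletionSource()
    {
        var tcs = new TaskCompletionSource<int>();
        tcs.SetResult(42);
        return tcs.Task;
    }

    // The builder used by compiler-generated code can be driven the same way.
    // It is a mutable struct, so it must live in a mutable local (or field).
    public static Task<int> WithMethodBuilder()
    {
        AsyncTaskMethodBuilder<int> builder = AsyncTaskMethodBuilder<int>.Create();
        builder.SetResult(42);
        return builder.Task;
    }
}
```

The observable result is the same; the difference is everything else the builder does around it, which the rest of this post covers.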
Here are the primary responsibilities of the Task Method Builder:
- Produces the resulting task
- Starts the State Machine
- Attaches the continuation
- Boxes the State Machine onto the heap
- Flows the ExecutionContext
Let’s start digging into some of these.
“MyAsyncMethod” and Starting the State Machine
As already explained, MyAsyncMethod gets transformed by the compiler so that its new purpose is to trigger the State Machine workflow and return the resulting task (which in most cases will not be completed):
```csharp
[AsyncStateMachine(typeof(<MyAsyncMethod>d__0))]
public static Task<int> MyAsyncMethod(int firstDelay, int secondDelay)
{
    <MyAsyncMethod>d__0 stateMachine = default(<MyAsyncMethod>d__0);
    stateMachine.<>t__builder = AsyncTaskMethodBuilder<int>.Create();
    stateMachine.firstDelay = firstDelay;
    stateMachine.secondDelay = secondDelay;
    stateMachine.<>1__state = -1;
    stateMachine.<>t__builder.Start(ref stateMachine);
    return stateMachine.<>t__builder.Task;
}
```
Notice that the State Machine is passed by reference to the Start method of the Builder:
```csharp
stateMachine.<>t__builder.Start(ref stateMachine);
```
This is done for efficiency and consistency – we avoid creating a copy of the State Machine, which can be expensive. Also, any changes to the State Machine within the Start method will affect the original State Machine instance.
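The difference between passing a struct by value and by reference can be seen with a tiny, hypothetical Counter struct standing in for the State Machine. This is a sketch of plain C# value semantics, not of the builder itself.

```csharp
// Hypothetical mutable struct standing in for the State Machine.
public struct Counter
{
    public int Value;
    public void Increment() => Value++;
}

public static class RefDemo
{
    // Receives a copy: the caller's struct is not affected.
    private static void IncrementCopy(Counter counter) => counter.Increment();

    // Receives a reference: no copy is made, and the change is visible to the caller.
    private static void IncrementByRef(ref Counter counter) => counter.Increment();

    public static (int AfterCopy, int AfterRef) Run()
    {
        var counter = new Counter();
        IncrementCopy(counter);
        int afterCopy = counter.Value;   // still 0
        IncrementByRef(ref counter);
        return (afterCopy, counter.Value); // (0, 1)
    }
}
```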
As you can see here, the Builder’s Start method eventually invokes the MoveNext method of the State Machine.
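You can observe this synchronous call with a hand-written state machine. The sketch below assumes a hypothetical ThreadIdStateMachine that has no awaits at all and completes its task with the id of the thread that ran MoveNext. StartDemo.RunAsync mirrors the shape of the compiler-generated stub above.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical, hand-written state machine with no awaits.
public struct ThreadIdStateMachine : IAsyncStateMachine
{
    public AsyncTaskMethodBuilder<int> Builder;
    public int State;

    public void MoveNext()
    {
        State = -2; // -2 marks the state machine as finished
        // Complete the task with the id of the thread that ran MoveNext.
        Builder.SetResult(Environment.CurrentManagedThreadId);
    }

    public void SetStateMachine(IAsyncStateMachine stateMachine) =>
        Builder.SetStateMachine(stateMachine);
}

public static class StartDemo
{
    // Same shape as the compiler-generated MyAsyncMethod stub.
    public static Task<int> RunAsync()
    {
        var stateMachine = new ThreadIdStateMachine
        {
            Builder = AsyncTaskMethodBuilder<int>.Create(),
            State = -1
        };
        stateMachine.Builder.Start(ref stateMachine); // runs MoveNext on the calling thread
        return stateMachine.Builder.Task;
    }
}
```

Because nothing pauses, the returned task is already completed, and its result is the caller's own thread id.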
Keeping the State Machine on the Stack
Let’s summarize the program flow when starting the State Machine.
- MyAsyncMethod creates the State Machine and the AsyncTaskMethodBuilder.
- Then it invokes the AsyncTaskMethodBuilder.Start method, passing the State Machine by reference.
- The Start method then does some housekeeping and invokes the MoveNext method of the State Machine.
Notice that, so far, both the State Machine and the Method Builder (being structs) live on the stack.
Also, recall that if all the awaiters are already completed, MoveNext will execute synchronously. In those cases, the State Machine (and the Builder) will stay on the stack during the full execution. This means no heap allocation for the State Machine, less work for the Garbage Collector, and a minimal memory footprint.
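A quick way to observe the synchronous path is to check IsCompleted on the task returned by an async method. In this sketch (the SyncCompletionDemo class is hypothetical), every awaiter in AllCompletedAsync is already completed, so the method never pauses, while PausingAsync awaits a delay that has almost certainly not elapsed yet. Note that this only shows that no pause happened; it does not by itself prove where the State Machine lived.

```csharp
using System.Threading.Tasks;

public static class SyncCompletionDemo
{
    // Every awaiter is already completed, so MoveNext never pauses
    // and the method returns an already-completed task.
    public static async Task<int> AllCompletedAsync()
    {
        await Task.CompletedTask;
        int value = await Task.FromResult(40);
        return value + 2;
    }

    // Task.Delay(500) is (almost certainly) not completed when first checked,
    // so the method pauses and returns a task that is still running.
    public static async Task<int> PausingAsync()
    {
        await Task.Delay(500);
        return 42;
    }
}
```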
However, the main idea of async programming is to pause, offload the current thread, and continue later (probably on some other thread).
In these cases, we need to preserve the State Machine by boxing it onto the heap and attaching its MoveNext method as a continuation.
This is done by the AsyncTaskMethodBuilder via the AwaitUnsafeOnCompleted method:
```csharp
if (!awaiter.IsCompleted)
{
    // …
    <>t__builder.AwaitUnsafeOnCompleted(ref awaiter, ref this);
    return;
}
```
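To see this snippet in context, here is a hand-written sketch of a state machine with a single await, modeled on the pattern above. The names (DelayThenAnswerStateMachine, State, Builder, _awaiter) are hypothetical stand-ins for the compiler's <>1__state, <>t__builder, and awaiter fields; it is roughly equivalent to `async Task<int> M(int delay) { await Task.Delay(delay); return 42; }`, not the compiler's exact output.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical hand-written state machine with one await.
public struct DelayThenAnswerStateMachine : IAsyncStateMachine
{
    public int State;                          // -1: not started, 0: paused at the await, -2: finished
    public AsyncTaskMethodBuilder<int> Builder;
    public int Delay;
    private TaskAwaiter _awaiter;              // awaiter saved across the pause

    public void MoveNext()
    {
        try
        {
            TaskAwaiter awaiter;
            if (State == -1)
            {
                awaiter = Task.Delay(Delay).GetAwaiter();
                if (!awaiter.IsCompleted)
                {
                    State = 0;
                    _awaiter = awaiter;
                    // Boxes the state machine (on the first pause) and registers MoveNext as the continuation.
                    Builder.AwaitUnsafeOnCompleted(ref awaiter, ref this);
                    return;
                }
            }
            else
            {
                // Resumed on the boxed copy: restore the saved awaiter.
                awaiter = _awaiter;
                _awaiter = default;
                State = -1;
            }
            awaiter.GetResult(); // rethrows if the awaited task failed
        }
        catch (Exception ex)
        {
            State = -2;
            Builder.SetException(ex);
            return;
        }
        State = -2;
        Builder.SetResult(42);
    }

    public void SetStateMachine(IAsyncStateMachine stateMachine) =>
        Builder.SetStateMachine(stateMachine);

    // Stub with the same shape as the compiler-generated MyAsyncMethod.
    public static Task<int> DelayThenAnswerAsync(int delay)
    {
        var sm = new DelayThenAnswerStateMachine
        {
            Builder = AsyncTaskMethodBuilder<int>.Create(),
            Delay = delay,
            State = -1
        };
        sm.Builder.Start(ref sm);
        return sm.Builder.Task;
    }
}
```

With a delay of 0, Task.Delay returns an already-completed task, so the whole method runs synchronously and never reaches AwaitUnsafeOnCompleted.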
Let’s see the details.
Boxing the State Machine and the IAsyncStateMachine.SetStateMachine Method
You may be thinking the SetStateMachine method looks rather weird, and its purpose may be unclear.
```csharp
void IAsyncStateMachine.SetStateMachine(IAsyncStateMachine stateMachine)
{
    <>t__builder.SetStateMachine(stateMachine);
}
```
That’s perfectly reasonable. The truth is, it’s just part of the boxing machinery.
The implementation surrounding the boxing logic contains a lot of plumbing code. That’s why I decided to build a somewhat simplified version.
Please spend a few minutes following the implementation below, starting from the AwaitUnsafeOnCompleted method call.
If you find the code above a little overwhelming, that’s perfectly normal. It’s probably one of the most obscure areas of the async/await machinery.
Let’s focus on the exact piece where the boxing happens.
The Boxing in Essence
The key intuition comes from the following piece:
```csharp
IAsyncStateMachine boxed = stateMachine;
boxed.SetStateMachine(boxed);
_moveNextRunner = () => boxed.MoveNext();
```
If you feel uncertain about why the first statement boxes the State Machine, Jon Skeet gives a simple explanation in this SO thread.
Here is a high-level description of this code block:
- It boxes the State Machine by assigning it to the IAsyncStateMachine interface (which is a reference type, hence the boxing).
- It calls SetStateMachine on the boxed instance. This, in turn, calls SetStateMachine on the Method Builder so that it holds a reference to the boxed State Machine instance.
- The continuation (_moveNextRunner) is set to the MoveNext method of the boxed instance.
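The same three steps can be reproduced outside the async machinery. In this sketch, IStepper and Stepper are hypothetical stand-ins for IAsyncStateMachine and the generated State Machine struct, and the Self field plays the role of the builder's reference to the boxed instance. It shows that the continuation mutates the boxed copy, while the original struct on the stack is untouched.

```csharp
using System;

// Hypothetical stand-in for IAsyncStateMachine.
public interface IStepper
{
    int State { get; }
    void Step();                 // plays the role of MoveNext
    void SetSelf(IStepper self); // plays the role of SetStateMachine
}

// Hypothetical stand-in for the generated State Machine struct.
public struct Stepper : IStepper
{
    private int _state;
    public IStepper Self;        // like the builder holding the boxed State Machine

    public int State => _state;
    public void Step() => _state++;
    public void SetSelf(IStepper self) => Self = self;
}

public static class BoxingDemo
{
    public static (int BoxedState, int OriginalState, bool HoldsItself) Run()
    {
        Stepper original = default;
        IStepper boxed = original;                   // boxing: the struct is copied to the heap
        boxed.SetSelf(boxed);                        // the boxed copy now references itself
        Action moveNextRunner = () => boxed.Step();  // continuation targets the boxed copy

        moveNextRunner();
        moveNextRunner();

        bool holdsItself = ReferenceEquals(((Stepper)boxed).Self, boxed);
        return (boxed.State, original.State, holdsItself); // (2, 0, true)
    }
}
```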
The “Unsafe” in AwaitUnsafeOnCompleted and the ICriticalNotifyCompletion Interface
You might be wondering what’s “unsafe” about the Method Builder’s AwaitUnsafeOnCompleted method, especially since there is also an AwaitOnCompleted version.
So, what’s the difference between AwaitOnCompleted and AwaitUnsafeOnCompleted?
It comes down to the TaskAwaiter and how we pass a continuation to it.
In the first article, I presented the Awaitable Pattern and described the OnCompleted(Action continuation) method of TaskAwaiter, which comes from the INotifyCompletion interface.
In fact, TaskAwaiter also implements the ICriticalNotifyCompletion interface(*), which declares the UnsafeOnCompleted(Action continuation) method.
(*) ICriticalNotifyCompletion inherits from INotifyCompletion, so the TaskAwaiter struct ends up with both methods: OnCompleted and UnsafeOnCompleted.
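If you want to verify the footnote yourself, a few reflection checks are enough. The AwaiterInterfaces class is a hypothetical helper; the types it inspects are the real ones from System.Runtime.CompilerServices.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper that inspects TaskAwaiter with reflection.
public static class AwaiterInterfaces
{
    public static (bool IsValueType, bool Critical, bool Plain, bool Inherits) Inspect()
    {
        Type awaiterType = typeof(TaskAwaiter);
        return (
            awaiterType.IsValueType,                                          // TaskAwaiter is a struct
            typeof(ICriticalNotifyCompletion).IsAssignableFrom(awaiterType),  // exposes UnsafeOnCompleted
            typeof(INotifyCompletion).IsAssignableFrom(awaiterType),          // exposes OnCompleted
            typeof(INotifyCompletion).IsAssignableFrom(typeof(ICriticalNotifyCompletion))); // interface inheritance
    }
}
```

All four checks are expected to return true.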
Back to the original question – why do we need the “unsafe” version?
Given the theme of this article, you can probably guess the answer: performance.
Before digging into the details, you’ll need a high-level understanding of ExecutionContext.
What is “ExecutionContext”
Rather than spending a lot of time describing what the ExecutionContext is, I’ll quote this excellent brief description by Sergey Tepliakov:
One may wonder: what is the execution context and why we need all that complexity?
In the synchronous world, each thread keeps ambient information in a thread-local storage. It can be security-related information, culture-specific data, or something else. When 3 methods are called sequentially in one thread this information flows naturally between all of them. But this is no longer true for asynchronous methods. Each “section” of an asynchronous method can be executed in different threads that makes thread-local information unusable.
Execution context keeps the information for one logical flow of control even when it spans multiple threads.
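One way to make the ExecutionContext visible is AsyncLocal<T>, whose values are stored in the ExecutionContext, unlike ThreadLocal<T>, whose values stay with a thread. In this sketch (the ContextFlowDemo class is hypothetical), ConfigureAwait(false) is used only so the continuation is free to resume on a thread-pool thread.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ContextFlowDemo
{
    // AsyncLocal<T> values live in the ExecutionContext, so they flow across awaits.
    private static readonly AsyncLocal<string> FlowingValue = new AsyncLocal<string>();

    // ThreadLocal<T> values live in thread-local storage and stay with the thread.
    private static readonly ThreadLocal<string> ThreadBoundValue = new ThreadLocal<string>();

    public static async Task<(string AsyncLocalValue, string ThreadLocalValue)> RunAsync()
    {
        FlowingValue.Value = "flows";
        ThreadBoundValue.Value = "stays";

        // The continuation may resume on a different (thread-pool) thread.
        await Task.Delay(50).ConfigureAwait(false);

        return (FlowingValue.Value, ThreadBoundValue.Value);
    }
}
```

The AsyncLocal<T> value is always "flows" after the await. The ThreadLocal<T> value is typically null, because the continuation usually resumes on another thread, although that is not guaranteed.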
Another pretty good article that explains the differences between the ExecutionContext and SynchronizationContext is this one by Stephen Toub.
“Flowing” the ExecutionContext
Back to our discussion about the safe and unsafe methods.
In essence, TaskAwaiter.OnCompleted flows the ExecutionContext, while TaskAwaiter.UnsafeOnCompleted doesn’t:
UnsafeOnCompleted is meant to be called only by the trusted async infrastructure, like the AsyncTaskMethodBuilder class. AsyncTaskMethodBuilder guarantees that it always captures the execution context. That’s why it calls the unsafe method on the TaskAwaiter to avoid capturing it twice.
Stephen Toub explains it like this:
ExecutionContext always flows across awaits; that’s handled by the async method builder. Thus, having the awaiter do it as well would be unnecessary duplication, and so UnsafeOnCompleted is preferred. The awaiter APIs can be called by anyone, and they were introduced at a time when we still believed in the code-access security model, and so we wanted OnCompleted to always be available (which would flow ExecutionContext if called directly) and then there was the SecurityCritical UnsafeOnCompleted that the compiler could use in async methods.
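The behavior described above can be sketched with a custom awaiter. ManualAwaiter is hypothetical (it is not how TaskAwaiter is implemented internally): its OnCompleted captures the caller's ExecutionContext with ExecutionContext.Capture and restores it with ExecutionContext.Run, while UnsafeOnCompleted stores the continuation as-is. Completion is triggered manually through a hypothetical Complete method, which runs the continuation inline on the completing thread.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical awaiter whose completion is triggered manually by calling Complete().
public sealed class ManualAwaiter : ICriticalNotifyCompletion
{
    private Action _continuation;

    public bool IsCompleted { get; private set; }
    public ManualAwaiter GetAwaiter() => this; // makes the type awaitable
    public void GetResult() { }

    // "Safe": capture the caller's ExecutionContext and restore it around the continuation.
    public void OnCompleted(Action continuation)
    {
        ExecutionContext captured = ExecutionContext.Capture();
        _continuation = captured is null
            ? continuation // flow was suppressed: nothing to restore
            : () => ExecutionContext.Run(captured, _ => continuation(), null);
    }

    // "Unsafe": store the continuation as-is; no ExecutionContext is captured.
    public void UnsafeOnCompleted(Action continuation) => _continuation = continuation;

    public void Complete()
    {
        IsCompleted = true;
        _continuation?.Invoke(); // runs the continuation inline on the completing thread
    }
}
```

Called directly, OnCompleted makes the continuation see the AsyncLocal<T> values that were current at registration, while UnsafeOnCompleted makes it see whatever is current at completion. When the same awaiter is awaited inside an async method, the compiler goes through the builder's AwaitUnsafeOnCompleted, so only UnsafeOnCompleted is called; the method still sees its own values after the await because the builder flows the ExecutionContext.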
Summary
In this article, you’ve seen the real async/await State Machine implementation and the various optimizations performed by the compiler and the Method Builder.
You’ve learned some advanced techniques that you most probably don’t need to know in order to handle your daily programming tasks.
However, I find such in-depth explorations quite valuable. The takeaways reach far beyond understanding a single piece of functionality. It’s not only a lot of fun, but it also takes you further on the path to becoming an expert in your programming ecosystem.
In the next post, I’ll continue with some more practical implications of the async/await machinery. Concretely, I will focus on the SynchronizationContext in nested async calls.
This will help you make your code more robust and avoid some common pitfalls.
Stay tuned, and thanks for reading!