Intro

In the previous article, you saw a conceptual implementation of the async/await State Machine. Although I used some simplifications, it was very close to the actual one.

This means you now have a very deep understanding of the whole async/await machinery.

In this post, I will present the actual implementation by focusing on the performance optimizations I found very intriguing.

This is the kind of knowledge that you don’t need on a daily basis, but it truly takes your skills to the next level.

The details you’ll see here will make you appreciate all the efforts and reasoning behind a real-world implementation used by millions of users.

Here are some of the points you will explore:

How and why is the State Machine kept on the execution stack when all the awaiters are already completed?
How is the State Machine boxed onto the heap before the first pause in order to preserve its state?
What is the role of AsyncTaskMethodBuilder<T> as a coordinator between the State Machine and the async infrastructure?
What is the ExecutionContext, and how does it flow across continuations?

Let’s dive in!

This article’s content is influenced by the async/await material in the book C# in Depth by Jon Skeet and the blog post Dissecting the async methods in C# by Sergey Tepliakov.

For additional in-depth material on async/await in C#, you can check the Pluralsight course Applying Asynchronous Programming in C#.

Starter Code

Not surprisingly, I’ll use the same sample code as in the last two posts:

public static class MyClass
{
    public static async Task<int> MyAsyncMethod(int firstDelay, int secondDelay)
    {
        Console.WriteLine("Before first await.");
        
        await Task.Delay(firstDelay);
        
        Console.WriteLine("Before second await.");
        
        await Task.Delay(secondDelay);
        
        Console.WriteLine("Done.");

        return 42;
    }
}

The Workflow Diagram

At the risk of being annoying, here’s also the workflow diagram you saw in Part 2, so you can use it for reference.

Figure 2: State Machine Workflow Diagram

Implementation

I’ll start by giving you the full source code first and move to some concrete explanations afterward.

Please spend a minute comparing the code below to the conceptual implementation from the last article. If you’ve been following along, almost everything should make sense.

Some of the more mystical bits may be the usage of AsyncTaskMethodBuilder<int> and the SetStateMachine method. I will explore those two in the upcoming sections.

Please note that the State Machine (named <MyAsyncMethod>d__0) is a struct, not a class, at least in Release builds (in Debug builds the compiler generates a class instead). AsyncTaskMethodBuilder<T> is a struct as well.

This is important as it allows for the memory optimizations you’ll see later.
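If you'd like to check the struct-versus-class claim on your own build, reflection is one way to do it. The sketch below is only an illustration: StateMachineInspector, SampleAsync, GetStateMachineType and Describe are names I made up. It relies on the AsyncStateMachineAttribute that the compiler places on every async method.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class StateMachineInspector
{
    // Stand-in async method so the sketch compiles on its own (hypothetical).
    public static async Task<int> SampleAsync(int delay)
    {
        await Task.Delay(delay);
        return 42;
    }

    // Returns the compiler-generated state machine type behind a public async method.
    public static Type GetStateMachineType(Type declaringType, string methodName)
    {
        MethodInfo method = declaringType.GetMethod(methodName)
            ?? throw new ArgumentException("No public method named " + methodName + ".");
        AsyncStateMachineAttribute attribute = method.GetCustomAttribute<AsyncStateMachineAttribute>()
            ?? throw new ArgumentException(methodName + " is not an async method.");
        return attribute.StateMachineType;
    }

    // Reports whether the generated type is a struct or a class in the current build.
    public static string Describe(Type stateMachineType) =>
        stateMachineType.Name + " is a " + (stateMachineType.IsValueType ? "struct" : "class");
}
```

Printing StateMachineInspector.Describe(StateMachineInspector.GetStateMachineType(typeof(StateMachineInspector), "SampleAsync")) in a Debug and then in a Release build should show the difference.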

AsyncTaskMethodBuilder<TResult> – Overview

One of the main differences with the Conceptual Implementation is the usage of the AsyncTaskMethodBuilder<int> struct:

private struct <MyAsyncMethod>d__0 : IAsyncStateMachine
{
   // …

   public AsyncTaskMethodBuilder<int> <>t__builder;

   // …
}

Recall that in the previous post we used a TaskCompletionSource instance to build the resulting task.

That was an oversimplification compared to the real implementation. AsyncTaskMethodBuilder does a lot more housekeeping and coordinates the communication between the State Machine and the async infrastructure.

Here are the primary responsibilities of the Task Method Builder:

Produces the resulting task
Starts the State Machine
Attaches the continuation
Boxes the State Machine onto the heap
Flows the ExecutionContext

Let’s start digging into some of these.
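As a first taste, here is a minimal sketch of the "produces the resulting task" responsibility, using the builder directly. BuilderDemo, CompleteWith and FailWith are hypothetical names for this illustration. In real code the compiler-generated State Machine makes these calls for you.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class BuilderDemo
{
    // Completes the builder's task with a result, much like `return value;` in an async method.
    public static Task<int> CompleteWith(int value)
    {
        AsyncTaskMethodBuilder<int> builder = AsyncTaskMethodBuilder<int>.Create();
        builder.SetResult(value);
        return builder.Task;
    }

    // Faults the builder's task, much like an unhandled exception in an async method.
    public static Task<int> FailWith(Exception error)
    {
        AsyncTaskMethodBuilder<int> builder = AsyncTaskMethodBuilder<int>.Create();
        builder.SetException(error);
        return builder.Task;
    }
}
```

BuilderDemo.CompleteWith(42) returns an already completed Task<int>, while BuilderDemo.FailWith(...) returns a faulted one. In that respect the builder plays the role TaskCompletionSource played in the conceptual implementation.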

“MyAsyncMethod” and Starting the State Machine

As already explained, MyAsyncMethod gets transformed by the compiler so that its new purpose is to trigger the State Machine workflow and return the resulting task (which in most cases will not be completed):

[AsyncStateMachine(typeof(<MyAsyncMethod>d__0))]
public static Task<int> MyAsyncMethod(int firstDelay, int secondDelay)
{
   <MyAsyncMethod>d__0 stateMachine = default(<MyAsyncMethod>d__0);
   stateMachine.<>t__builder = AsyncTaskMethodBuilder<int>.Create();
   stateMachine.firstDelay = firstDelay;
   stateMachine.secondDelay = secondDelay;
   stateMachine.<>1__state = -1;
   stateMachine.<>t__builder.Start(ref stateMachine);
   return stateMachine.<>t__builder.Task;
}

Notice that the State Machine is passed by reference to the Start method of the Builder:

stateMachine.<>t__builder.Start(ref stateMachine);

This is done for efficiency and consistency – we avoid creating a copy of the State Machine, which can be expensive. Also, any changes to the State Machine within the Start method will affect the original State Machine instance.
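The difference between passing a struct by value and by reference is easy to see in isolation. TinyStateMachine and RefDemo below are hypothetical stand-ins for this illustration, not the real types:

```csharp
using System;

// Stand-in for a state machine struct (hypothetical).
public struct TinyStateMachine
{
    public int State;
    public void MoveNext() => State++;
}

public static class RefDemo
{
    // Receives a copy: the caller's struct is not affected.
    public static void StartByValue(TinyStateMachine stateMachine) => stateMachine.MoveNext();

    // Receives a reference: the caller's struct is the one being advanced.
    public static void StartByRef(ref TinyStateMachine stateMachine) => stateMachine.MoveNext();

    public static (int afterByValue, int afterByRef) Run()
    {
        var stateMachine = new TinyStateMachine { State = -1 };

        StartByValue(stateMachine);
        int afterByValue = stateMachine.State;   // still -1

        StartByRef(ref stateMachine);
        int afterByRef = stateMachine.State;     // now 0

        return (afterByValue, afterByRef);
    }
}
```

Without ref, Start would advance a copy, and the caller's State Machine would never see the state changes.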

As you can see here, the Builder’s Start method will, at some point, invoke the MoveNext method of the State Machine.

Keeping the State Machine on the Stack

Let’s summarize the program flow when starting the State Machine.

MyAsyncMethod creates the State Machine and the AsyncTaskMethodBuilder.
Then it invokes the AsyncTaskMethodBuilder.Start method passing the State Machine by reference.
The Start method then does some housekeeping and invokes the MoveNext method of the State Machine.

Notice that, so far, both the State Machine and the Method Builder (being structs) live on the stack.

Also, recall that if all the awaiters are already completed, MoveNext will execute synchronously. In those cases, the State Machine (and the Builder) will stay on the stack during the full execution. This means no work for the Garbage Collector and optimal memory footprint.
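You can observe this synchronous fast path from the outside. In the sketch below (FastPathDemo and AllCompletedAsync are hypothetical names), every awaited task is already completed, so the returned task is complete by the time the call returns:

```csharp
using System;
using System.Threading.Tasks;

public static class FastPathDemo
{
    // Every awaited task here is already completed, so MoveNext never pauses.
    public static async Task<int> AllCompletedAsync()
    {
        await Task.CompletedTask;
        int value = await Task.FromResult(41);
        return value + 1;
    }

    public static (bool completedOnReturn, int result) Run()
    {
        Task<int> task = AllCompletedAsync();
        // No waiting is needed: the method ran to the end synchronously.
        return (task.IsCompleted, task.Result);   // (true, 42)
    }
}
```

This shows only the observable behavior. The fact that nothing gets boxed on this path is an implementation detail you'd need a memory profiler to confirm.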

However, the main idea of async programming is to pause, offload the current thread, and continue later (probably on some other thread).

In these cases, we need to preserve the State Machine by boxing it onto the heap and attaching its MoveNext method as a continuation.

This is done by the AsyncTaskMethodBuilder via the AwaitUnsafeOnCompleted method:

if (!awaiter.IsCompleted)
{
   // …
   <>t__builder.AwaitUnsafeOnCompleted(ref awaiter, ref this);
   return;
}
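To see where this call sits inside a complete MoveNext, here is a hand-written sketch of a State Machine with a single await. It is not the compiler's output: the type and member names (DelayThenAnswerStateMachine, HandWrittenAsync, State, Builder and so on) are my own, and the state values only mimic the generated ones.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hand-written stand-in for a compiler-generated state machine (hypothetical names).
public struct DelayThenAnswerStateMachine : IAsyncStateMachine
{
    public int State;                            // -1 = running, 0 = paused at the await, -2 = finished
    public int DelayMs;
    public AsyncTaskMethodBuilder<int> Builder;
    private TaskAwaiter _awaiter;

    public void MoveNext()
    {
        try
        {
            TaskAwaiter awaiter;
            if (State != 0)
            {
                // First entry: start the operation and check whether we must pause.
                awaiter = Task.Delay(DelayMs).GetAwaiter();
                if (!awaiter.IsCompleted)
                {
                    State = 0;
                    _awaiter = awaiter;
                    // Preserves this struct on the heap and registers MoveNext as the continuation.
                    Builder.AwaitUnsafeOnCompleted(ref awaiter, ref this);
                    return;
                }
            }
            else
            {
                // Resumed by the continuation: restore the awaiter saved before the pause.
                awaiter = _awaiter;
                _awaiter = default(TaskAwaiter);
                State = -1;
            }

            awaiter.GetResult();   // rethrows if the awaited task failed
        }
        catch (Exception ex)
        {
            State = -2;
            Builder.SetException(ex);
            return;
        }

        State = -2;
        Builder.SetResult(42);
    }

    public void SetStateMachine(IAsyncStateMachine stateMachine) =>
        Builder.SetStateMachine(stateMachine);
}

public static class HandWrittenAsync
{
    // Plays the role of the compiler-rewritten stub method.
    public static Task<int> DelayThenAnswer(int delayMs)
    {
        var stateMachine = new DelayThenAnswerStateMachine();
        stateMachine.Builder = AsyncTaskMethodBuilder<int>.Create();
        stateMachine.DelayMs = delayMs;
        stateMachine.State = -1;
        stateMachine.Builder.Start(ref stateMachine);
        return stateMachine.Builder.Task;
    }
}
```

Calling HandWrittenAsync.DelayThenAnswer(0) takes the synchronous path, because Task.Delay(0) returns an already completed task. Any positive delay goes through AwaitUnsafeOnCompleted and completes later.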

Let’s see the details.

Boxing the State Machine and the IAsyncStateMachine.SetStateMachine Method

You may be thinking the SetStateMachine method looks rather weird, and its purpose may be unclear.

void IAsyncStateMachine.SetStateMachine(IAsyncStateMachine stateMachine)
{
   <>t__builder.SetStateMachine(stateMachine);
}

That’s perfectly reasonable. The truth is, it’s just part of the boxing machinery.

The implementation surrounding the boxing logic contains a lot of plumbing code. That’s why I decided to build a somewhat simplified version.

Please spend a few minutes following the implementation below, starting from the AwaitUnsafeOnCompleted method call.

If you find the code above a little overwhelming, that’s perfectly normal. It’s probably one of the most obscure areas of the async/await machinery.

Let’s focus on the exact piece where the boxing happens.

The Boxing in Essence

The key intuition comes from the following piece:

IAsyncStateMachine boxed = stateMachine;
boxed.SetStateMachine(boxed);
_moveNextRunner = () => boxed.MoveNext();

If you feel uncertain about why the statement on Line 1 boxes the State Machine, Jon Skeet gives a simple explanation in this SO thread.

Here is a high-level description of this code block:

It boxes the State Machine by assigning it to the IAsyncStateMachine interface (which is a reference type, hence the boxing).
It calls SetStateMachine on the boxed instance. This will, in turn, call the SetStateMachine on the Method Builder so that it will hold a reference to the boxed State Machine instance.
The continuation (_moveNextRunner) is set to a delegate that calls MoveNext on the boxed instance.
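The same three-step shape can be reproduced with a toy struct. IStep, StepMachine and BoxingDemo are hypothetical names for this sketch. The point is only that the interface assignment creates a heap copy, and that the delegate keeps working on that copy:

```csharp
using System;

public interface IStep
{
    int State { get; }
    void MoveNext();
}

// Toy stand-in for a state machine struct (hypothetical).
public struct StepMachine : IStep
{
    public int State { get; private set; }
    public void MoveNext() => State++;
}

public static class BoxingDemo
{
    public static (int original, int boxedState) Run()
    {
        var onStack = new StepMachine();

        IStep boxed = onStack;                    // boxing: copies the struct to the heap
        Action runner = () => boxed.MoveNext();   // the delegate holds on to the boxed copy

        runner();
        runner();

        return (onStack.State, boxed.State);      // (0, 2)
    }
}
```

The original struct on the stack stays at 0, while the boxed copy advances on every call. From the moment of boxing, the boxed instance is the one that carries the state.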

The “Unsafe” in AwaitUnsafeOnCompleted and the ICriticalNotifyCompletion Interface

You might be wondering what’s “unsafe” about the Method Builder’s AwaitUnsafeOnCompleted method. Moreover, there is also an AwaitOnCompleted version.

So, what’s the difference between AwaitOnCompleted and AwaitUnsafeOnCompleted?

It comes down to the TaskAwaiter and how we pass a continuation to it.

In the first article I presented the Awaitable Pattern. I described the OnCompleted(Action continuation) method in TaskAwaiter that comes from the INotifyCompletion interface.

In fact, TaskAwaiter implements the ICriticalNotifyCompletion interface(*) that declares the UnsafeOnCompleted(Action continuation) method.

(*) ICriticalNotifyCompletion extends INotifyCompletion, so the TaskAwaiter struct ends up with both methods: OnCompleted and UnsafeOnCompleted.
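A small experiment shows which of the two methods an async method ends up using. RecordingAwaiter and AwaiterDemo are hypothetical types written for this sketch. The awaiter always reports that it isn't completed and resumes the continuation immediately, only to keep the demo deterministic:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical awaiter that records which registration method was used.
public sealed class RecordingAwaiter : ICriticalNotifyCompletion
{
    public string CalledMethod { get; private set; } = "none";

    public RecordingAwaiter GetAwaiter() => this;   // lets us `await` the object directly
    public bool IsCompleted => false;               // always force the "pause" path
    public void GetResult() { }

    public void OnCompleted(Action continuation)
    {
        CalledMethod = nameof(OnCompleted);
        continuation();   // resume right away; a real awaiter would do this later
    }

    public void UnsafeOnCompleted(Action continuation)
    {
        CalledMethod = nameof(UnsafeOnCompleted);
        continuation();
    }
}

public static class AwaiterDemo
{
    public static async Task<string> AwaitAndReport()
    {
        var awaiter = new RecordingAwaiter();
        await awaiter;
        return awaiter.CalledMethod;
    }
}
```

Because the awaiter implements ICriticalNotifyCompletion, AwaiterDemo.AwaitAndReport() yields "UnsafeOnCompleted". An awaiter that implements only INotifyCompletion would be registered through OnCompleted instead.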

Back to the original question – why do we need the “unsafe” version?

Given the theme of this article, you can probably guess the answer: performance.

Before digging into the details, you’ll need a high-level understanding of ExecutionContext.

What is “ExecutionContext”

I will not spend a lot of time describing what the ExecutionContext is, so here’s an excellent brief description by Sergey Tepliakov:

One may wonder: what is the execution context and why we need all that complexity?

In the synchronous world, each thread keeps ambient information in a thread-local storage. It can be security-related information, culture-specific data, or something else. When 3 methods are called sequentially in one thread this information flows naturally between all of them. But this is no longer true for asynchronous methods. Each “section” of an asynchronous method can be executed in different threads that makes thread-local information unusable.

Execution context keeps the information for one logical flow of control even when it spans multiple threads.

Another pretty good article that explains the differences between the ExecutionContext and SynchronizationContext is this one by Stephen Toub.
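A convenient way to watch the ExecutionContext at work is AsyncLocal<T>, whose values are stored in it. In the sketch below (AmbientDemo and RequestId are hypothetical names), a value set before an await is still visible afterwards, even though the continuation usually resumes on a different thread:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AmbientDemo
{
    // AsyncLocal<T> values travel with the ExecutionContext.
    private static readonly AsyncLocal<string> RequestId = new AsyncLocal<string>();

    public static async Task<(string before, string after)> RunAsync()
    {
        RequestId.Value = "request-42";
        string before = RequestId.Value;

        // The continuation typically resumes on a thread-pool thread.
        await Task.Delay(10).ConfigureAwait(false);

        string after = RequestId.Value;   // still "request-42": the context flowed
        return (before, after);
    }
}
```

A [ThreadStatic] field would not give the same guarantee, since the code after the await may run on another thread.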

“Flowing” the ExecutionContext

Back to our discussion about the safe and unsafe methods.

In essence, TaskAwaiter.OnCompleted flows the ExecutionContext, while TaskAwaiter.UnsafeOnCompleted doesn’t.

UnsafeOnCompleted is meant to be called only by the trusted async infrastructure, like AsyncTaskMethodBuilder. The builder guarantees that it always captures the execution context itself. That’s why it calls the unsafe method on the TaskAwaiter: doing so avoids capturing the context twice.

This is also explained by Stephen Toub like so:

ExecutionContext always flows across awaits; that’s handled by the async method builder. Thus, having the awaiter do it as well would be unnecessary duplication, and so UnsafeOnCompleted is preferred. The awaiter APIs can be called by anyone, and they were introduced at a time when we still believed in the code-access security model, and so we wanted OnCompleted to always be available (which would flow ExecutionContext if called directly) and then there was the SecurityCritical UnsafeOnCompleted that the compiler could use in async methods.
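To make "flowing" a bit more tangible, here is what capturing a context and running a callback under it looks like with the public ExecutionContext API. This is only a sketch of the idea with hypothetical names (FlowDemo, Ambient). It is not the builder's actual code.

```csharp
using System;
using System.Threading;

public static class FlowDemo
{
    private static readonly AsyncLocal<string> Ambient = new AsyncLocal<string>();

    public static (string insideRun, string afterRun) Run()
    {
        Ambient.Value = "captured";

        // Snapshot of the ambient state; Capture returns null only when flow is suppressed.
        ExecutionContext context = ExecutionContext.Capture()
            ?? throw new InvalidOperationException("ExecutionContext flow is suppressed.");

        Ambient.Value = "changed";

        string insideRun = "";
        // Run executes the callback under the captured context, then restores the current one.
        ExecutionContext.Run(context, _ => insideRun = Ambient.Value, null);

        return (insideRun, Ambient.Value);   // ("captured", "changed")
    }
}
```

The callback sees the value from the moment of capture, not the later one. Capturing has a cost, which is why doing it once in the builder and skipping it in the awaiter is worth the extra "unsafe" method.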

Summary

In this article, you’ve seen the real async/await State Machine implementation and the various optimizations the compiler performs.

You’ve learned some advanced techniques that you most probably don’t need to know in order to handle your daily programming tasks.

However, I find such in-depth explorations quite valuable. The takeaways reach far beyond understanding one limited piece of functionality. It’s not only a lot of fun, but it also takes you further on the path to becoming an expert in your programming ecosystem.

In the next post, I’ll continue with some more practical implications of the async/await machinery. Concretely, I will focus on the SynchronizationContext in nested async calls.

This will help you make your code more robust and avoid some common pitfalls.

Stay tuned, and thanks for reading!

Exploring the async/await State Machine – Concrete Implementation

Vasil Kosturski
