If you’re not careful what and when you invoke from within a critical section, you may cause a deadlock. Everything may seem correct from yours’ and your clients’ perspective, but out of a sudden, the program may go to sleep forever. In this article, I’ll demonstrate some scenarios when this can happen and what you can do(or even should do) to mitigate the risk.
But first, let’s do a quick refresher on the lock
statement in C# and why we use it.
The easiest way to read and write to some shared data from multiple threads and make sure it doesn’t get corrupted is to use some kind of a lock. There are various types of locks in .NET Framework like Monitor
, SpinLock
, ReaderWriterLockSlim
. The most straightforward construct is the lock
statement(which is just a simple wrapper around the Monitor
), and we can safely say it works well in the majority of the cases. All the examples in this article will be based on it. But the conclusions would be valid for the other types of locks as well.
Please note this article is not about strategies and patterns for parallel programming. In many cases, you may avoid locking altogether, or you may use some more advanced or hardware friendly primitives to achieve better performance. I’ll be using the lock
construct mostly because of its’ simplicity and popularity.
By using the lock
statement, we just say that a piece of code can be executed one thread at a time:
lock (_lock) { // Critical Section - Only a single thread can run here at a time // Read & Write to the shared data }
If you like to dig into more details regarding the lock
statement – you can do that here.
By locking, you may guarantee the atomicity of your data, but there are some traps you can fall into. One of them is the notorious deadlock situation. This can happen easier than you would expect if you invoke some external code from within a locked section. This is very important to remember and it’s the central theme of this article. So, let’s state it as a rule.
Rule: Don’t call unknown code from within a critical section.
I will explain what this means and will refer to this rule several times throughout the post, so it will help to keep it fresh in your mind.
But first, let’s take a step back and do a quick refresher on deadlocks in general.
The Textbook Deadlock
A typical deadlock diagram from a Computer Science textbook looks similar to the following:
The diagram itself should be quite self-explanatory. Both Thread 1 and Thread 2 are holding one of the two locks and are blocked waiting for the other one. No way for the execution to continue. Your program is deadlocked.
We can very quickly reproduce this scenario in C# by implementing the following method:
public static void DeadLock() { // Thread 1 runs here var lockOne = new object(); var lockTwo = new object(); Task.Run(() => // The Task.Run() body will be handled by a Thread Pool thread { // Thread 2 runs here lock (lockTwo) { Thread.Sleep(1000); lock (lockOne) { // Do work } } }); lock (lockOne) { Thread.Sleep(1000); lock (lockTwo) { // Do work } } }
You can see how this code reproduces the state from Figure 1. Causing a by-the-book deadlock like this one is all about the ordering of the locks. Concretely, in this example, we have two threads that acquire the two locks in the opposite order. With the help of the Thread.Sleep
statements we are quite confident that Thread 1
will acquire lockOne
, and Thread 2
will acquire lockTwo
before any of the threads have the change to obtain the other lock. Therefore, Thread 1
blocks waiting for lockTwo
and Thread 2
blocks waiting for lockOne
.
This kind of deadlock is an important and fun topic of itself. There is a lot of literature you can go through if you’d like to get deeper into the matter. A good starting point for such a research is probably one of the canonical problems like The Dining Philosophers
For the purpose of this article, let me just mention a couple of ways to solve the problem:
- Avoid the deadlock altogether. Make sure the threads acquire the locks in precisely the same order. So both of the threads will require the locks either in order
lockOne
->lockTwo
orlockTwo
->lockOne
. I encourage you to experiment with our example and make sure you understand how this fixes the problem. - Utilize a timeout to break the deadlock. For example, you can use the Monitor with timeout overload.
The Single Lock Deadlock
Now you may be asking how does the case above relate to our rule from the beginning. It doesn’t very much. And this is part of the problem. For some time, I’ve been thinking that in order to cause a deadlock, you need a recipe similar to the one described above, with the main ingredients being:
- Two(or more) threads
- Two(or more) locks
- Acquire the locks in the opposite order
And while this is one of the recipes for deadlocks, it’s not the only one. There may be a false feeling of certainty when we have just a single lock in our implementation. The thinking is that if multiple threads want to acquire the lock simultaneously, only one will succeed. After its job is done, it will release the lock. Then another thread will repeat the exercise. And another one, and another one… The program flow runs smoothly and our data is protected.
Generally, this logic is valid, but there are other cases when you need to train your intuition to ring the bell. Those are scenarios that can cause deadlocks even with a single lock.
Let’s look closer into that.
Imagine the following sequence:
Thread 1
is holding a lock and marshals a block of code for execution on Thread 2
from within the locked section. Thread 1
blocks waiting for Thread 2
to complete executing the code. Thread 2
, in turn, ends up calling back the code, locked by Thread 1
. Thread 2
cannot acquire the lock because Thread 1
is holding it. Thread 1
can’t release the lock because it waits for Thread 2
to complete. Deadlock!
A conceptual diagram of this situation would look something like this:
Throughout the rest of the article, I will give examples of how this situation may occur and what we can do to mitigate it.
Breaking The Rule
Let’s come up with an example that breaks our rule and calls some unknown code from within the critical section.
Explore the following Worker
class:
public class Worker { private readonly object _lock = new object(); private int _currentProgress; public event EventHandler<EventArgs> ReportProgress; public void DoWork() { for (int i = 0; i < 100; i++) { lock (_lock) { Thread.Sleep(100); _currentProgress++; ReportProgress?.Invoke(this, EventArgs.Empty); } } } public int GetProgress() { lock (_lock) { return _currentProgress; } } }
The logic is straightforward. DoWork()
just runs a simple loop and invokes the registered event handlers at each iteration. Line 16 breaks our rule. By invoking the event handlers, we are executing some external code we have no control over from within the synchronized section.
The question now is – what can go wrong? Well, we don’t know exactly, and that’s the main problem. The clients of this class have the freedom to do all the weird stuff you can or cannot imagine. Including things that would lead to calling back our class from another thread, thus causing a deadlock.
The Toy Example
Now, let’s write some code that leads to a deadlock. Such implementation is something you probably wouldn’t see in production. I’ll give a much more practical example further on, but for now, let’s just come up with some code that results in a deadlock.
Inspect the following client of the Worker
class:
public static void Client() { // Thread 1 runs here var worker = new Worker(); worker.ReportProgress += (sender, eventArgs) => { // Thread 1 runs here var eventHandlerTask = Task.Run(() => { // Thread 2 runs here var progress = worker.GetProgress(); Console.WriteLine(progress); // This will never happen }); eventHandlerTask.Wait(); // Thread 1 blocks waiting for Thread 2 to complete }; // Thread 1 runs here worker.DoWork(); }
I hope you can see how this example accurately reproduces our single-lock deadlock scenario from Figure 2. Thread 1
calls DoWork()
, acquires the lock and gets into the event handler. From there, it starts Thread 2
and blocks waiting for it to complete. Thread 2
calls GetProgress()
, so it blocks on the lock acquired by Thread 1
. Both threads are blocked waiting. Thus we’re deadlocked.
I would agree that the code from this example is too experimental. It’s also too obvious. But even with that, you can see that it takes some brainpower to absorb what’s going on. Now imagine a real-world system when “the caller” can actually be a group of methods and classes calling each other. And in the end, you end up deadlocked. Troubleshooting such a mess is definitely not the way we want to spend our late evenings.
The UI Thread Example
The desktop apps developers are familiar with the concept of the UI thread. It’s the only thread that can modify the user interface.
Imagine we’d like to use our Worker
class in a desktop application. We may want to invoke the super useful DoWork()
method and output the progress to some label on the UI.
There is a problem, though. Our DoWork()
method is(or let’s imagine it is) very computationally expensive. We definitely don’t want to run in the UI thread, as this will freeze the app. Therefore we’ll offload it to a background thread. But remember, when the event handler is invoked, we want to get the current worker progress and print it in the UI. This needs to happen on the UI thread.
Putting it together, here is a sample implementation:
private void MyButton_OnClick(object sender, RoutedEventArgs e) { // UI thread runs here Task.Run(() => { // Thread Pool thread runs here var worker = new Worker(); worker.ReportProgress += (_, eventArgs) => { // Thread Pool thread runs here Dispatcher.Invoke(() => // Marshal the callback to the UI thread and wait until it's done { // UI thread runs here myLabel.Content = worker.GetProgress(); // Deadlock }); }; worker.DoWork(); }); }
This piece of code will cause a deadlock again. Here is what happens:
- UI thread runs the
DoWork()
method on a Thread Pool thread. - The Thread Pool thread acquires the lock in the
DoWork()
method and invokes the event handler. - From within the event handler, the Thread Pool thread calls
Dispatcher.Invoke()
This method is used to synchronously execute some work on the UI thread by queuing it on the Dispatcher Queue and waits for it to finish. This means the Thread Pool thread blocks at this point waiting for the UI thread to complete the work. - The UI thread executes the
Dispatcher.Invoke()
body. It calls theGetProgress()
method. But remember, the lock is already acquired by the Thread Pool thread. So the UI thread blocks on the lock. The Thread Pool thread is also blocked from point 3. - Both threads are waiting for each other.
- The program is deadlocked.
You see, the main problem here is invoking GetProgress()
from another thread – the UI thread in this case. If we get the progress beforehand and just print it on the UI thread the problem will be solved:
var progress = worker.GetProgress(); Dispatcher.Invoke(() => { myLabel.Content = progress; });
But the point remains. There is nothing indicative for the clients that such a deadlock can occur. Even if they happen to write the safe code like the one above, we’re still guilty for allowing such a deadlock possibility.
The Solution
It’s clear from our examples that the pattern is repetitive:
- Acquire a lock in the current thread.
- Call some code which is beyond your control.
- This code somehow traces back into your class on another thread.
Of course, the solution is obvious. Just move the external call outside of the synchronized block:
lock (_lock) { Thread.Sleep(100); _currentProgress++; } ReportProgress?.Invoke(this, EventArgs.Empty);
Now, the external code is invoked after the lock is released, which prevents the deadlock trap.
Some cases are a little more complicated. For example, you may need to pass some data structure to the callback as a parameter. Let’s say access to this data structure needs to be synchronized, so it’s in a locked section. You may be tempted to break the rule, in this case, like so:
lock (_lock) { // do something with dataStructure callback(dataStructure); }
In those cases, I would advise to create a copy of the data structure and invoke the callback from outside the lock.
MyDataStructure copy; lock (_lock) { // do something with dataStructure copy = Copy(dataStructure); } callback(copy);
I hope, at this point, you are convinced that calling external code from within a critical section is not a good idea. You may think this is an elementary rule to follow. That’s not far from the truth. But I have one additional warning. It’s not always so easy to see when you’re invoking external code. Event handlers and callbacks are easy to spot. But there is one scenario that you may miss – virtual methods. Remember that your clients potentially override any virtual method you invoke. So just be aware of that.
Thanks for reading.