Intro
In the last article, we explored Majority Reads and Majority Writes in Mongo in an attempt to fix a “Read Your Write” consistency violation. That was insufficient due to the replication delay for updating the Majority Snapshot on the Secondary servers.
In this post, we’ll finally see how to solve our consistency issue using Causally Consistent Sessions introduced in Mongo 3.6 (*).
(*) It was still possible to achieve Causal Consistency before Mongo 3.6, but it involved forcing the reads to the Primary, which is quite limiting in terms of scalability.
We’ll first say a few words about Lamport Clocks (or Logical Clocks) and how we model causal relations in a Distributed System.
Then I’ll describe the actual Mongo-specific implementation by investigating the topics of Cluster Time, Operation Time, and Causally Consistent Sessions.
Lastly, I’ll present the final code sample that guarantees Causal Consistency.
Let’s get going!
Useful Resources
This article is very much influenced by the Designing Data-Intensive Applications book. I strongly recommend it if you want to learn more about the nuts and bolts of Data-Driven Systems.
Another excellent learning source is the Introduction to Database Engineering course on Udemy.
“Read Your Write” Violation – Recap
I would strongly suggest visiting the Causal Consistency intro article for much more in-depth use cases. Still, at this point, it’s enough to understand the workflow from the diagram below:
In essence, the Client performs a Write operation to the Primary server. Then he tries to read the same Write, but the Read request goes to the Secondary. The Write has not yet been replicated to the Secondary, so the Client cannot read his own write.
Let’s see how to solve this!
Solution – High-Level Idea
Imagine the Primary server keeps a counter and attaches an auto-incrementing number to every Write. This number is returned to the Client when the Write operation completes.
Also, the counter value gets propagated to the secondaries during standard replication. In other words, every Secondary knows the latest value of the counter it has seen from the Primary.
So, how does this help with achieving Causal Consistency Guarantees?
The counter represents the ordering of the events in the replica set. If WriteX has a lower counter value than WriteY, then WriteX happened before WriteY.
When the Client receives the counter value of his Write operation, he then passes this value in a subsequent Read request to a (potentially) Secondary server. This instructs the Secondary to wait until the Write with the specified number is replicated.
In Mongo, this happens automatically when your Write and Read requests are part of a Causally Consistent Session. You’ll see a concrete example at the end of the article.
Conceptual Diagram
The following diagram demonstrates the workflow:
1. At t0, the counter is 0 on both the Primary and the Secondary.
2. At t1, the Client initiates a Write request to the Primary. As with any other operation in a Causally Consistent Session, he passes the latest known counter value in the afterCounter parameter. Then the Primary performs the Write and increments the counter, which is returned to the Client as part of the “successful operation” message. Also, at this point, the Write (with the counter attached) is scheduled for replication.
3. On a subsequent Read, the Client passes the counter received from the previous Write. This instructs the server to wait until the Write with counter = 1 gets replicated.
4. The Read request on the Secondary is blocked, waiting for the Write.
5. When the Write is propagated, the counter on the Secondary is updated, the query is executed, and the correct result is returned.
The critical difference with the “failing” version (Fig. 1) of this workflow is in point 4, where the Secondary blocks the request until its counter catches up.
This is how Causal Consistency is guaranteed conceptually.
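If it helps to see the handshake in code, here is a minimal, purely conceptual sketch. The ConceptualReplica class, its methods, and the afterCounter parameter are made up for illustration only; this is not how Mongo implements it internally, just the idea in miniature:

using System.Threading;

// Purely conceptual sketch - NOT how Mongo implements this internally.
// A server keeps a counter; Writes bump it, Reads can wait for it to catch up.
class ConceptualReplica
{
    private long _counter;                      // latest Write number this server has applied
    private readonly object _gate = new object();

    // Primary: perform the Write, bump the counter and return it to the Client.
    public long Write(object document)
    {
        lock (_gate)
        {
            _counter++;
            // ... persist 'document' tagged with _counter, schedule it for replication ...
            return _counter;
        }
    }

    // Secondary: invoked by replication when a Write tagged with 'writeCounter' arrives.
    public void ApplyReplicatedWrite(object document, long writeCounter)
    {
        lock (_gate) { _counter = writeCounter; }
    }

    // Secondary: block the Read until the Write numbered 'afterCounter' has been applied.
    public object Read(long afterCounter)
    {
        while (true)
        {
            lock (_gate)
            {
                if (_counter >= afterCounter)
                    return "query result";      // ... execute the actual query here ...
            }
            Thread.Sleep(10);                   // replication hasn't caught up yet
        }
    }
}

A Client that writes through Write, remembers the returned counter, and passes it to Read on a Secondary can never miss his own Write – which is exactly the guarantee we’re after.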
Let’s dive into the details.
Lamport Clocks
The counter from the last section is precisely what’s called a Lamport Clock (or Logical Clock). The name comes from the seminal paper Time, Clocks, and the Ordering of Events in a Distributed System by Leslie Lamport. It is one of the most-cited publications in Distributed Systems theory.
Simply put, the main advantage of a Lamport Clock is that it’s just a simple scalar value that captures the ordering of events via happened-before relations. By receiving and passing along the latest timestamp he’s aware of, the Client is guaranteed to read/write causally related events in the correct order.
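As a rough illustration (generic, not Mongo-specific), the classic Lamport Clock rules fit in a handful of lines:

using System;

// Generic illustration of the classic Lamport Clock rules (not Mongo-specific).
class LamportClock
{
    private long _time;

    // Before every local event (e.g. a Write), increment the clock and use the new value.
    public long Tick() => ++_time;

    // When receiving a message carrying a timestamp,
    // fast-forward the local clock to max(local, received) + 1.
    public long Receive(long receivedTime)
    {
        _time = Math.Max(_time, receivedTime) + 1;
        return _time;
    }
}

These two rules are enough to guarantee that if event A happened before event B (causally), A’s timestamp is strictly lower than B’s. The reverse implication doesn’t hold in general, but we don’t need it for our use case.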
Also, being a simple integer, a Lamport Clock overcomes a lot of the limitations of physical time that you’ll see in the section below.
Bear in mind that the use case we cover in this article is much simpler than the ones described in the Lamport Clocks paper. The reason is that Mongo is a Single Leader database – you can write to a single Primary server per replica set. In general, Lamport Clocks are described in terms of a Multi Leader setup, which makes the problem more complex.
Still, the main logic stays the same. Feel free to explore the topic at a deeper level on your own.
You can read more about Lamport Clocks in Designing Data-Intensive Applications, Chapter 9 – Consistency and Consensus, Section – Ordering Guarantees -> Lamport Timestamps.
Why Not Use Physical Time (Wall Clocks)?
You might be wondering: can’t we just use physical time (wall clocks) for ordering events? For example, if Event A on Server X happened at 12:00:01 and Event B on Server Y at 12:00:02, that clearly means Event B happened after Event A, right?
Not really.
Physical clocks are one of the most unreliable things in a Distributed System.
Each machine has its own clock – a quartz crystal oscillator. This is not super accurate and has a tendency to drift – it might go slightly faster or slower than other machines.
To deal with the drift, a common approach is to synchronize the clocks using NTP (Network Time Protocol). This is essentially asking a group of servers for the correct time.
Note that such synchronization can lead to the clock jumping backward. For example, if the Wall Clock had been “too fast” and drifted ahead in time, the sync procedure might set it back.
A whole set of articles can be written on the problems of physical clocks in a Distributed System, but this should be enough to get a sense of the issues they can cause.
Lamport Clocks in Mongo
MongoDB implements Lamport Clocks via Causally Consistent Sessions that I’ll demonstrate with a code sample in the next section.
Before that, as you’re now familiar with the concept of Lamport Clocks and the example from Fig. 2, it’s just a matter of changing a few terms to understand the implementation in Mongo.
The diagram below should make a lot of sense:
The counter from before is pretty much replaced with clusterTime. Also, the clusterTime at the time of the operation, returned to the Client, is called operationTime.
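To make this less abstract, the .NET driver exposes both values directly on the session, so you can inspect them (and even propagate them manually between sessions). A small sketch, reusing the connection string and collection names from this series – the actual values printed will of course differ on your machine:

var client = new MongoClient("mongodb://localhost:27018,localhost:27019,localhost:27020/?replicaSet=rs0");
var collection = client.GetDatabase("test-db").GetCollection<BsonDocument>("test-collection");

using var session = client.StartSession(new ClientSessionOptions { CausalConsistency = true });
collection.InsertOne(session, new BsonDocument());

// The logical time of the last operation acknowledged in this session.
BsonTimestamp operationTime = session.OperationTime;

// The highest (signed) cluster time the session has seen so far.
BsonDocument clusterTime = session.ClusterTime;

Console.WriteLine($"operationTime: {operationTime}, clusterTime: {clusterTime}");

// Another session - even one created by a different client or process - can be
// fast-forwarded manually; within a single session the driver does this for you.
using var otherSession = client.StartSession(new ClientSessionOptions { CausalConsistency = true });
otherSession.AdvanceClusterTime(clusterTime);
otherSession.AdvanceOperationTime(operationTime);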
Of course, this description is still a little simplified. The implementation in Mongo contains many more details behind the scenes that are required for a production system with millions of users.
For example, Mongo uses a Hybrid Logical Clock that allows the value of the Lamport Clock to be tied to physical time, which is needed in practice for a lot of use cases.
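Concretely, the cluster time travels as a BSON Timestamp: its high 32 bits are wall-clock seconds since the Unix epoch, and its low 32 bits are a logical increment. You can peek at both halves yourself – a tiny sketch, assuming the operationTime obtained from a session as shown above:

// The hybrid nature is visible in the BSON Timestamp itself:
// a physical part (seconds since the Unix epoch) plus a logical increment.
BsonTimestamp operationTime = session.OperationTime;

var physicalPart = DateTimeOffset.FromUnixTimeSeconds(operationTime.Timestamp);
var logicalPart = operationTime.Increment;

Console.WriteLine($"physical: {physicalPart:u}, logical increment: {logicalPart}");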
You can read more about the specifics of the Mongo implementation in the paper Implementation of Cluster-wide Logical Clock and Causal Consistency in MongoDB.
Causally Consistent Sessions in Mongo
Simply put, in order to take advantage of Lamport Clocks and guarantee Causal Consistency in Mongo, you need to use Causally Consistent Sessions.
Here is the final version of the code that fixes our original “Read Your Write” issue.
static void Main(string[] args)
{
    var client = new MongoClient("mongodb://localhost:27018,localhost:27019,localhost:27020/?replicaSet=rs0");

    using var session = client.StartSession(new ClientSessionOptions { CausalConsistency = true });

    var collection = client.GetDatabase("test-db")
        .GetCollection<BsonDocument>("test-collection")
        .WithWriteConcern(WriteConcern.WMajority)
        .WithReadConcern(ReadConcern.Majority)
        .WithReadPreference(ReadPreference.Secondary);

    var sw = new Stopwatch();
    sw.Start();

    while (sw.ElapsedMilliseconds < 5000)
    {
        var newDocument = new BsonDocument();
        collection.InsertOne(session, newDocument);

        var foundDocument = collection.Find(session, Builders<BsonDocument>.Filter.Eq(x => x["_id"], newDocument["_id"]))
            .FirstOrDefault();

        if (foundDocument == null)
            throw new Exception("Document not found");
    }

    Console.WriteLine("Success!");
}
This piece of code finally outputs the “Success!” message.
The important part is that we initialize a Causally Consistent Session (line 5) and use it when making the Insert (line 19) and Find (line 21) calls.
Behind the scenes, the Mongo driver takes care of passing the Cluster Time between the calls, which implements the workflow from Fig. 3 and therefore guarantees Causal Consistency.
Summary
In this article, we reviewed the concept of Lamport Clocks and how they help preserve causal relations.
We’ve examined the concrete implementation in Mongo by digging into Causally Consistent Sessions.
In the end, I presented the final code sample, which fixes our “Read Your Write” consistency violation.
Thanks for reading, and see you next time!