Leader Election and Failover with Zookeeper – Case Study

Introduction

In distributed systems, leader election is critical for ensuring coordination across multiple services. When one service instance acts as a leader, it takes on special responsibilities, while the others function as followers. If the leader fails, the system must quickly elect a new leader to stay operational.

Leader election mechanisms are commonly used in large-scale systems such as Kafka and Hadoop, where one node in the cluster must take on the leader role to manage coordination tasks such as replication, partitioning, and metadata management.

This article demonstrates how to implement a leader election system using ZooKeeper, a robust and highly available coordination service, to handle dynamic leader election and failover among multiple service instances. We’ll use C# with a ZooKeeper client library for the core logic and Testcontainers to simulate a distributed environment where each instance of the application runs in its own container.

You can find the full source code on GitHub.

Why ZooKeeper for Leader Election?

ZooKeeper is well suited for leader election because it provides strong coordination guarantees that let us synchronize critical operations across distributed processes.

ZooKeeper is not only useful for leader election; it also gives us the primitives to build other powerful higher-level constructs for distributed systems, such as barriers, queues, and locks.

This article focuses on building a functional system to demonstrate leader election and failover. We won’t dive into too much theory. For more theoretical background, you can explore ZooKeeper’s official documentation on higher-level constructs and implementation recipes here.

What We’ll Build

In this article, we will:

  • Run three instances of a service in separate containers, where one of them will be elected as a leader.
  • Use Testcontainers to simulate a distributed setup so each service runs independently but communicates through ZooKeeper.
  • Demonstrate a live system where the current leader is killed, and one of the remaining services is automatically elected as the new leader without disrupting the system’s availability.

High-Level Overview

The diagram below shows the main components and workflow of our demo system.

Figure 1: Leader Election and Failover

The system consists of a Driver, three API instances, and ZooKeeper as the coordination service. The Driver is responsible for orchestrating the creation of all components, managing their lifecycle, and simulating failover scenarios.

Let’s discuss how the components interact at a high level.

Driver

The Driver program starts by creating the ZooKeeper container and three API instances, each in its own Docker container. Each instance registers itself with ZooKeeper by creating an ephemeral sequential zNode for itself. The node’s position in the sequence determines the instance’s leadership status.

In Figure 1 (left side), we see that:

  • API 1 registers first, taking position 1. This makes it the leader.
  • API 2 and API 3 register after, taking positions 2 and 3, respectively. These instances are followers.

When a leader instance goes down, ZooKeeper notifies the next instance (API 2 in this case), and that instance checks whether it should take over as the leader.

The Driver monitors the leadership status of each API by querying the /leader endpoint of each instance.
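
For illustration, here is a minimal sketch of how such a poll might look. The port number and the LeaderResponse record are hypothetical, not names from the repository:

using System.Net.Http.Json;

// Poll one instance's /leader endpoint; the mapped host port is illustrative.
var port = 5001;
using var http = new HttpClient();
var response = await http.GetFromJsonAsync<LeaderResponse>($"http://localhost:{port}/leader");
Console.WriteLine($"Instance on port {port} is leader: {response?.IsLeader}");

// Hypothetical DTO matching the { isLeader } payload returned by the endpoint.
record LeaderResponse(bool IsLeader);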

Simulated Leader Failover

The Driver simulates a failover scenario by deliberately killing the current leader (API 1 on the right side of the diagram). When the Driver stops API 1 (a short sketch of this step follows the list):

  • API 2, which is watching API 1’s node, is notified by ZooKeeper.
  • API 2 performs a leadership check and becomes the new leader because it now has the lowest node sequence number (position 1).
  • API 3 remains a follower and continues watching API 2.
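
In code, the kill step can be as simple as stopping the leader’s container through Testcontainers for .NET; leaderContainer here stands for whichever IContainer reported isLeader = true:

// Stopping the container ends the instance's ZooKeeper session; its ephemeral
// zNode is removed and the watch set by the next instance fires.
await leaderContainer.StopAsync();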

Implementation Details

After the high-level overview, let’s dig deeper into some of the technical details.

Leader Election Logic

Figure 2: ZooKeeper Leader Election

The core of the leader election process is implemented in the LeaderElection class, which interacts with ZooKeeper to determine which node will act as the leader. Each node is represented by an ephemeral sequential zNode in ZooKeeper. These zNodes are created as children of a root election node, ensuring all participants are organized under a single path. The zNode with the lowest sequence number becomes the leader, as depicted in the diagram, where n_1 is the leader and n_2 and n_3 are followers.

Let’s walk through some of the key functionality.

Registration with Ephemeral Sequential zNodes

Here is a snippet for creating an ephemeral sequential zNode in ZooKeeper:

// Create an ephemeral sequential zNode under the election path; ZooKeeper appends the sequence suffix.
_znodePath = await _zooKeeper.createAsync($"{_electionPath}/n_", [], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

These zNodes are automatically deleted when the owning session ends, so instances that fail or leave the cluster are no longer considered for leadership. Each zNode also receives a unique, monotonically increasing numeric suffix.
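
The full path returned by createAsync includes that suffix (for example, something like /election/n_0000000002). The node name used in the comparisons below can be derived from it; a small sketch, using the field names from the snippets in this article:

// Keep only the final path segment (e.g. "n_0000000002") so it can be compared
// against the names returned by getChildrenAsync.
var currentNodeName = _znodePath.Substring(_znodePath.LastIndexOf('/') + 1);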

Leader Election

The leader is determined by checking if the current zNode has the lowest sequence number. For instance:

// Fetch all candidate zNodes, sort them by sequence suffix, and check whether ours comes first.
var children = (await _zooKeeper.getChildrenAsync(_electionPath)).Children;
children.Sort();
var isLeader = currentNodeName == children[0];

Notifications and Watchers

ZooKeeper lets each node set a watch on the zNode immediately preceding its own (the one with the next-lowest sequence number):

await _zooKeeper.existsAsync(_previousNode, true); // Set watch on the previous node

When the leader node goes down, the node that was watching it (the next one in the sequence) receives a notification and re-runs its leadership check.
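
Putting these pieces together, a leadership check might look roughly like the sketch below. It assumes the fields shown earlier (_zooKeeper, _electionPath, _znodePath, _previousNode) plus an _electionHandler that is notified of the result; the implementation in the repository may differ in its details:

private async Task CheckLeadership()
{
    // List all candidate zNodes and sort them by their sequence suffix.
    var children = (await _zooKeeper.getChildrenAsync(_electionPath)).Children;
    children.Sort();

    var currentNodeName = _znodePath.Substring(_znodePath.LastIndexOf('/') + 1);
    var index = children.IndexOf(currentNodeName);

    if (index == 0)
    {
        // Lowest sequence number: this instance becomes the leader.
        await _electionHandler.OnElectionComplete(true);
        return;
    }

    // Otherwise watch the node immediately before ours and remain a follower.
    _previousNode = $"{_electionPath}/{children[index - 1]}";
    await _zooKeeper.existsAsync(_previousNode, true);
    await _electionHandler.OnElectionComplete(false);
}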

Using a Channel for Single-Threaded Leadership Checks

In our system, multiple events could trigger the need for a leadership check—such as when an instance registers itself or when the current leader fails. It’s important to perform these checks in a controlled, sequential manner.

If the leadership checks are processed concurrently (i.e., if multiple threads attempt to handle leadership checks at the same time), it could lead to race conditions and conflicting decisions.

To avoid these issues, channels are introduced as a mechanism to serialize the execution of leadership checks. By using a channel, we ensure that tasks are queued and processed one at a time, even if multiple events happen concurrently.
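
The channel itself can be a plain unbounded channel of delegates (from System.Threading.Channels). A sketch of how it might be declared, using the field name from the snippets below:

// Each queued item is one leadership check to run; an unbounded channel is
// assumed to be fine here because checks are infrequent and cheap.
private readonly Channel<Func<Task>> _checkLeadershipChannel =
    Channel.CreateUnbounded<Func<Task>>();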

When a new event occurs that requires a leadership check, instead of immediately performing the check, the system writes this task to a channel like so:

await _checkLeadershipChannel.Writer.WriteAsync(CheckLeadership);

A separate background task then reads from the channel and performs a leadership check for each entry.

private void StartProcessLeadershipTasks() =>
    _leadershipTaskProcessing = Task.Run(ProcessLeadershipTasks);

private async Task ProcessLeadershipTasks()
{
    try
    {
        _logger.Information("Starting leadership task processing...");

        // Process queued leadership checks one at a time, in the order they were written.
        await foreach (var leadershipTask in _checkLeadershipChannel.Reader.ReadAllAsync())
        {
            _logger.Information("Processing leadership task...");
            await leadershipTask();
        }

        _logger.Information("Leadership task processing stopped.");
    }
    catch (Exception ex)
    {
        _logger.Error(ex, "An error occurred while processing leadership tasks.");
    }
}
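
On shutdown, the channel can be completed so the background task drains the remaining entries and exits. A sketch of what Close might do, assuming it also owns the ZooKeeper session:

public async Task Close()
{
    // Stop accepting new checks and wait for queued ones to finish.
    _checkLeadershipChannel.Writer.Complete();
    if (_leadershipTaskProcessing != null)
    {
        await _leadershipTaskProcessing;
    }

    // Closing the session removes this instance's ephemeral zNode.
    await _zooKeeper.closeAsync();
}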

Election Handler

The ElectionHandler class is responsible for managing the leadership state of the current instance. It implements the IElectionHandler interface, which defines the OnElectionComplete method for handling leadership notifications.

Additionally, it implements the IHostedService interface so that the election handler can run as a background service.

Let’s go through some of the key responsibilities of the Election Handler.

Handling Leadership Notifications

The OnElectionComplete method is invoked whenever the leadership status of the current instance changes. The LeaderElection class, in coordination with ZooKeeper, will trigger this method to notify whether the instance has been elected as the leader or remains a follower:

public Task OnElectionComplete(bool isLeader)
{
    _isLeader = isLeader;
    _logger.Information(isLeader ? "This node has been elected as the leader." : "This node is a follower.");
    return Task.CompletedTask;
}

Starting and Stopping the Election Process (IHostedService)

As a hosted service, ElectionHandler starts the leader election process when the service starts and stops it when the service is stopped.

public async Task StartAsync(CancellationToken cancellationToken)
{
	_logger.Information("Starting leader election process...");
	_leaderElection = await LeaderElection.Create(_zkConnectionString, _electionPath, this, _logger);
}

public async Task StopAsync(CancellationToken cancellationToken)
{
	_logger.Information("Stopping leader election process and closing resources...");
	if (_leaderElection != null)
	{
		await _leaderElection.Close();
	}
}
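
For reference, the handler could be wired into the ASP.NET Core host roughly as follows. This is a sketch; the actual registration in the repository may differ, and the handler’s constructor parameters are assumed to come from configuration:

// Register a single ElectionHandler and expose it both as the IElectionHandler used
// by the /leader endpoint and as the hosted service that drives the election.
builder.Services.AddSingleton<IElectionHandler, ElectionHandler>();
builder.Services.AddHostedService(sp => (ElectionHandler)sp.GetRequiredService<IElectionHandler>());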

Checking Leadership Status

The IsLeader method returns the current leadership status of the instance. It is invoked from the /leader API endpoint exposed by the instance.

app.MapGet("/leader", ([FromServices] IElectionHandler electionHandler) =>
{
    var isLeader = electionHandler.IsLeader();
    return Results.Json(new { isLeader });
});

Driver Program and Simulated Failover

The Driver program sets up and manages the entire lifecycle of the components involved in the leader election process.

I encourage you to review the implementation here, but let’s walk through the key steps together (a Testcontainers sketch of the setup follows the list):

  1. Setting up ZooKeeper: The driver initializes a Docker network and starts a ZooKeeper container.
  2. Starting API Instances: Then it launches three API instances, each running in its own Docker container. These instances register with ZooKeeper and participate in the leader election.
  3. Checking Leadership: After starting the API instances, the driver queries each instance to determine which one has been elected as the leader by ZooKeeper.
  4. Simulating Failover: The program deliberately stops the current leader to simulate a failover scenario. ZooKeeper automatically triggers a new election, and one of the remaining instances becomes the new leader.
  5. Verifying New Leader: Once the failover is complete, the driver checks the leadership status of the remaining instances to ensure a new leader has been successfully elected.
  6. Cleanup: After the test, it stops all running containers (both API instances and ZooKeeper) and cleans up the Docker network.
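
As a rough sketch of steps 1 and 2 with Testcontainers for .NET (image tags, ports, aliases, and the environment variable name are illustrative, not necessarily what the repository uses):

// Create an isolated Docker network, then start ZooKeeper and three API instances on it.
var network = new NetworkBuilder().Build();
await network.CreateAsync();

var zookeeper = new ContainerBuilder()
    .WithImage("zookeeper:3.9")
    .WithNetwork(network)
    .WithNetworkAliases("zookeeper")
    .Build();
await zookeeper.StartAsync();

var apis = new List<IContainer>();
for (var i = 0; i < 3; i++)
{
    var api = new ContainerBuilder()
        .WithImage("leader-election-api-image")                     // image built later in this article
        .WithNetwork(network)
        .WithEnvironment("ZOOKEEPER_CONNECTION", "zookeeper:2181")  // assumed variable name
        .WithPortBinding(5001 + i, 8080)                            // assumed container port
        .Build();
    await api.StartAsync();
    apis.Add(api);
}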

Running the Application

There are a few steps to go through to run the system locally.

Building the API Docker Image

To run the API instances within Docker containers (as orchestrated by the driver program), we have created a Dockerfile for the LeaderElection API project.

You need to build the Docker image for the API. Go to the Leader Election project directory that contains the Dockerfile and run:

docker build -t leader-election-api-image .

This image will be used in the driver code to spin up multiple instances of the API in separate containers.

Running the System and Monitoring the Results

After the API image is ready, go to the Driver project directory, and run the project:

dotnet run

Once the system is running, the driver will print the status of each API instance, indicating which instance is the leader. You can observe the leader election process and how leadership changes when the leader is stopped.

In a sample run, the API instance on port 5003 is initially elected as the leader. After the leader instance is stopped, ZooKeeper automatically detects the failure and the API instance on port 5001 is elected as the new leader.

Summary

In this article, we demonstrated how to implement leader election and failover using ZooKeeper, .NET Core, and Docker.

We built a simple system where three service instances, running in Docker containers, communicated with ZooKeeper to determine the active leader. Using Testcontainers to simulate a distributed environment, we validated the automatic leader election and the system’s response when the current leader failed.

Thanks for reading, and see you next time!

Resources

  1. GitHub Repo
  2. ZooKeeper Recipes