Table of Contents
Intro
A few weeks back, I needed to run a single-broker Kafka cluster in Docker via docker-compose. I also had to expose it to the host machine on some custom port e.g. not the default 9092.
In case you’re curious why I’d need that – it was about integration tests for a piece of code that involved publishing and consuming Kafka messages. So the idea was that when a test suite runs, it would spin up a Kafka cluster in Docker, then destroy it after the tests are complete. If multiple test suites run in parallel, I didn’t want the ports to collide, so that’s where the “custom available port” requirement came from. For creating and destroying the test containers, I was using a handful library called TestContainers.
I will present a sample project with such a testing infrastructure in a subsequent post, but here I want to focus just on running the cluster itself, as it happened to be a little more tricky than I thought initially.
My first approach was admittedly fairly ignorant. I just took some docker-compose file with Kafka, changed a few parameters here and there, and took it for a spin. Unfortunately, it didn’t work, and I was getting connectivity errors. I spent some time in frustration trying different permutations of the params but didn’t really make any progress.
That’s when it became apparent that I needed to take a step back and dig deeper into the semantics of the Kafka communication protocol.
I read a couple of great articles on the topic of “Kafka listeners” that you can find here and here. I looked even further by tracking the calls with Wireshark getting a deeper sense of the overall communication flow. This is something I’m planning to present in a future post as well.
In this article, though, I’d like to keep things simple and focused, give a quick solution to the problem I was having, and explain the Kafka protocol from a high-level perspective for the specific use case.
More concretely, we’ll go through the following:
- Set up Kafka in Docker on a custom port.
- Explain the network communication conceptually between the client and the broker.
- Explain the concept of “listeners” and the role they play.
- Using
kafkacat
to test and troubleshoot your setup.
Let’s get started.
TL;DR
If you’re just trying to spin up Kafka in Docker on some custom port, you can copy the code snippet below into a docker-compose file and execute it. Then, you’ll have Kafka running on port 50001
mapped to the same port on your host machine.
--- version: '3' services: zookeeper: image: confluentinc/cp-zookeeper:7.0.1 container_name: zookeeper restart: always ports: - "50000:2181" environment: ZOOKEEPER_CLIENT_PORT: 2181 ZOOKEEPER_TICK_TIME: 2000 broker: image: confluentinc/cp-kafka:7.0.1 container_name: broker restart: always ports: - "50001:50001" depends_on: - zookeeper environment: KAFKA_BROKER_ID: 1 KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181' KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT_EXTERNAL:PLAINTEXT,PLAINTEXT_INTERNAL:PLAINTEXT KAFKA_ADVERTISED_LISTENERS: PLAINTEXT_EXTERNAL://localhost:50001,PLAINTEXT_INTERNAL://broker:50002 KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT_INTERNAL KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1 KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
If you want to understand the details of why all these parameters are needed and how they are used during client calls, stick around!
High-Level Workflow
There are a few crucial things you need to understand when you’re trying to run Kafka locally in Docker and “it doesn’t work.”
When you’re sending messages to a Kafka topic, you start with some URI to connect to initially. Most people think this same URI is used for both establishing the connection and sending messages.
That’s not the case.
It’s important to realize that the Kafka endpoint address you’re using (e.g. localhost:500001
in our case) is just for an initial metadata (or bootstrap) call that will return some info for the available brokers. This contains the actual brokers’ endpoints you need to use to send messages afterward.
These brokers’ endpoints are configured via the so-called “listeners.”
To make things concrete, take a moment to review the diagram below.
Let’s explore this sequence of calls.
The Metadata Call
If a client wants to send/read a message from Kafka, it first needs to initiate a metadata request. This is also known as the bootstrap call. If you’ve ever used Kafka, you know you have to provide a config for the “bootstrap servers” that, if running Kafka locally or in Docker, might look like this:
localhost:9092
As I’ve mentioned, this endpoint is only used for the “metadata” call, which is meant to return info about the brokers in the cluster together with the endpoints they can be reached on.
The Metadata Response
Once a client receives the metadata, it can connect to one of the brokers via the specified address. The exact broker to send messages to depends on which is the leader for the concrete partition.
Now, the question is – how and in what shape/format are the brokers’ endpoints configured?
This is accomplished when you set up the listeners in your Kafka config.
Pay attention to this snippet from the docker-compose file and the diagram:
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT_EXTERNAL://localhost:50001,PLAINTEXT_INTERNAL://broker:50002
ADVERTISED_LISTENERS
entries are returned to the clients as part of the metadata response. So, in our example, the client gets back localhost:50001
. And this is the exact address it would use to send messages to the broker.
This might be a little confusing because, in the current scenario, the same address is used for the initial connection and then for producing messages – localhost:50001
. That’s only because there’s just a single broker in our setup. In the next post, I’ll explore a more real-world use case that should hopefully shed some more light on the protocol.
You can use a tool called kafkacat to get a better sense of what this process looks like conceptually. The command below displays the metadata for the cluster:
You can see there’s one broker returned, and you can reach it at localhost:50001
Producing/Consuming Messages
Once the client receives the metadata, it can publish/consume messages on one of the brokers, depending on which one is the leader for the partition.
Here’s an example of using kafkacat
to produce a few messages on a topic called new_topic
:
Inter Broker Communication
Kafka brokers communicate with each other typically on some internal network.
In order to configure how a broker should be reached by other brokers, you need to configure two things:
- A listener entry. In our example, this is
PLAINTEXT_INTERNAL://broker:50002
. - Reference this listener in the
KAFKA_INTER_BROKER_LISTENER_NAME
config item.
This config is required even if we have a single broker. Bear in mind that in a production environment, you’ll always have multiple brokers for redundancy and availability, usually starting from three.
Summary
In this article, we’ve learned how to set up a single-node Kafka cluster locally in Docker on a customer (non-default) port.
We examined some of the core concepts of the Kafka protocol and the interactions between the client and the cluster.
You’ve also seen how to utilize kafkacat
to test and troubleshoot your setup.
Next, it’s time for the fun part and digging deeper into the protocol. I will set up a multi-broker cluster and closely examine the client-broker interaction via Wireshark.
Thanks for reading, and see you next time!