In this lesson we will:

  • Learn about Kafka partitions;
  • Learn how to create and administer partitions;
  • Learn about event ordering and how it relates to partitions;
  • Learn about partitions keys
  • Use the kafka-console-producer script.

What Are Kafka Partitions?

Partitions are one of the main features of Kafka for achieving parrelism and therefore performance.

A partition is a sub-division of a topic. For instance, our new_pizza_orders topic could be divided into 20 partitions numbered 0-19.

Partitions can also be spread across your broker cluster. For instance, on a 5 server broker cluster, each node could process 4 of the 20 partitions.

They key thing is that wherever they reside, each of these partions can be written to by producers and read from by consumers in parallel allowing us to significantly increase throughput and latency of our Kafka broker and cluster.

Creating Partitions

As already seen, when we explicitly create a topic with, we need to specify a number of partitions. This time, we can pass a higher number to the partitions flag such as 5:

./bin/ --bootstrap-server localhost:9092 --topic new_pizza_orders --create --partitions 5 --replication-factor 1

If we now describe the topic:

./bin/ --bootstrap-server localhost:9092 --topic new_pizza_orders  --describe

We can see that we now have 5 partitions numbered 0-4 configured on the topic:

Topic: new_pizza_orders TopicId: 8pj25fZsSFqDip7VvE3EHg PartitionCount: 5       ReplicationFactor: 1    Configs: segment.bytes=1073741824
Topic: new_pizza_orders Partition: 0    Leader: 0       Replicas: 0     Isr: 0
Topic: new_pizza_orders Partition: 1    Leader: 0       Replicas: 0     Isr: 0
Topic: new_pizza_orders Partition: 2    Leader: 0       Replicas: 0     Isr: 0
Topic: new_pizza_orders Partition: 3    Leader: 0       Replicas: 0     Isr: 0
Topic: new_pizza_orders Partition: 4    Leader: 0       Replicas: 0     Isr: 0

The leader refers to which broker is managing the partition. In this instance, because we only have a single broker setup on the training virtual machine, all partitions for the topic are managed on broker 0.

The number of partitions can also be adjusted after creation. Let's adjust the number of partitions to 10:

./bin/ --bootstrap-server localhost:9092 --alter --topic new_pizza_orders --partitions 10

Kafka does not support reducing the number of partitions at this time. We can only increase the number with the alter command.

Event Ordering & Partitions

As previously mentioned, the order in which we process events will be important. For instance, it makes no sense to update a customer before the customer has been created in our system.

Using partitions has implication for event ordering.

  • The ordering of events is only guaranteed at a partition level
  • We could therefore read and process events outs of order even if they are on the same topic

The key role is that data which has to be processed in order therefore has to go onto the same partition

Keys And Partition Order

Kafka producers help with this ordering challenge by putting all events with the same key onto the same partition.

For instance, all events for Customer ID 1234 could be routed to the same partition to avoid the situation of updating before creation.

Sometimes we need more complex behaviour than this though:

  • Partitioning by a non keyed variable
  • Perhaps our keys are not evenly distributed so a simple hash function over the key is not suitable

This is adjusted using a feature called the partitioning strategy.

Scenario Setup

We will use command line scripts to demonstrate the behaviour of partitions.

Create A New Topic

./bin/ --bootstrap-server localhost:9092 --create --topic new_orders --partitions 5

Create A Console Producer

./bin/ --bootstrap-server localhost:9092  --topic new_orders

Create A Console Consumer

./bin/ --bootstrap-server localhost:9092  --topic new_orders

Describe Consumer Groups

./bin/ --bootstrap-server localhost:9092 --describe --all-groups

Adjust Number Of Partitions

./bin/ --bootstrap-server localhost:9092 --alter --topic new_orders --partitions 10
Next Lesson:

Connecting ClickHouse To Kafka

In this lesson we learn how to connect ClickHouse to Kafka to ingest in real time streams of messages.

0h 10m

Work With The Experts In Real-Time Analytics & AI

we help enterprise organisations deploy powerful real-time Data, Analytics and AI solutions based on ClickHouse, the worlds fastest open-source database.

Join our mailing list for regular insights:

We help enterprise organisations deploy advanced data, analytics and AI enabled systems based on modern cloud-native technology.

© 2024 Ensemble. All Rights Reserved.