In this lesson we will:
- Learn about Kafka consumer groups;
- Understand how consumer groups work together to co-ordinate reliable message delivery;
- Use the kafka-consumer-groups script to analyse consumer groups.
About Consumer Groups
Kafka messages are produced by producers and consumed by consumers.
In many cases, it makes sense to organise our consumers into logical groups depending on how we wish to divide the work. This is achieved using the consumer group feature of Kafka.
A consumer group is a group of Kafka consumers that work together to consume data from a Kafka topic. Each consumer in a group will read data from a subset of the partitions in the topic, and each partition is assigned to only one consumer in the group at a time.
Consumer groups are an important feature of Kafka, as they allow multiple consumers to work together to read data from a topic, providing scalability and fault tolerance. When multiple consumers are part of a consumer group, each consumer in the group will receive a unique subset of the data in the topic, which allows for parallel processing of messages.
Characteristics Of Consumer Groups
Here are some key characteristics of Kafka consumer groups:
Parallelism: With consumer groups, multiple consumers can work together to process data in parallel, which can significantly increase the throughput of the system.
Load balancing: Kafka automatically distributes partitions across consumers in a group, ensuring that each partition is processed by only one consumer at a time. This helps to balance the load across the consumers in the group.
Fault tolerance: If one consumer in a group fails, the partitions it was processing will be automatically reassigned to other consumers in the group. This ensures that the processing of data continues even if some consumers fail.
Group coordination: Kafka provides APIs for managing consumer groups, including APIs for joining and leaving a group, rebalancing partitions, and committing offsets.
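Offset management: Kafka tracks the offsets committed by a consumer group for each partition, so a consumer that restarts or takes over a partition can resume from the last committed position. On its own this gives at-least-once delivery, because messages processed after the last commit may be read again. Exactly-once processing requires additional features such as idempotent producers and transactions.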
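The fault tolerance and offset management characteristics above can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Kafka client code: the `GroupOffsets` class, the `consume` function and the message names are hypothetical stand-ins, and the in-memory list plays the role of a single partition. It shows a consumer failing part-way through its work and a second consumer resuming from the last committed offset.

```python
from collections import defaultdict


class GroupOffsets:
    """Toy stand-in for the offsets Kafka stores per consumer group and partition."""

    def __init__(self):
        # partition -> next offset to read; a group with no commits starts at 0 here
        self.committed = defaultdict(int)

    def commit(self, partition, next_offset):
        self.committed[partition] = next_offset

    def position(self, partition):
        return self.committed[partition]


def consume(log, offsets, partition, max_messages, commit_every):
    """Read up to max_messages starting at the committed position.

    Commits after every commit_every messages and returns what was processed.
    """
    processed = []
    start = offsets.position(partition)
    for offset in range(start, min(start + max_messages, len(log))):
        processed.append(log[offset])
        if (offset + 1 - start) % commit_every == 0:
            offsets.commit(partition, offset + 1)
    return processed


# One partition holding six messages (hypothetical data).
partition_log = ["m0", "m1", "m2", "m3", "m4", "m5"]
offsets = GroupOffsets()

# Consumer A processes three messages but commits only every two, then "crashes".
seen_by_a = consume(partition_log, offsets, partition=0, max_messages=3, commit_every=2)

# Consumer B is assigned the partition and resumes from the committed offset.
seen_by_b = consume(partition_log, offsets, partition=0, max_messages=10, commit_every=1)

print(seen_by_a)  # ['m0', 'm1', 'm2']
print(seen_by_b)  # ['m2', 'm3', 'm4', 'm5']
```

Message `m2` is processed by both consumers because consumer A failed before committing it. No message is lost, but the application has to tolerate occasional duplicates.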
Consumer Groups And Partitions
The number of consumers in a group is closely tied to the number of partitions in the topic, and this relationship affects both correctness and performance.
Imagine we have a topic with 10 partitions:
- If we have 10 consumers in a group we are balanced, with each consumer servicing a different partition.
- If we have more than 10 consumers in a group, some will sit idle.
- If we have fewer than 10 consumers in a group, some consumers will process more than one partition.
We don't necessarily need to be "balanced". This depends on the nature of the data and the requirements for failover and performance.
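The three cases for a 10-partition topic can be sketched with a simple round-robin spread. This is only an illustration in plain Python: the `assign_partitions` function and the consumer names are hypothetical, and Kafka's real assignors differ in the details of how partitions are placed. The counts of busy and idle consumers are what matter here.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partitions over consumers round-robin, one owner per partition."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def summarise(num_partitions, group_size):
    """Return (idle consumer count, largest number of partitions on one consumer)."""
    consumers = [f"consumer-{i}" for i in range(group_size)]
    assignment = assign_partitions(num_partitions, consumers)
    idle = sum(1 for partitions in assignment.values() if not partitions)
    busiest = max(len(partitions) for partitions in assignment.values())
    return idle, busiest


for group_size in (10, 12, 4):
    idle, busiest = summarise(10, group_size)
    print(f"{group_size} consumers: {idle} idle, busiest consumer has {busiest} partition(s)")
```

With 10 consumers every consumer owns exactly one partition, with 12 consumers two sit idle, and with 4 consumers the busiest ones own three partitions each.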
kafka-consumer-groups.sh
The kafka-consumer-groups.sh script allows us to view information about the consumer groups known to the cluster. The script needs to be told which action to perform. For example, the --list option prints the name of every consumer group:
./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
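When these checks are scripted, it can help to build the command line in one place. The sketch below is a hypothetical helper in Python, not part of Kafka. It assumes the script lives at `./bin/kafka-consumer-groups.sh`, that the broker is at `localhost:9092`, and that a group named `pizza_processor` exists. It uses the script's `--list` option to list all groups and its `--describe --group` options to show the members, offsets and lag of one group.

```python
import shlex

# Assumed install location of the script, relative to the Kafka directory.
SCRIPT = "./bin/kafka-consumer-groups.sh"


def consumer_groups_command(bootstrap_server, group=None):
    """Build the argument list: list all groups, or describe one group if given."""
    argv = [SCRIPT, "--bootstrap-server", bootstrap_server]
    if group is None:
        argv.append("--list")
    else:
        argv += ["--describe", "--group", group]
    return argv


list_cmd = consumer_groups_command("localhost:9092")
# "pizza_processor" is an assumed group name used for illustration.
describe_cmd = consumer_groups_command("localhost:9092", group="pizza_processor")

print(shlex.join(list_cmd))
print(shlex.join(describe_cmd))
```

The printed lines can be pasted into a terminal, or the lists can be passed to `subprocess.run` on a machine where Kafka is installed and a broker is running.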
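Next, let's use the kafka-console-consumer.sh script to subscribe to a topic as a member of a consumer group. The group is set with the --group option. Replace <topic> with the name of the topic to read from:
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic <topic> --group pizza_processor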