In this lesson we will:

  • Introduce the key concepts and terminology associated with Kafka.

Core Concepts Of Kafka

Though we will cover many of these concepts in more detail throughout the course, it is worth learning some of the key ideas behind Kafka at this stage. Kafka introduces some new terminology and works in a slightly different way to previous generations of messaging technology.

Broker

A single Kafka server process is referred to as a Broker. It is the responsibility of the broker to accept messages from producers and distribute them to interested consumers in a performant and reliable manner.

Broker Cluster

Though it is possible to run with a single Kafka broker, this is risky in a production environment: if the process or the server were to crash, the system would become unavailable. A single broker may also lead to scalability or performance issues in a big data or low latency environment.

To combat this, brokers are often deployed as a cluster of multiple brokers which work together in a co-ordinated way. This adds resilience, for instance if one of the individual brokers crashes, and the increased capacity also provides higher throughput and lower latency.

Producers and Consumers

Producers are the processes sending messages to the Kafka broker, and Consumers are the processes receiving messages from the broker.

It is possible to have many thousands of consumers and producers interacting with the broker cluster at any one time if necessary.

Kafka allows for scenarios such as a consumer that temporarily goes offline and needs to continue where it left off, or consumers working in a group to process a single stream of messages. The aim is to offer exactly-once processing, where no messages are lost or processed twice.
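The "continue where it left off" behaviour is achieved by tracking offsets, the position of each message in the stream. The following is an illustrative sketch in plain Python (not the Kafka client API): a consumer records the offset of the last message it processed and resumes from there after a restart.

```python
# A topic's messages, in order. Each message's position is its offset.
log = ["msg-0", "msg-1", "msg-2", "msg-3", "msg-4"]

def consume(log, start_offset):
    """Process messages from start_offset onwards; return the next offset."""
    offset = start_offset
    for message in log[start_offset:]:
        print(f"processing {message} at offset {offset}")
        offset += 1
    return offset  # the "committed" offset to resume from later

committed = consume(log, 0)          # first run processes offsets 0 to 4
committed = consume(log, committed)  # after a restart, nothing is reprocessed
```

Real Kafka consumers commit their offsets back to the broker, so the resume point survives the consumer process itself crashing.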

Messages or Events

A broker or broker cluster is responsible for accepting messages from the producers and delivering them to the subscribed consumers.

Kafka messages consist of a key and a value. Beyond this, Kafka places very few requirements on the format of either the key or the value. They could be Strings, JSON, XML or some binary format. The examples below, for instance, are all valid messages from a Kafka perspective.

1 : { "order_number" : 1, "order_category" : "Electronics" }
1 : 1/Electronics
!@££$ : !£EADADAR£!£RADDASDASDASDASDASD
<my_key/> : </my_value>

Messages are sometimes referred to as Events, with the two terms being used interchangeably.
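This flexibility exists because, to Kafka, a key and a value are ultimately just bytes; it is the producers and consumers that agree on a serialisation format. As a sketch, a few of the message shapes above encoded to bytes in Python:

```python
import json

# (key, value) pairs as raw bytes -- all equally valid to Kafka.
messages = [
    # a JSON value with a simple numeric key
    (b"1", json.dumps({"order_number": 1, "order_category": "Electronics"}).encode()),
    # an ad-hoc delimited string value
    (b"1", b"1/Electronics"),
    # XML-style key and value
    (b"<my_key/>", b"</my_value>"),
]

for key, value in messages:
    assert isinstance(key, bytes) and isinstance(value, bytes)
```

In practice a team typically standardises on one format per topic, such as JSON or Avro, so that consumers know how to deserialise what they receive.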

Topics

Every message sent to a Kafka broker is sent to a specific topic. A topic has a name, which could be something such as Orders, WebsiteVisits, or Prices, describing the data within it.

Topics can be created statically by the Kafka administrator, or created dynamically by producers and consumers as they send and receive messages.
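Static creation is typically done with the kafka-topics.sh tool that ships with Kafka. As a sketch, assuming a broker listening on localhost:9092 and a hypothetical Orders topic:

```shell
# Create an Orders topic on the broker at localhost:9092
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic Orders \
  --partitions 3 \
  --replication-factor 1
```

With dynamic creation, the broker setting auto.create.topics.enable instead allows a topic to be created automatically the first time a producer or consumer references it.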

Retention Period

Kafka topics are configured with a retention period, which is the amount of time that messages are kept in the topic before being deleted.

By default, messages are retained for a period of 7 days, though there may be instances where we dramatically shorten this, for instance where the data quickly ceases to be useful, or lengthen it, for instance where we need to retain message history for audit and compliance purposes.
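Retention is controlled by the retention.ms topic configuration. As a sketch, again assuming a broker at localhost:9092 and a hypothetical Orders topic, it can be changed on an existing topic with the kafka-configs.sh tool:

```shell
# Shorten retention on the Orders topic to 1 day (86,400,000 ms)
kafka-configs.sh --alter \
  --bootstrap-server localhost:9092 \
  --entity-type topics \
  --entity-name Orders \
  --add-config retention.ms=86400000
```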

Partitions

In order to provide improved throughput and performance, topics are further sub-divided into partitions which can be written to and read from in parallel.

A WebsiteVisits topic could for instance be sub-divided into 8 partitions, and Kafka will allow us to read and write to these in parallel, making more efficient use of the server's CPU and storage to optimise throughput.

Partitioning is therefore a key tool in improving the scalability and throughput of your Kafka cluster.
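When a message has a key, the producer chooses the partition by hashing that key, so all messages with the same key land on the same partition and stay in order. The sketch below illustrates the idea in plain Python; the real Kafka default partitioner uses a murmur2 hash of the key bytes, and a trivial byte sum stands in for it here.

```python
NUM_PARTITIONS = 8  # e.g. a WebsiteVisits topic with 8 partitions

def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition. A byte sum stands in for a real hash."""
    return sum(key) % num_partitions

# The same key always maps to the same partition, preserving per-key ordering.
p1 = choose_partition(b"visitor-42")
p2 = choose_partition(b"visitor-42")
assert p1 == p2
```

Messages without a key are instead spread across partitions, which balances load but gives up per-key ordering.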

Event Streaming vs Batch

Kafka is sometimes referred to as an Event Streaming platform. This is because events are sent continuously from source to destination, often immediately as the data is created.

This is in contrast to the infrequent batch processing that has historically been used for data exchange. Please see our course on Streaming Data or this blog post for further details.

© 2024 Ensemble. All Rights Reserved.