In this lesson we will:
- Introduce Kafka;
- Describe how Kafka and ClickHouse are typically integrated.
Kafka
Kafka is a leading industry platform for working with streaming and real-time data. For more details about Kafka, please consult our course that focusses specifically on it.
As ClickHouse is also often used for real-time analytics, the combination of ClickHouse and Kafka is a very common one. So much so, in fact, that ClickHouse comes bundled with a native integration.
This integration is exposed as a table engine. We create the table like any other, specifying the address of the broker and the topic to listen to.
To ingest data from Kafka, define a ClickHouse table that uses the Kafka engine to read from one or more Kafka topics. The settings specify the brokers, the topic, the consumer group and the message format. The names in the example are placeholders to replace with your own.
-- Security (e.g. SSL or SASL) and the offset reset policy (e.g. earliest)
-- are not set here. Depending on the ClickHouse version, they are often
-- configured in the server's Kafka configuration section instead.
CREATE TABLE your_kafka_table
(
    key String,
    value String,
    timestamp DateTime
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'your_kafka_broker_1:9092,your_kafka_broker_2:9092',
    kafka_topic_list = 'your_kafka_topic',
    kafka_group_name = 'your_kafka_consumer_group',
    kafka_format = 'JSONEachRow',
    kafka_num_consumers = 1; -- Adjust as needed
Tables As Streams
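Tables backed by the Kafka engine are slightly different to normal tables. They do not store any data themselves, but are designed to mimic a stream: each message is delivered only once per consumer group, so rows returned by one read are not returned again by the next.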
We therefore need to create a materialised view that reads from the stream and writes the rows into a regular table, where they are stored permanently.
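As a minimal sketch of this pattern, the statements below create a target table and a materialised view that copies rows from the Kafka table into it. The names your_events and your_kafka_consumer are hypothetical, and the MergeTree engine and sort key are assumptions chosen for illustration. The columns match the your_kafka_table example from earlier in the lesson.

```sql
-- Hypothetical target table that permanently stores the consumed rows.
-- MergeTree and the ORDER BY key are illustrative choices, not requirements.
CREATE TABLE your_events
(
    key String,
    value String,
    timestamp DateTime
)
ENGINE = MergeTree
ORDER BY (timestamp, key);

-- The materialised view runs in the background: it reads each batch of
-- messages from the Kafka engine table and inserts the rows into your_events.
CREATE MATERIALIZED VIEW your_kafka_consumer TO your_events AS
SELECT
    key,
    value,
    timestamp
FROM your_kafka_table;
```

With this in place, analytical queries would run against your_events rather than against the Kafka table itself, since reading the Kafka table directly consumes the messages.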