Ingesting From Kafka Into ClickHouse

Lesson #6

In this lesson we will:

  • Introduce Kafka;
  • Describe how Kafka and ClickHouse are typically integrated.

Kafka

Kafka is one of the most widely used platforms in industry for working with streaming and real-time data. For more detail, please consult our course dedicated to Kafka.

As ClickHouse is also often used for real-time analytics, the combination of ClickHouse and Kafka is very common. So much so, in fact, that ClickHouse ships with a native integration.

This integration is exposed as a table engine. We create the table like any other, specifying the broker addresses, the topic to subscribe to, the consumer group and the message format.

The example below defines a table that uses the Kafka engine to read JSON messages from a Kafka topic. The table, broker, topic and consumer group names are placeholders.

CREATE TABLE your_kafka_table
(
    -- Column names must match the field names in each JSON message
    key String,
    value String,
    timestamp DateTime
)
ENGINE = Kafka
SETTINGS
    -- Setting names are bare identifiers: no quotes and no surrounding parentheses
    kafka_broker_list = 'your_kafka_broker_1:9092,your_kafka_broker_2:9092',
    kafka_topic_list = 'your_kafka_topic',
    kafka_group_name = 'your_kafka_consumer_group',
    kafka_format = 'JSONEachRow',
    kafka_num_consumers = 1;  -- Adjust as needed

-- Secured clusters need additional options (security protocol, SASL mechanism);
-- whether these can be set per table depends on the ClickHouse version.
-- auto.offset.reset is a librdkafka option, normally set in the <kafka> section
-- of the server configuration rather than in the table definition.

Tables As Streams

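Tables backed by the Kafka engine differ from normal tables. They do not store any data; instead, they are designed to mimic a stream. Each message is delivered to the consumer group once, so rows that have already been read are not returned by later queries.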

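To persist the data, we create a materialised view that reads from the stream and writes the rows into a regular table, such as one backed by the MergeTree engine.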
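
As a minimal sketch of this pattern, assuming the your_kafka_table definition shown earlier: the destination table your_events, the view name your_kafka_mv and the ORDER BY key are hypothetical choices made for illustration, not part of the original lesson.

```sql
-- Hypothetical destination table that actually stores the rows.
CREATE TABLE your_events
(
    key String,
    value String,
    timestamp DateTime
)
ENGINE = MergeTree
ORDER BY (timestamp, key);

-- The materialised view consumes from the Kafka engine table in the
-- background and inserts each batch of rows into your_events.
CREATE MATERIALIZED VIEW your_kafka_mv TO your_events AS
SELECT
    key,
    value,
    timestamp
FROM your_kafka_table;

-- Query the destination table, not the Kafka engine table.
SELECT count() FROM your_events;
```

Once the view exists, ClickHouse starts consuming from the topic. Detaching the view (DETACH TABLE your_kafka_mv) pauses consumption, which can be useful when the destination table needs to change.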


© 2024 Ensemble. All Rights Reserved.