In this lesson we will:
- Learn about the various mechanisms for ingesting data from local and remote static files.
Local Static Files
The simplest and most common type of ingestion we need to carry out is taking some local static file and loading the data contained within into our ClickHouse instance.
These static files could be in text-based formats such as CSV, JSON, or XML, or in binary formats such as Parquet or Apache Iceberg.
Regardless, our task is often to process these files and ingest the data into our ClickHouse instance.
ClickHouse Client
The simplest way to achieve this is to pipe data into the clickhouse-client command line interface in a very Unix-like way: we stream the file into the client and specify a query to handle the INSERT.
cat orders.json | clickhouse-client --query="INSERT INTO orders FORMAT JSONEachRow"
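The same pattern works for other text formats. As a sketch, assuming a hypothetical orders.csv whose header row matches the columns of the orders table:
# stream a CSV file, letting ClickHouse map columns by header name
cat orders.csv | clickhouse-client --query="INSERT INTO orders FORMAT CSVWithNames"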
HTTP API
As an alternative, data can also be pushed into ClickHouse over its HTTP interface (which listens on port 8123 by default) using a tool such as cURL:
echo '{"order_id":"342434"}' | curl 'http://localhost:8123/?query=INSERT%20INTO%20test%20FORMAT%20JSONEachRow' --data-binary @-
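If the data already lives in a file, cURL can stream it directly. As a sketch, assuming a hypothetical orders.json containing one JSON object per line and a matching orders table:
# POST the whole file as the body of the INSERT
curl 'http://localhost:8123/?query=INSERT%20INTO%20orders%20FORMAT%20JSONEachRow' --data-binary @orders.json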
SQL API with INFILE
Rather than piping data through the ClickHouse client, we can also ingest local files from within an interactive SQL session using the INFILE clause. This is arguably a cleaner way of achieving the same end result:
INSERT INTO hackernews FROM INFILE '/data/hackernews.csv' FORMAT CSVWithNames
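INFILE can also read compressed files. As a sketch, assuming a gzipped copy of the same file at /data/hackernews.csv.gz, the codec can be stated explicitly (ClickHouse can otherwise infer it from the file extension):
INSERT INTO hackernews FROM INFILE '/data/hackernews.csv.gz' COMPRESSION 'gzip' FORMAT CSVWithNames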
Binary Formats
With binary formats such as Parquet, a common approach is to express the import as SQL using the file() table function, specifying the file format:
CREATE TABLE imported_from_parquet
ENGINE = MergeTree
ORDER BY tuple() AS
SELECT *
FROM file('data.parquet', Parquet)
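The file() table function can also be queried directly, which is handy for checking the inferred schema or appending to an existing table. A minimal sketch, assuming a hypothetical second file new_data.parquet with the same columns:
-- inspect the schema ClickHouse derives from the Parquet metadata
DESCRIBE file('data.parquet', Parquet);
-- append rows to the table created above
INSERT INTO imported_from_parquet
SELECT * FROM file('new_data.parquet', Parquet);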
Remote Static Files
In the examples above we were working with files stored locally. It is also possible to ingest from a remote URL in a similar manner:
CREATE TABLE hackernews
ENGINE = MergeTree
ORDER BY tuple() AS
SELECT *
FROM url('https://mycompany.com/hackernews.csv.gz', 'CSVWithNames');
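If the target table already exists, the url() table function can likewise feed a plain INSERT. A sketch, assuming the hackernews table has already been created with a matching schema:
INSERT INTO hackernews
SELECT * FROM url('https://mycompany.com/hackernews.csv.gz', 'CSVWithNames');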