Ingesting Data Into ClickHouse

Ingesting From Azure Blob Storage Into ClickHouse

Lesson #5

In this lesson we will cover:

  • The azureBlobStorage table function;
  • The AzureBlobStorage table engine.

Ingesting Data From Azure Blob Storage

Though AWS S3 is the most popular cloud object store we are likely to come across, Azure Blob Storage is a close second, particularly for Microsoft Azure centric shops.

Fortunately, ClickHouse also has first-class support for reading data from and writing data to Azure Blob Storage.

In addition, some teams might choose to run their ClickHouse instances on Azure virtual machines, or to migrate to Azure-hosted deployments of ClickHouse Cloud. In these cases it makes sense to store data co-located alongside the database cluster, which makes Azure Blob Storage the natural choice.

In this lesson we will learn what is involved in connecting to Azure, initially to query data "in place" without copying it into ClickHouse. We may wish to do this as part of a Data Lakehouse approach, where we make use of data that is stored persistently in a data lake.

Then, we will look at the process of copying data into ClickHouse and the considerations involved. This is the more typical use case, and the one that enables the best possible performance.

Prerequisites

We suggest you complete our lesson on table functions as a prerequisite to this one, as much of the interaction between ClickHouse and Azure Blob Storage occurs through the table function abstraction.

Ad-Hoc Querying Of Azure Blob Storage

TBC

Table Functions

The first abstraction we will use is the azureBlobStorage table function. This allows us to query data held in Azure Blob Storage in an ad-hoc way, without first defining a table in ClickHouse.
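For example, a minimal ad-hoc query might look like the following. Note that the storage account URL, container name, blob path and credentials shown here are placeholders rather than real values:

SELECT *
FROM azureBlobStorage(
    'https://myaccount.blob.core.windows.net', -- storage account URL (placeholder)
    'datasets',                                -- container name (placeholder)
    'nyc-taxi/trips_0.gz',                     -- blob path (placeholder)
    'myaccount',                               -- account name (placeholder)
    '<account_key>',                           -- account key (placeholder)
    'TabSeparatedWithNames')                   -- file format
LIMIT 10

The same table function can be combined with CREATE TABLE ... AS SELECT to copy the data into a local MergeTree table, reusing the same placeholder values: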

CREATE TABLE trips
ENGINE = MergeTree
ORDER BY tuple()
AS SELECT *
FROM azureBlobStorage('https://myaccount.blob.core.windows.net', 'datasets', 'nyc-taxi/trips_0.gz',
    'myaccount', '<account_key>', 'TabSeparatedWithNames')
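
Once the copy has completed, trips behaves like any other MergeTree table and can be queried locally, for example:

SELECT count() FROM trips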

Table Objects

Table functions are good for ad-hoc access. However, where we want to build a persistent table abstraction, we can do so using the AzureBlobStorage table engine.

One of the major benefits of this is that the credentials for accessing the underlying blob store can be stored in the table definition. This is more secure than passing them inline each time we access the data through the table function.
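
As a minimal sketch, assuming the same placeholder account, container, path and credentials as above, and a simplified two-column schema purely for illustration, such a table could be defined like this:

CREATE TABLE trips_azure
(
    pickup_datetime DateTime,
    total_amount Float64
)
ENGINE = AzureBlobStorage('https://myaccount.blob.core.windows.net', 'datasets', 'nyc-taxi/trips_0.gz',
    'myaccount', '<account_key>', 'TabSeparatedWithNames')

Queries such as SELECT count() FROM trips_azure then read directly from Blob Storage, with the credentials coming from the table definition rather than appearing in the query text.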
