In this lesson we will cover:
- The azureBlobStorage table function;
- The AzureBlobStorage table engine.
Ingesting Data From Azure Blob Storage
Though AWS S3 is the cloud object store we are most likely to come across, Azure Blob Storage is likely to be close behind it in Microsoft Azure-centric shops.
Fortunately, ClickHouse also has first-class support for reading data from and writing data to Azure Blob Storage.
In addition, some teams might choose to run their ClickHouse instances on Azure virtual machines, or migrate to Azure-hosted deployments of ClickHouse Cloud. In this case, it makes sense to store data co-located with the database cluster, which makes Azure the natural choice.
In this lesson we will learn what is involved in connecting to Azure, initially to query data "in place" without copying it into ClickHouse. We may wish to do this as part of a Data Lakehouse approach, where we make use of data that is stored persistently in a data lake.
Then, we will look at the process and considerations involved in copying data into ClickHouse, which is the more typical use case and the one that enables the best possible performance.
Table Functions
We suggest you complete our lesson on table functions as a prerequisite to this one, as much of the interaction between ClickHouse and Azure Blob Storage occurs through the table function abstraction.
Ad-Hoc Querying Of Azure Blob Storage
TBC
Table Functions
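The first abstraction we will use is the azureBlobStorage table function. This allows us to query data held in Azure Blob Storage in an ad-hoc way, either directly or, as in the following example, as the source for a new ClickHouse table.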
-- The storage account, container and key below are placeholders; substitute your own.
CREATE TABLE trips ENGINE = MergeTree ORDER BY tuple()
AS SELECT * FROM azureBlobStorage(
    'https://SOME_ACCOUNT.blob.core.windows.net', 'SOME_CONTAINER', 'nyc-taxi/trips_0.gz',
    'SOME_ACCOUNT', 'SOME_ACCOUNT_KEY', 'TabSeparatedWithNames');
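To query the data in place rather than copy it, the same table function can be used directly in a SELECT. The following is a minimal sketch, not a definitive recipe: the storage account URL, container name, blob path, account name and account key are all placeholders rather than real resources, and the argument order assumed here is storage account URL, container, blob path, account name, account key, format.

```sql
-- Count the rows in a single blob without copying anything into ClickHouse.
-- Every connection value below is a placeholder (an assumption, not a real resource).
SELECT count(*)
FROM azureBlobStorage(
    'https://SOME_ACCOUNT.blob.core.windows.net',  -- storage account URL
    'SOME_CONTAINER',                               -- container name
    'nyc-taxi/trips_0.gz',                          -- blob path inside the container
    'SOME_ACCOUNT',                                 -- account name
    'SOME_ACCOUNT_KEY',                             -- account key
    'TabSeparatedWithNames'                         -- input format
);
```

Nothing is stored in ClickHouse by this query, so the blob is read again on every run. That is acceptable for exploration, but it is the reason copying into a MergeTree table is usually preferred when performance matters.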
Table Objects
Table functions are good for ad-hoc access. However, where we want to build a persistent table abstraction, we can do so using the AzureBlobStorage table engine.
One of the major benefits of this is that the credentials for accessing the underlying blob store can be stored in the table object. This is more secure than supplying them inline in every query that goes through the table function.
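As a rough sketch of what such a table object might look like, the statement below defines a table backed by a single blob. The table name, column list, blob path and all connection values are hypothetical and chosen only for illustration; the column list in particular would need to match the structure of the real file.

```sql
-- A persistent table whose data stays in Azure Blob Storage.
-- Table name, columns, blob path and connection values are all hypothetical placeholders.
CREATE TABLE events_azure
(
    event_id   UInt64,
    event_time DateTime,
    payload    String
)
ENGINE = AzureBlobStorage(
    'https://SOME_ACCOUNT.blob.core.windows.net',  -- storage account URL
    'SOME_CONTAINER',                               -- container name
    'example/events.csv',                           -- blob path inside the container
    'SOME_ACCOUNT',                                 -- account name
    'SOME_ACCOUNT_KEY',                             -- account key
    'CSVWithNames'                                  -- format of the blob
);

-- Later queries refer to the table by name and do not repeat the credentials.
SELECT count(*) FROM events_azure;
```

Once the table exists, users only need permission to query it; the account key appears in the CREATE TABLE statement alone rather than in each query.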