Course Overview
Ingesting Data Into ClickHouse

Using Airbyte With ClickHouse Cloud

Lesson #2

In this lesson we will:

  • Introduce Airbyte;
  • Learn how to use Airbyte Cloud to ingest data from Google Analytics GA4 into ClickHouse Cloud.

What Is Airbyte?

Airbyte is an open-source platform that can help you move data from various sources to different destinations. Example sources include databases, APIs, SaaS applications and flat files and common destinations include data warehouses and data lakes.

Airbyte offers a growing library of pre-built connectors for all of these data sources and destinations. Airbyte then manages and schedules the integration runs and introduces operational tools for monitring them.

This type of ETL work is something which companies often need to do, but we can use tools such as Airbyte and other similar tools to manage this process for us.

All in all, this massively reduces the amount of bespoke work that businesses need to do to move data around.

Other Features Of Airbyte

Airbyte has some other notable features:

  • Data Transformation: Users can apply transformations to the data as it flows through the pipelines, allowing for data cleansing, enrichment, and mapping.

  • Orchestration: Airbyte provides tools for scheduling and orchestrating data extraction and synchronization tasks, ensuring that data is up-to-date and accurate.

  • Monitoring and Logging: It offers monitoring and logging capabilities to track the status and health of data pipelines.

  • Extensibility: Airbyte is highly extensible, allowing users to create custom connectors or adapt existing ones to suit their specific needs.

  • Security: It supports data encryption, authentication, and access control measures to ensure the security of sensitive data.

  • Community and Ecosystem: Being open-source, Airbyte benefits from an active community of contributors and users who continually improve and extend its capabilities.

All in all, it is a fairly compelling proposition for companies and data teams that need to do extensive ETL work.

Airbyte Cloud

Though Airbyte is an open source platform, it is also available as a managed cloud service on a commercial basis at airbyte.com.

Airbyte and ClickHouse

Airbyte has good out of the box support for ClickHouse both as a source and as a destination.

By combining Airbyte Cloud with, ClickHouse Cloud, you are able to deploy a fully managed and fully serverless stack as is our preference.

Walkthrough

We will now walk through the process of using Airbyte to integrate from a source into ClickHouse. For the purposes of this example we will connect to Google Analytics as our source.

Configuring A Source

After signing up at airbyte.com, the first thing we need to do is define our data source. To illustrate the concept, we will connect to our Google Analytics account.

airbytega4

Configuring A ClickHouse Cloud Destination

We will then setup a destination pointing to ClickHouse. If you are using ClickHouse cloud you will need to connect on port 8443.

Note that though it says the password is optional, this needs to be specified else you receive a misleading error message:

airbyteclickhouse

Configuring The Connection

The connection is the concept which connects the source to the destination. In this instance we will be connecting to our ClickHouse Cloud database:

airbyteconnection

The run took approximately 30 minutes for me:

airbytesuccess

The tables have now been created:

SHOW TABLES WHERE name LIKE '_airbyte%'

Query id: 1d49a23a-4bb2-4e1b-bcf3-49ebfa9200b1

┌─name──────────────────────────────────┐
│ _airbyte_raw_daily_active_users       │
│ _airbyte_raw_devices                  │
│ _airbyte_raw_four_weekly_active_users │
│ _airbyte_raw_locations                │
│ _airbyte_raw_pages                    │
│ _airbyte_raw_traffic_sources          │
│ _airbyte_raw_website_overview         │
│ _airbyte_raw_weekly_active_users      │
└───────────────────────────────────────┘

8 rows in set. Elapsed: 0.002 sec.

Pricing Model

In this lesson we demonstrated Airbyte Cloud, a fully managed platform for carrying out this ETL work.

Airbyte Cloud has two pricing modes. When we are calling an API, as we are here with Google Analytics, you pay based on the number of rows transformed.

airbytepricing

If you are loading data from a database such as MySQL, you pay by the GB transferred.

It is of course important to understand the aX§mount of data you will be transferring and only transform data that you will actually use in order to minimise costs.

Next Lesson:
02

Ingesting Static Files Into ClickHouse

In this lesson we will learn about ingesting static files into ClickHouse.

0h 15m




Work With The Experts In Real-Time Analytics & AI

we help enterprise organisations deploy powerful real-time Data, Analytics and AI solutions based on ClickHouse, the worlds fastest open-source database.

Join our mailing list for regular insights:

We help enterprise organisations deploy advanced data, analytics and AI enabled systems based on modern cloud-native technology.

© 2024 Ensemble. All Rights Reserved.