In this lesson we will:

  • Introduce dbt, the leading tool for data transformations within the Modern Data Stack;
  • Look at an example of a dbt model;
  • Explain some of the high-level benefits of using dbt.

Testing dbt Sources

dbt incorporates the concept of Sources. This is where we describe the source data that we load as inputs into our transformations. For example, common data sources for data teams include SaaS applications such as Google Analytics or Stripe, or extracts from line-of-business applications.

If this source data contains errors, there is a risk that it could pollute our database or result in us presenting bad data to our end users. And if it does not meet our assumptions and business rules, our transformations are likely to fail in strange and unexpected ways. For both reasons, it is worth testing the source data extensively before attempting to do anything with it.

dbt allows us to test sources using all of the mechanisms previously discussed, including property tests, generic tests and singular tests.

In the example below, we use the built-in property tests to check that the order_id field on our Stripe extract is unique and not null.

sources:
  - name: stripe_extract
    description: Daily extract from Stripe
    tables:
      - name: orders
        columns:
          - name: order_id
            description: Primary key of the orders table
            tests:
              - unique
              - not_null
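
Beyond uniqueness and null checks, the same source definition can carry tests that encode business rules. The following is a minimal sketch rather than a definitive configuration: the status and customer_id columns, the list of accepted values and the customers table are all hypothetical and are not part of the extract described above. Depending on the dbt version in use, the tests key may need to be written as data_tests instead.

```yaml
sources:
  - name: stripe_extract
    tables:
      - name: customers            # hypothetical table, assumed for illustration
      - name: orders
        columns:
          - name: status           # hypothetical column
            tests:
              # Fail if any row holds a value outside this assumed list
              - accepted_values:
                  values: ['pending', 'paid', 'refunded']
          - name: customer_id      # hypothetical column
            tests:
              - not_null
              # Fail if an order references a customer missing from the customers source table
              - relationships:
                  to: source('stripe_extract', 'customers')
                  field: customer_id
```

Tests declared this way run against the raw source tables, so a failure points at the incoming data rather than at our own transformation logic.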


© 2024 Ensemble. All Rights Reserved.