In this lesson we will:

  • Learn about dbts testing features which can be used to confirm the accuracy and correctness of any data transformations;

Testing With dbt

dbt includes features for automatically testing the correctness of our transformations each time they are executed.

By incorporating testing into the transformation process, we can build confidence that our transformations are operating as we expect, for instance:

  • That we get the expected number of rows in the output;
  • That columns are unique, not null, greater than zero in line with our expectations;
  • That all data meets our business rules (e.g. all line items should not have a total value greater than the invoice value);

This helps to build quality into the transformations, catching human errors and data errors as early as possible in the data pipeline where problems are easier to resolve before bad data flows downstream or is delivered to our business users.

Automated testing in this way is a very popular technique amougst software engineers, who nowadays have a culture of automatically unit testing individual pieces of logic, and integration testing their end to end solution.

Though there have been various attempts to add testing to data, dbt is the first tool which integrates it so well with the actual transformation code, such that automated testing and practices such as test driven development become viable.

Writing Your First Test

dbt tests are stored in the test directory of your dbt project.

A dbt test is simply a SQL query which should return zero rows if the test passes. If the test returns any rows, those rows are considered to be the failing records which violet your test assertion and should be investigated.

FOOBAR BAZ

Having defined our tests, we can then execute them on an ad-hoc basis like so:

dbt test

We can also limit the test to one particular model during the development loop:

dbt test --select sales_by_store
Next Lesson:
26

Testing With dbt

In this lesson we will use the testing features of dbt to validate data transformations and pipelines.

0h 15m




Work With The Experts In Real-Time Analytics & AI

we help enterprise organisations deploy powerful real-time Data, Analytics and AI solutions based on ClickHouse, the worlds fastest open-source database.

Join our mailing list for regular insights:

We help enterprise organisations deploy advanced data, analytics and AI enabled systems based on modern cloud-native technology.

© 2024 Ensemble. All Rights Reserved.