Introduction to dbt

Defining Tests As Properties

Lesson #4

In this lesson we will:

  • Introduce the concept of dbt properties;
  • Explain how to define static tests using properties;
  • Show how these tests are processed by dbt to produce compiled SQL.

dbt Properties

Properties are used to define extra information about our project components in structured YAML files, adding further context to the SQL code that we have written.

For instance, properties can be used to define details about the columns or schemas of our models:

  version: 2

  models:
    - name: average_trip_distance
      description: "Average distance of each trip"
      columns:
        - name: average_trip_distance
          description: "The average trip distance in the dataset"
          tests:
            - not_null

Note that historically, properties were defined in a file called schema.yml. However, in recent versions of dbt the file can have any name so long as it has the extension .yml and is stored within your models directory. Some documents may still refer to them as schema files or schema.yml.

Defining Tests As Properties

Some of the easiest tests to define are property tests, where we use properties to specify tests against our models. For instance, in the example below we have a model named exams, and we have written a test to ensure that the grade column only contains the values passed or failed.

  models:
    - name: exams
      columns:
        - name: grade
          tests:
            - accepted_values:
                values: ['passed', 'failed']

If a row contains a value other than passed or failed then the test will fail.

Property Tests Available

dbt comes with a small set of tests that we can use to make assertions about our models through properties. These include:

  • not_null
  • unique
  • accepted_values
  • relationships

Though this doesn't sound like many, they can go a long way in building confidence in our data before we attempt anything more complex. Checking for nulls, enforcing uniqueness, constraining values to a set, and adding referential integrity checks is a good base to build on.
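To illustrate, the four built-in tests could all be applied to a single hypothetical trips model as follows (the model, column names, and the referenced drivers model are illustrative, not from this course's project):

  models:
    - name: trips
      columns:
        - name: trip_id
          tests:
            - not_null
            - unique
        - name: status
          tests:
            - accepted_values:
                values: ['completed', 'cancelled']
        - name: driver_id
          tests:
            - relationships:
                to: ref('drivers')
                field: driver_id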

Adding Additional Property Tests

We can, however, extend this base set of property tests in two ways.

Firstly, we can write our own Generic Tests, where we define our own tests as parameterised macros. These tests could check something generic about the data, such as whether or not a value is a positive number, or something more tailored to your domain. Generic tests are then referenced via properties:

  models:
    - name: orders
      columns:
        - name: order_category
          tests:
            - is_available_order_category
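A generic test such as the is_available_order_category test referenced above is defined as a parameterised macro using dbt's test block, typically in the tests/generic directory. A minimal sketch is shown below; the category values are illustrative assumptions:

  -- tests/generic/is_available_order_category.sql (hypothetical sketch)
  {% test is_available_order_category(model, column_name) %}

  -- the test passes when this query returns zero rows
  select {{ column_name }}
  from {{ model }}
  where {{ column_name }} not in ('standard', 'express', 'bulk')

  {% endtest %}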

Secondly, we could use a library of pre-built generic tests. A popular library we have used is dbt-expectations, a port of the Great Expectations test suite to dbt. This will enhance your dbt project with dozens of additional property tests.
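Third-party test packages such as dbt-expectations are installed by listing them in a packages.yml file at the root of the project and then running dbt deps (the version range below is illustrative; check the package documentation for the current release):

  packages:
    - package: calogica/dbt-expectations
      version: [">=0.10.0", "<0.11.0"]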

Again, these are accessed as properties:

  - dbt_expectations.expect_column_values_to_be_between:
      min_value: 0  # (Optional)
      max_value: 10 # (Optional)
      row_condition: "id is not null" # (Optional)
      strictly: false 

Between the built-in tests, tests provided by third parties, and any bespoke generic tests, it should be possible to achieve very good test coverage of our datasets.

How Property Tests Are Processed

When we run dbt compile or dbt run, dbt translates the entries in our YAML properties files into SQL files which can then be executed to carry out each test. For example, consider this property file with a not_null test:

  models:
    - name: payments_by_type_very_large
      description: "Large payments by type"
      columns:
        - name: constant_currency
          description: "Currency of the payment"
          tests:
            - not_null

The generated SQL can be found in the target folder of our build:

  select constant_currency
  from taxi_production.payments_by_type_very_large
  where constant_currency is null

Though ordinarily we wouldn't need to check these files, it can be useful to understand this process and know how to find the generated code when we are trying to debug a failing test.

Next Lesson:

Using The dbt Command Line Interface

In this lesson we will use the dbt Command Line Interface to create and configure our first dbt project.

