Introduction to dbt

Testing Seed Data

Lesson #16

In this lesson we will:

Show how to test seed data.

We recommend using a free ClickHouse Cloud Account when studying this lesson

dbt Seeds

dbt incorporates the concept of seed data. This is small, mostly static data, such as lookup lists and static values, that is stored as CSV files in the project and loaded into our database to support our transformation code.

As with everything else in our project, it can be useful to test our seed data for correctness. Though this is a niche requirement, seed data can change over time independently of the model code. If someone were to enter bad seed data, it could violate our assumptions and lead to failing transformations or bad data being delivered. As with sources, it is therefore worth validating the quality of our seed data early in the pipeline.
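To make the risk concrete, here is a minimal Python sketch (not dbt code) of what happens when a lookup seed gains a duplicate code. The customer rows and the extra `M` entry are hypothetical, made up purely for illustration. The join is emulated with a plain loop rather than SQL.

```python
# Hypothetical data: a gender lookup seed and some rows that reference it.
good_seed = [("M", "Male"), ("F", "Female")]
bad_seed = good_seed + [("M", "Man")]  # someone added a duplicate code

customers = [(1, "M"), (2, "F"), (3, "M")]


def join_to_lookup(rows, lookup):
    """Emulate an inner join of rows to the lookup on the code column."""
    return [
        (row_id, description)
        for row_id, code in rows
        for lookup_code, description in lookup
        if code == lookup_code
    ]


print(len(join_to_lookup(customers, good_seed)))  # 3 rows, one per customer
print(len(join_to_lookup(customers, bad_seed)))   # 5 rows, customers 1 and 3 appear twice
```

A duplicated key in the seed silently fans out every downstream join, which is the kind of problem a uniqueness test on the seed is meant to catch before it reaches a model.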

Testing A Seed

Imagine we have designed a seed data file in the following format:

code,description
M,Male
F,Female

We would test the seed data in the same way that we would test a model or a source, specifically by describing the seed in a YAML properties file and adding tests to its columns:

seeds:
  - name: genders
    description: Gender Codes
    columns:
      - name: code 
        tests:
          - unique
          - not_null
      - name: description
        tests:
          - unique
          - not_null
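As a rough illustration of what the unique and not_null tests assert, the following Python sketch applies equivalent checks to the CSV text directly. This is not how dbt runs them: dbt executes each test as a query against the loaded seed table. The function names are hypothetical, and treating an empty cell as null is an assumption about how the seed is loaded.

```python
import csv
import io
from collections import Counter

# The seed file from this lesson, inlined so the sketch is self-contained.
SEED_CSV = """code,description
M,Male
F,Female
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def not_null_failures(rows, column):
    """Return rows where the column is missing or empty."""
    return [row for row in rows if row.get(column) in (None, "")]


def unique_failures(rows, column):
    """Return the non-null values that appear more than once."""
    counts = Counter(
        row[column] for row in rows if row.get(column) not in (None, "")
    )
    return sorted(value for value, n in counts.items() if n > 1)


rows = load_rows(SEED_CSV)
for column in ("code", "description"):
    # An empty list means the corresponding check passes.
    print(column, not_null_failures(rows, column), unique_failures(rows, column))
```

For the seed above, both checks return empty lists for both columns. Appending a row such as `M,Man` would make `unique_failures(rows, "code")` report `M`.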

These tests are executed during the usual dbt test cycle, once the seed has been loaded with dbt seed:

dbt test

If a test does not meet its expectation, dbt reports it as failed. In that case, either the seed data needs to be corrected, or the transformation logic needs to change to reflect that our assumption no longer holds.

It is also possible to limit our test runs to only testing seed data through the use of a selector:

dbt test --select config.materialized:seed
