Introduction to dbt

Testing Seed Data

Lesson #16

In this lesson we will:

  • Show how to test seed data.

dbt Seeds

dbt incorporates the concept of seed data: CSV files kept in the dbt project and loaded into the database with the dbt seed command. Seeds hold data that our transformation code depends on, such as lookup lists and static values.

As with everything else in our project, it can be useful to test our seed data for correctness. Though this is a niche requirement, it matters because seed data can change over time independently of the model code. If someone were to enter bad seed data, it could violate our assumptions and lead to incorrect data being delivered or to failed transformations. As with sources, it is therefore worth validating the quality of our seed data early in the pipeline.

Testing A Seed

Imagine we have created a seed file, genders.csv, in the following format:

code,description
M,Male
F,Female
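To make the idea concrete, the following Python sketch mimics what dbt seed does with this file: it reads the CSV and loads each row into a database table. It is not dbt's implementation. The helper name load_seed is hypothetical, SQLite stands in for the warehouse, every column is stored as TEXT (dbt infers column types instead), and empty cells are loaded as NULL.

```python
import csv
import io
import sqlite3

# The genders seed, inlined so the sketch is self-contained.
# In a dbt project this would live in a file such as seeds/genders.csv.
GENDERS_CSV = """code,description
M,Male
F,Female
"""

def load_seed(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Hypothetical helper: create a table from CSV text and insert its rows.
    Returns the number of rows loaded."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Treat empty cells as NULL rather than as empty strings.
    rows = [[cell if cell != "" else None for cell in row] for row in reader]
    # Simplification: every column is TEXT; dbt would infer types.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
print(load_seed(conn, "genders", GENDERS_CSV))  # 2
print(conn.execute("SELECT code, description FROM genders ORDER BY code").fetchall())
# [('F', 'Female'), ('M', 'Male')]
```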

We test seed data in the same way as a model or a source: by describing the seed in a YAML properties file and adding tests to its columns:

seeds:
  - name: genders
    description: Gender Codes
    columns:
      - name: code 
        tests:
          - unique
          - not_null
      - name: description
        tests:
          - unique
          - not_null
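Under the hood, each of these tests is a query that selects the rows violating the rule, and the test passes only when that query returns no rows. The Python sketch below reproduces that idea against an in-memory SQLite copy of the genders seed. The SQL is a simplified approximation of what dbt's built-in unique and not_null tests compile to, not the exact generated code, and the helper names unique_failures and not_null_failures are hypothetical.

```python
import sqlite3

def unique_failures(conn, table, column):
    """Return values of `column` that appear more than once (NULLs ignored)."""
    sql = (
        f'SELECT "{column}", COUNT(*) AS n_records FROM "{table}" '
        f'WHERE "{column}" IS NOT NULL '
        f'GROUP BY "{column}" HAVING COUNT(*) > 1'
    )
    return conn.execute(sql).fetchall()

def not_null_failures(conn, table, column):
    """Return the rows where `column` is NULL."""
    sql = f'SELECT "{column}" FROM "{table}" WHERE "{column}" IS NULL'
    return conn.execute(sql).fetchall()

# The genders seed, loaded directly into an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genders (code TEXT, description TEXT)")
conn.executemany("INSERT INTO genders VALUES (?, ?)", [("M", "Male"), ("F", "Female")])

# The four tests declared in the YAML: unique and not_null on both columns.
# A test passes when its query returns zero rows.
results = {}
for column in ("code", "description"):
    for test_name, check in (("unique", unique_failures), ("not_null", not_null_failures)):
        failures = check(conn, "genders", column)
        results[f"{test_name}_genders_{column}"] = "FAIL" if failures else "PASS"

for name, status in results.items():
    print(f"{name}: {status}")
```

Expressing each test as "find the bad rows" keeps the pass/fail rule uniform: any non-empty result is a failure, whatever the check is.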

Once the seed has been loaded into the database with dbt seed, these tests are executed as part of the usual dbt test run:

dbt test

In this example, the test has not met the expectation. To resolve it, we would either correct the seed data or, if our assumption turns out to be wrong, change the transformation logic (and the test) to reflect that.

[Image: dbt test output reporting a failed test]
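To see how a bad edit to the seed surfaces, the sketch below loads a hypothetical modified version of genders (a second row with code M and a row whose code was left blank) into SQLite and runs simplified versions of the unique and not_null checks on the code column. The rows and queries are illustrative assumptions, not the lesson's actual data or dbt's generated SQL.

```python
import sqlite3

# Hypothetical bad edit to the genders seed: a duplicate code "M"
# and a row whose code was left blank (loaded as NULL).
bad_rows = [("M", "Male"), ("F", "Female"), ("M", "Man"), (None, "Unknown")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genders (code TEXT, description TEXT)")
conn.executemany("INSERT INTO genders VALUES (?, ?)", bad_rows)

# Each query returns the offending rows; any result means the test fails.
duplicate_codes = conn.execute(
    "SELECT code, COUNT(*) FROM genders WHERE code IS NOT NULL "
    "GROUP BY code HAVING COUNT(*) > 1"
).fetchall()
null_codes = conn.execute(
    "SELECT description FROM genders WHERE code IS NULL"
).fetchall()

print("unique_genders_code:", "FAIL" if duplicate_codes else "PASS", duplicate_codes)
# unique_genders_code: FAIL [('M', 2)]
print("not_null_genders_code:", "FAIL" if null_codes else "PASS", null_codes)
# not_null_genders_code: FAIL [('Unknown',)]
```

Both failures point at specific rows, which is what makes it practical to decide whether the seed or the assumption needs to change.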

We can also restrict a test run to seed data only by using a selector:

dbt test --select config.materialized:seed

© 2024 Ensemble. All Rights Reserved.