In this lesson we will:
- Show how to test seed data.
dbt Seeds
dbt incorporates the concept of seed data. This is data that we would like to populate our database with to enable our transformation code, such as lookup lists and static values.
As with everything else in our project, it can be useful to test our seed data for correctness. Although this is a niche requirement, seed data can change over time independently of the model code. If someone were to enter bad seed data, it could violate our assumptions and lead to failing transformations or bad data being delivered. As with sources, it is therefore worth validating the quality of our seed data early in the pipeline.
Testing A Seed
Imagine we have designed a seed data file, genders.csv, in the following format:
code,description
M,Male
F,Female
We would test the seed data in the same way that we would test a model or a source, specifically by describing the seed object in a YAML properties file and adding tests to its columns:
seeds:
  - name: genders
    description: Gender Codes
    columns:
      - name: code
        tests:
          - unique
          - not_null
      - name: description
        tests:
          - unique
          - not_null
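Because a lookup list usually holds a small, known set of codes, the same mechanism can also pin those values down. The following is a minimal sketch, assuming dbt's built-in accepted_values test and assuming that M and F are the only codes we expect:

```yaml
seeds:
  - name: genders
    columns:
      - name: code
        tests:
          # Assumption: M and F are the only valid codes in this seed
          - accepted_values:
              values: ['M', 'F']
```

With a test like this in place, a new row with an unexpected code would fail the test instead of flowing silently into downstream models.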
Once the seed has been loaded into the database with dbt seed, these tests are executed as part of the usual dbt test cycle:
dbt test
In this example, we can see that the test has not met the expectation. When this happens, either the seed data needs to be corrected or the transformation logic needs to change to reflect that our assumption no longer holds.
[Figure: dbt test output showing a failed test]
It is also possible to limit a test run to seed data only by using a selector:
dbt test --select config.materialized:seed
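If this filter is needed regularly, it could be saved as a named YAML selector. The following is a minimal sketch, assuming a selectors.yml file at the root of the project; the selector name seed_tests is hypothetical and chosen only for illustration:

```yaml
selectors:
  - name: seed_tests  # hypothetical name
    description: Tests attached to seed data
    definition:
      # Same criterion as config.materialized:seed on the command line
      method: config.materialized
      value: seed
```

The saved selector could then be used with dbt test --selector seed_tests, which keeps the selection logic in version control rather than in each person's shell history.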