Introduction to dbt

Sources and Exposures

Lesson #20

In this lesson we will:

  • Learn about dbt's source and exposure features.

What Are Sources

As part of our work with dbt we will likely be taking data from original data sources. This data could be extracted from an application, another database or a source such as a spreadsheet. It is then usually uploaded into the database and exposed as tables or views which will form the basis of our dbt pipelines.

The Source feature of dbt allows us to mark relevant tables as external source tables using metadata. This is useful for a few reasons:

  • We need to refer to source tables or views in our dbt pipelines, but we do not want dbt ever to create or materialise them. They are entirely outside the control of dbt;
  • By marking the object as a source, we are being explicit about where it sits in the data lineage pipeline and DAG dependencies;
  • We may wish to test assumptions about our data source prior to starting any transformation. This is subtly different to testing one of our dbt models;
  • Marking a table or view as a source allows us to measure the freshness of the source data, so that we can tell whether it is recent enough before building the models that depend on it, such as incremental models.

Using Sources

We can specify a table or view as being a source in a YAML configuration file:

sources:
  - name: ecommerce_system
    tables:
      - name: customers
      - name: products

Once a source is defined, we can refer to it using the source Jinja function, in the same way that we use the ref function to add dependencies on dbt models.

select
  product_name, price
from {{ source('ecommerce_system', 'products') }}
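
The testing and freshness reasons given earlier can be expressed in the same YAML file. The following is a minimal sketch rather than a definitive configuration: the customer_id and _loaded_at columns and the 12 and 24 hour thresholds are assumptions made for illustration. Key names and their placement have shifted between dbt releases (newer releases prefer data_tests over tests, for example), so check it against the version in use.

```yaml
version: 2

sources:
  - name: ecommerce_system
    # Assumption: every table has a timestamp column recording when the row was loaded
    loaded_at_field: _loaded_at
    # Illustrative thresholds: warn if data is over 12 hours old, error after 24
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: customers
        columns:
          # Hypothetical key column, tested before any transformation runs
          - name: customer_id
            tests:
              - unique
              - not_null
      - name: products
```

With a configuration along these lines, dbt can test the source tables and report on their freshness separately from the models built on top of them.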

Exposures

A dbt transformation DAG contains three broad types of object:

  • Sources, e.g. tables containing our source data. These are marked as Sources as described above;
  • Intermediate objects, e.g. tables containing our intermediate calculations and aggregations. These may not be appropriate for people in the business to use;
  • Destinations, e.g. tables containing the data we actually want our user community to use, which meet our desired standards for accuracy and completeness.

Where the Sources feature described above allows us to mark data sources, Exposures allow us to use metadata to describe how the tables at the end of our pipelines are used.

We can add metadata to an Exposure, such as how it is used (a dashboard, report or notebook) and an email address for the owner of the downstream consumer. This is a simple feature, but it can be a great help in the day-to-day work of a data team that needs to co-ordinate with its data consumers when making changes, so that those changes do not break anything downstream.
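
An exposure is declared in a YAML file in much the same way as a source. The sketch below is illustrative only: the product_sales model it depends on, the owner details and the description are hypothetical, and the exposure name is chosen to match the selection example later in this lesson.

```yaml
version: 2

exposures:
  - name: product_sales_by_category
    # How the data is consumed downstream
    type: dashboard
    description: Sales totals broken down by product category.
    # Hypothetical model that feeds the dashboard
    depends_on:
      - ref('product_sales')
    # Hypothetical contact for the downstream consumer
    owner:
      name: Analytics Team
      email: analytics@example.com
```

Because the exposure lists the models it depends on, it appears at the end of the DAG and can be used in selection criteria like any other node.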

Controlling dbt Runs For Sources and Exposures

During the development process, it is sometimes useful to run only the models that depend on a particular source, or all of the models which feed into some exposure. This can be done with the --select flag (short form -s).

All models downstream of a source can be run by adding the + suffix to the source in the select criteria:

dbt run -s source:product_sales+

And all models upstream of a given exposure can be run in the following way:

dbt run -s +exposure:product_sales_by_category

This makes the development process much more efficient, and could also be used as part of automation where we only need to update data for given exposures periodically.

© 2024 Ensemble. All Rights Reserved.