In this lesson we will:
- Learn about scheduled tasks including what they are and how to use them.
Scheduled Tasks
Tasks allow us to schedule activities to run within the Snowflake database, for instance hourly, daily or according to some arbitrary schedule.
A task could be as simple as a SQL statement, or could call into a stored procedure if we need more complex logic.
Use Cases
There are various use cases for tasks:
- ETL - e.g. Periodically transferring and transforming ingested data;
- Retention - e.g. Periodically deleting data which is no longer necessary;
Job Dependencies and The DAG
Snowflake tasks can be chained together and made dependent on one another.
For instance, A to B to C.
Integration With Streams
Snowflake has a feature called Streams which can be used to track when data is inserted, updated or deleted within tables.
Combining streams with tasks is a powerful combination of features, because we can periodically take actions only on data which has changed.
Streams are discussed in more detail in the next lesson.
External Tools
The advantage of Snowflake tasks is that they can be configured directly within Snowflake using SQL, and require no external tooling to learn or implement.
There are however more powerful and fully featured tools for job scheduling and dependency management which are more commonly used in practice.
dbt for instance is widely used to define transformation runs from a series of dependent tasks. [Airflow] is also a very commonly deployed tool as part of Modern Data Stacks which is more powerful but involves describing jobs in Python.