Add snapshots to your DAG
Related documentation
What are snapshots?
Analysts often need to "look back in time" at previous data states in their mutable tables. While some source data systems are built in a way that makes accessing historical data possible, this is not always the case. dbt provides a mechanism, snapshots, which records changes to a mutable table over time.
Snapshots implement type-2 Slowly Changing Dimensions over mutable source tables. These Slowly Changing Dimensions (or SCDs) identify how a row in a table changes over time. Imagine you have an orders
table where the status
field can be overwritten as the order is processed.
id | status | updated_at |
---|---|---|
1 | pending | 2024-01-01 |
Now, imagine that the order goes from "pending" to "shipped". That same record will now look like:
id | status | updated_at |
---|---|---|
1 | shipped | 2024-01-02 |
This order is now in the "shipped" state, but we've lost the information about when the order was last in the "pending" state. This makes it difficult (or impossible) to analyze how long it took for an order to ship. dbt can "snapshot" these changes to help you understand how values in a row change over time. Here's an example of a snapshot table for the previous example:
id | status | updated_at | dbt_valid_from | dbt_valid_to |
---|---|---|---|---|
1 | pending | 2024-01-01 | 2024-01-01 | 2024-01-02 |
1 | shipped | 2024-01-02 | 2024-01-02 | null |
Configuring snapshots
It is not possible to "preview data" or "compile sql" for snapshots in dbt Cloud. Instead, run the dbt snapshot
command in the IDE.
Configuration best practices
How snapshots work
When you run the dbt snapshot
command:
- On the first run: dbt will create the initial snapshot table — this will be the result set of your
select
statement, with additional columns includingdbt_valid_from
anddbt_valid_to
. All records will have adbt_valid_to = null
. - On subsequent runs: dbt will check which records have changed or if any new records have been created:
- The
dbt_valid_to
column will be updated for any existing records that have changed - The updated record and any new records will be inserted into the snapshot table. These records will now have
dbt_valid_to = null
- The
Snapshots can be referenced in downstream models the same way as referencing models — by using the ref function.