An overview of Sources

Why and how we declare sources in dbt, and why the source function matters.

Sep 11, 2025

Part of the “Mastering dbt” series.
Notes from the dbt Sources documentation.

In dbt, you can directly refer to your raw data tables using the database.schema.tablename syntax.

However, using the Sources feature is a best practice because it enables you to:

clearly define the lineage of your data using the {{ source( ) }} function
test assumptions on your sources
calculate the freshness of your data

Declaring a source

Sources are defined in a .yml file under the sources: key.

You can keep one file for all your sources, or split them across subdirectories. For instance, if your data is extracted from two systems, you may want a subfolder for each system, each with its own source file.

Another good practice is to start the file name with an underscore so it sorts to the top of the folder, like “_sources.yml”.

Example source file. Source: Sources documentation

Define the name of each source and, optionally, the database it lives in. If you omit the database, dbt uses your target database. By default, dbt assumes the schema has the same name as the source. If you need to override that, add a schema property.
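As a rough sketch of what such a file could look like, assuming a raw database with a jaffle_shop schema (the names that appear in the compiled output later in this post) and a hypothetical file path:

```yaml
# models/staging/jaffle_shop/_sources.yml  (hypothetical path)
version: 2

sources:
  - name: jaffle_shop      # source name; also the default schema name
    database: raw          # optional; dbt falls back to the target database
    schema: jaffle_shop    # redundant here, shown only to illustrate the override
    tables:
      - name: orders
      - name: customers
```

Here the schema line could be dropped without changing anything, since it matches the source name. It only becomes necessary when the two differ.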

Furthermore, you can specify columns under each table and add data tests to them. This will be covered in more detail in future Checkpoints.
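A minimal sketch of a source table with a tested column, assuming the orders table has an id column (the column name is an assumption for illustration):

```yaml
version: 2

sources:
  - name: jaffle_shop
    database: raw
    tables:
      - name: orders
        columns:
          - name: id           # assumed column name
            data_tests:        # spelled "tests:" in dbt versions before 1.8
              - unique
              - not_null
```

unique and not_null are two of dbt's built-in generic tests. They run against the raw table itself, so a failure points at the source data rather than at any model built on top of it.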

Selecting a source

Once the sources have been declared, you can use the {{ source( ) }} function to reference them in your SQL.

Example usage of the Source function. Source: sources documentation

Pass the name of the source first, followed by the name of the table you want. The compiled code will contain the full table names, in this case raw.jaffle_shop.orders and raw.jaffle_shop.customers.
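To make this concrete, here is a sketch of a model that selects from both tables. It assumes the jaffle_shop source declared earlier and a shared customer_id column, which is an illustrative assumption:

```sql
select
    *
from {{ source('jaffle_shop', 'orders') }}
left join {{ source('jaffle_shop', 'customers') }}
    using (customer_id)
```

With database set to raw and the schema jaffle_shop, dbt would compile this to roughly:

```sql
select
    *
from raw.jaffle_shop.orders
left join raw.jaffle_shop.customers
    using (customer_id)
```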

Using this function also creates an explicit dependency between the source table and the model that selects from it, which dbt displays in the lineage graph (DAG).

The DAG once a source has been declared and called in a model. Source: Sources documentation

dbt recommends using the source function only in the base or staging layers, where sources are cleaned and, if necessary, joined together. In later layers, use the ref function to select from the base/stg models instead.
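A sketch of that layering, with hypothetical file paths, model names and column names. The staging model is the only place that touches the source:

```sql
-- models/staging/jaffle_shop/stg_orders.sql  (hypothetical)
select
    id as order_id,          -- assumed raw column names
    user_id as customer_id,
    order_date,
    status
from {{ source('jaffle_shop', 'orders') }}
```

Downstream models then build on the staging model through ref:

```sql
-- models/marts/orders_by_status.sql  (hypothetical)
select
    status,
    count(*) as order_count
from {{ ref('stg_orders') }}
group by status
```

Keeping source calls in the staging layer means that if a raw table is renamed or moved, only the source file and one staging model need to change.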

Andrea Leonel - Data Analysis & Analytics Engineering
