An overview of Sources
Why and how we declare sources in dbt, and why the source function matters.
Part of the “Mastering dbt” series.
Notes from the dbt documentation on sources.
In dbt, you can directly refer to your raw data tables using the database.schema.tablename syntax.
However, using the Sources feature is a best practice because it enables you to:
clearly define the lineage of your data using the {{ source( ) }} function
test assumptions on your sources
calculate the freshness of your data
Declaring a source
Sources are defined in a .yml file under the sources: key.
You can have one overall file for all your sources, or split the definitions across subdirectories. For instance, if data gets extracted from two systems, you may want a subfolder for each system, each with its own source file.
Another good practice is to add an underscore at the beginning of the file name so it sorts to the top of the folder, like “_sources.yml”.

Give each source a name and specify which database it comes from; if you omit the database, dbt looks in your target database. By default, the schema is assumed to be the name of the source. If you need to override that, add a schema property.
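As a minimal sketch of such a declaration, assuming a source named jaffle_shop that lives in a database called raw (matching the table names used later in this article). The file path and the commented-out schema value are illustrative, not taken from the original:

```yaml
# models/staging/jaffle_shop/_sources.yml (illustrative path)
version: 2

sources:
  - name: jaffle_shop        # source name, first argument of source()
    database: raw            # optional; dbt falls back to the target database
    # schema: jaffle_shop_raw  # hypothetical override; defaults to the source name
    tables:
      - name: orders         # table name, second argument of source()
      - name: customers
```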
Furthermore, you can specify columns under each table and add data tests to them. This will be covered in more detail in future Checkpoints.
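A hedged sketch of what columns, data tests and a freshness check could look like on a source table. The column names id and _etl_loaded_at are assumptions for illustration, and the placement of the freshness keys can differ between dbt versions, so check the documentation for the version you run:

```yaml
version: 2

sources:
  - name: jaffle_shop
    database: raw
    tables:
      - name: orders
        # Freshness: compares the newest value in loaded_at_field with the current time.
        # _etl_loaded_at is a hypothetical load-timestamp column.
        loaded_at_field: _etl_loaded_at
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
        columns:
          - name: id           # assumed primary key column
            data_tests:        # named tests: in dbt versions before 1.8
              - unique
              - not_null
```

With a file like this in place, dbt source freshness runs the freshness check and dbt test --select "source:jaffle_shop" runs the data tests declared on the source.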
Selecting a source
Once the sources have been declared, you can use the {{ source( ) }} function to reference them in your SQL.

You pass the name of the source, followed by the name of the table you want. The compiled code will include the full table name, in this case: raw.jaffle_shop.orders and raw.jaffle_shop.customers.
Using this function also registers a dependency between the source table and the model that selects from it, so the source shows up in the lineage of your dbt project.
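For illustration, a model selecting from the orders table could look like the sketch below. It assumes a source named jaffle_shop declared with database raw and the default schema; the file name is hypothetical:

```sql
-- models/staging/stg_orders.sql (illustrative file name)
select *
from {{ source('jaffle_shop', 'orders') }}

-- Under the assumptions above, dbt compiles this to roughly:
-- select *
-- from raw.jaffle_shop.orders
```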

dbt recommends using the source function only in the base or staging layers, where sources get joined together, if necessary, and then cleaned. Models after those stages should use the ref function to select from the base/stg models instead of from the sources directly.
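A small sketch of that layering, with hypothetical model and column names. The staging model is the only one that calls source():

```sql
-- models/staging/stg_orders.sql (hypothetical staging model)
-- Light cleaning only: rename columns from the raw table.
select
    id as order_id,
    user_id as customer_id,
    status
from {{ source('jaffle_shop', 'orders') }}
```

A downstream model then builds on the staging model through ref() rather than touching the source again:

```sql
-- models/marts/orders_by_status.sql (hypothetical downstream model)
select
    status,
    count(*) as order_count
from {{ ref('stg_orders') }}
group by status
```

Keeping source() confined to staging means a renamed or relocated raw table only has to be updated in one place.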