Practice Project: Checkpoint 4 - Part 1

Adding Seeds, installing Packages from different sources and writing custom macros.

Nov 09, 2025

Part of the “Mastering dbt” series. Access to the full Study Guide. Let’s connect on LinkedIn!

We covered a fair amount of ground in Checkpoint 4, so we will need to break the Practice Project posts into multiple parts.

In Part 1, we will cover adding Seeds and installing Packages and Macros.

We will then continue with further posts covering Incremental Materialisations, Model Governance, Grants, Snapshots, Python models, Exposures, and Documentation.

Seeds

(All CSV files are in my git repo. Link to study note)

You are the Data Analyst in our B2B corporation (I should’ve made up a name for it, too late now), and you received requests from 3 different teams.

Requests

Request 1: The Sales Revenue department requested some additions to the fct_orders table. They asked for two new columns:

A state_code column to understand where those orders were shipped to. They asked for the 2-letter code for the state of each employee who placed the order.

Let’s remember what our DAG looks like:

You could go ahead and add the new column only to the fct_orders table. But, if you want to future-proof your work, the best practice would be to add it to the stg_employees table so the column can be reused by other models, if needed.

In the case of a very large seed (which is not the case here), you could also consider adding it to the int_order_details model for performance.

Request 2: The User Support department needs to identify the countries that they need to call when contacting an employee by phone. They use the dim_employees table to find this information.

Add an area_code column to dim_employees to identify the area code that each employee has on their phone numbers

Again, this is one to be added to the staging layer for reusability.

Request 3: A specific team from the Operations department said that some products are not produced in-house, so they don’t need to track them. They said they keep having to apply manual filters by prod_id to their Excel sheets, and it’s annoying. Other teams still need to have visibility of the external products.

Use the prod_ids csv file provided by them to create a seed identifying in-house and external products. Then add a column to the fct_product_trend table with this information so they can filter by it.

The fct_product_trend also pulls from the int_order_details table. Similar to the first request, we are going to add the new column to the staging layer (stg_products) for reusability.

Configuring the new seeds

To fulfill the 3 requests above, we had to add 3 new seeds into our project. It’s time to configure them.

Let’s start by adding a properties.yml file to the seeds directory.

I added only documentation properties. In the future, we will spice them up with data tests.

You can also define configurations in dbt_project.yml for your seeds. In this case, I did not need quoting, set a different delimiter, or change data types, so I did not configure them in the project file to reduce clutter.

Packages & macros

(Link to study note)

This is not the first time we touch on Packages. In previous checkpoints, we added macros from dbt Hub (generate_surrogate_key and date_spine from dbt_utils).

In this step, we have a few tasks:

Practice adding Packages from other sources: Git and internal Packages.
Understand “dbt clean” in practice.
Add our own macros using Jinja to improve our code reusability.

Adding Packages from Git and Internal Packages

Let’s start with a Git package.

In my project’s repo, I created a core_macros folder, which essentially acts as an external project I will add to my main project. Thus, it requires its own dbt_project.yml file and, of course, the macro we will be using.

I created a simple macro that outputs a string and stored it in this file structure:

I created a subdirectory within my main project repo to store the macro.

The dbt_project.yml of this subdirectory doesn’t require much configuration.

The next step is to add the macro to the packages.yml and run “dbt deps”:

One thing to note is that dbt will pin this package to a specific commit in package_lock.yml. Which means that if you make changes to the macro in a new commit, you need to edit this information in the package_lock file:

Upon installing the git macro, dbt automatically pins it to a commit under revision.

Upon calling the macro, I need to use the subdirectory.macro_name reference so it finds the macro:

Git macro being called successfully. Core_macros is the name of the subdirectory in the repo where the macro lives.

Now, let’s install an internally hosted macro:

We will use the package project_evaluator available on dbt Hub, but we will install it using the tar.gz link: https://github.com/dbt-labs/dbt-project-evaluator/archive/refs/tags/v1.1.2.tar.gz

Installing the Package via tarball link.

After installing it with “dbt deps”, you can run the package’s models with the following command: dbt run --select package:dbt_project_evaluator

I’m running this project on a free tier of BigQuery, so my adapter won’t allow the tables to be created. But, inspecting the logs, I can see the package was successfully installed.

Understanding dbt clean: uninstalling project_evaluator

Since our adapter’s free tier doesn’t allow us to use the dbt_project_evaluator package, we should now remove it.

The steps for removing a package are:

Remove the package from package.yml.
Delete the package_lock.yml file.
Run dbt clean to empty the dbt_packages folder.
Run dbt deps to reinstall the previous packages.

If you inspect the newly created package_lock, you will see that dbt_project evaluator is no longer there.

Adding custom macros to our project

When we created the intermediate models for the orders, we added some case whens to define the level of discount based of time since joined as well as volume of orders. We also had a VAT that would vary according to the product category.

In order to ensure that the same logic is used across the company and to make this code easier to maintain, we should make them as re-usable macros.

Therefore, we are going to create 3 macros:

disc_loyalty: companies with over 2 years of registering get a 5% discount, those with over 5 years get 10%. If less than 2 years, companies with over 200 orders placed get 4% discount.
disc_volume: employees who placed over 100 orders get 5% discount, those with more than 50 orders get 3% and those with over 10 get a whopping 1% off.
vat_category: certain categories have different VATs. Instead of the standard 18%, food has 5% VAT, electronics has 20% VAT, and clothing has 12% VAT.

I created a file for the 2 macros related to discount calculations:

And finally, a separate file for the vat calculation:

Now, we will call these macros in the intermediate models. For example:

Andrea Leonel - Data Analysis & Analytics Engineering

Discussion about this post