Getting Started with git Branching Strategies and dbt
Outlining the key concepts of environments, branching strategies and CI/CD jobs as well as reviewing configuration for the two main branch strategies.
Part of the “Mastering dbt” series.
Notes from this dbt documentation.
This documentation covers the topic of git and highlights the main pros and cons of two common branching strategies. It also explains how to set up our environments in dbt to implement our chosen strategy.
However, I am going to take a step back and cover the basic concepts of environments, branching strategies, and CI/CD jobs before we get into the nitty-gritty of configurations.
Branching in dbt is managed with git, and in these notes the repository is hosted on GitHub. If you feel like you need an intro to git, chapters 1, 2, and 3 of the Pro Git book are a great resource.

Table of contents:
Key Concepts
Branching Strategies
What is an environment?
Stepping out of the dbt universe for a moment, let’s define the difference between development, staging, and production environments, according to this dev.to article.
Development:
It is a sandbox connected to a database that the end-user cannot see.
In this environment, you can test your code, implement changes to it, and try out improvements.
From this environment, you can release (deploy) your code to the end-users.
Only developers have access to this environment.
Staging:
It mimics the data and configurations of the end-user environment.
In this environment, you can test changes made to the code and fix issues before they reach the end-user.
You can also use it to demo an implementation with stakeholders before releasing the code.
Developers own this environment, but limited access can also be given to a stakeholder who needs to review changes.
Production:
This is the final code that end-users will use and connect to dashboards, reporting, etc.
You cannot (or should not) have any mistakes here, as they can affect the credibility of the data team, the company’s buy-in of the data, and even cost money.
Developers and end-users have access to this data with different sets of permissions given to each group.
Each company will have a different strategy as to how it manages these environments and what its structure will look like.
What are the environment configurations within dbt?
According to the full dbt documentation on environments, there are two types of environments:
Development: determines the overall configuration for your dbt project. Each project can only have 1 development environment. It will be connected to a branch in the repository, which will be the branch from which other branches are created, and that will be compared against during pull requests.
Deployment: determines the settings for the jobs executed. A job essentially clones your project from a branch in the repository and executes its models. You can have as many deployment environments as you need according to your branching strategy. These can be of 3 subtypes:
General
Staging
Production
Within each environment, you can configure the version of dbt it will use, the schema it will output data to, and which branch of your repository it will use.
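To make these settings concrete, here is a minimal Python sketch that models an environment as a plain data structure. It is not dbt's API: the class name, the field names, and the example values (version string, schema names, branch names) are all hypothetical and only mirror the settings described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Environment:
    """Illustrative stand-in for a dbt environment (not a real dbt class)."""
    name: str
    env_type: str     # "development" or "deployment"
    dbt_version: str  # version of dbt the environment runs
    schema: str       # schema the environment writes data to
    branch: str       # repository branch the environment uses


def validate(environments: List[Environment]) -> None:
    """Enforce the rule above: one development environment per project."""
    dev_count = sum(1 for e in environments if e.env_type == "development")
    if dev_count > 1:
        raise ValueError("a project can only have one development environment")


# Hypothetical example values.
project = [
    Environment("Development", "development", "1.8", "dev", "main"),
    Environment("Production", "deployment", "1.8", "prod", "main"),
]
validate(project)  # passes: exactly one development environment
```

Adding a second entry with env_type "development" to the list would make validate raise a ValueError.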
What is CI/CD?
According to GitLab, a CI/CD job is what integrates new code into already existing code: it builds, tests, and deploys the code automatically. Engineers used to do all of this manually, so CI/CD really revolutionised the way software developers and data engineers work.
Continuous Integration (CI)
Continuous Integration is the practice of integrating all your code changes done in a separate branch into the main branch. It includes the processes of testing that the code is running and generating the desired outputs.
On dbt, a CI job is triggered when a pull request is made on GitHub. The branch in which the changes were made is cloned and compared to the main branch. There are also tools that allow developers to see the impact of the changes to the data (rows/columns affected, etc).
Continuous Delivery (CD)
Continuous Delivery is the practice of deploying code to the main branch. Once code has been tested and built as part of the CI process, CD takes over during the final stages to ensure it's packaged with everything it needs to deploy to any environment at any time.
What is a branching strategy?
The configurations for the environments of our projects will depend on the branching strategy we choose to go for.
When we work on our practice project, we will be a team of one data analyst. Therefore, it is very easy to test and track the changes that we make to our models.
However, imagine a real-life situation in which the database is much larger and more complex than our 5-table dataset, and we are working as part of a team of several data analysts serving different stakeholders. This is when a branching strategy comes in handy.
A branching strategy will give a structure and process for how changes are made to the data pipeline. It organises the work of the team so each analyst can work on their piece of code without breaking what the end-user can see or the work of other analysts. It is supposed to foster data accuracy and collaboration within the team.
After the initial commit, a repository starts with one default branch: main. The main branch is equivalent to production - where the final code should be deployed for others to use.
The workflow to get changes done to the code into the main branch is your branching strategy.
This workflow is made up of four steps:
Development: creating and changing the code
Quality Assurance: ensuring changes produce the expected output
Promotion: moving changes to the next stage
Deployment: activating the changes to others
There are two common branching strategies: Direct Promotion and Indirect Promotion.
Direct Promotion
In this strategy, we keep only one main branch and a feature branch for changes where the pull request (PR) and merge take place. It is a good idea to start with the Direct Promotion strategy, see what you need as you develop the project, and apply the necessary changes to the strategy. It is easy to change configurations for the git strategy.

Configuring the Direct Promotion strategy:
The original post contains a video which I found to be more elucidating than the post itself. My notes below refer to the video.
The environments:
Development: dev schema in the database with full access
QA (stg): QA schema in the database with service accounts created to provide the right permissions
Production: prod schema in the database with service accounts created to provide the right permissions
The workflow:
Your main code lives in a default branch (main)
A separate branch is created (a feature branch)
Changes are implemented
Create a pull request (PR) on GitHub
The PR triggers a CI job on dbt, which creates a temporary PR schema
Peer review takes place in the temporary PR schema
Once the PR is approved and merged into the main branch, a CD job is triggered to deploy the changes to production
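The workflow above boils down to two triggers, each starting one job. Here is a minimal Python sketch of that mapping. The event names, the function name, and the pr_<number> schema format are hypothetical and chosen for illustration only; dbt names its temporary PR schemas in its own way.

```python
def job_for_event(event: str, pr_number: int) -> dict:
    """Map a repository event to the job that runs in Direct Promotion."""
    if event == "pull_request_opened":
        # CI: build and test the changes in a temporary, PR-specific schema.
        return {"job": "CI", "schema": f"pr_{pr_number}", "temporary": True}
    if event == "pull_request_merged":
        # CD: the changes are now on main, so deploy them to production.
        return {"job": "CD", "schema": "prod", "temporary": False}
    raise ValueError(f"no job configured for event: {event}")


print(job_for_event("pull_request_opened", 42))
print(job_for_event("pull_request_merged", 42))
```

The point of the sketch is that the feature branch never writes to the prod schema: only the merge event does.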
The CI/CD jobs required:
QA CI PR job (QA environment)
triggered when selecting “Create a pull request on GitHub”
the command run is “dbt build --select state:modified”
“Yes; defer to production”: it means that the job will read the data from production, write it to the QA environment with the changes made (without affecting production and without having to duplicate data into QA - cost-effective)
the job checks if the code with the changes will run successfully.
on the database, it only builds the tables that have changed
stakeholders can review the tables created in the QA schema to ensure the output is correct
under Modified in the job configuration, you can check what has been changed (PKs, rows and columns)
CD PR Merge Job (Production environment)
applies the changes from the QA schema into production
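To get an intuition for what the state:modified selector and the defer option do together, here is a simplified Python sketch. As I understand it, dbt compares the project against the artifacts of an earlier production run; the sketch only imitates that idea by comparing checksums of SQL strings. The model names, SQL, schema names, and function names are all hypothetical.

```python
import hashlib
from typing import Dict, Set


def checksum(sql: str) -> str:
    """Fingerprint a model's SQL so two versions can be compared."""
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


def modified_models(prod: Dict[str, str], branch: Dict[str, str]) -> Set[str]:
    """Models that are new, or whose SQL differs from the production state."""
    return {
        name
        for name, sql in branch.items()
        if name not in prod or checksum(prod[name]) != checksum(sql)
    }


def resolve_schema(model: str, modified: Set[str],
                   ci_schema: str, prod_schema: str) -> str:
    """With deferral, unmodified models are read from production, not rebuilt."""
    return ci_schema if model in modified else prod_schema


prod_state = {
    "stg_orders": "select * from raw.orders",
    "orders": "select id from stg_orders",
}
branch_state = {
    "stg_orders": "select * from raw.orders",
    "orders": "select id, status from stg_orders",
}

changed = modified_models(prod_state, branch_state)
print(changed)                                              # {'orders'}
print(resolve_schema("stg_orders", changed, "qa", "prod"))  # prod
print(resolve_schema("orders", changed, "qa", "prod"))      # qa
```

Only the changed model is built in the QA schema, while its unchanged upstream model is read from production. This is why the job is cost-effective.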
Indirect Promotion
In this promotion type, we add a Middle Branch (commonly known as QA, UAT, staging, or preprod) where all feature branches will originate from and merge into.

This Middle Branch (hereafter referred to as UAT) will then hold all the changes done to the code, and a periodic pull request will take place here to merge these changes into the main branch (Production). This deployment process of batched merges to main is known as Continuous Delivery.
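The difference between the two strategies comes down to the path a change takes before it reaches main. Here is a small Python sketch of that comparison. The branch names and the function name are illustrative, with "uat" standing for the Middle Branch.

```python
from typing import List

# Branch names are illustrative; "uat" stands for the Middle Branch.
PROMOTION_PATHS = {
    "direct": ["feature", "main"],
    "indirect": ["feature", "uat", "main"],
}


def pull_requests_needed(strategy: str) -> List[str]:
    """List the merges a change goes through before it reaches main."""
    path = PROMOTION_PATHS[strategy]
    return [f"{src} -> {dst}" for src, dst in zip(path, path[1:])]


print(pull_requests_needed("direct"))    # ['feature -> main']
print(pull_requests_needed("indirect"))  # ['feature -> uat', 'uat -> main']
```

Indirect Promotion adds one extra merge, and that extra merge is where the batched, periodic release happens.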
Configuring the Indirect Promotion strategy:
The process I am going to describe below is the one described in the video with an example of an indirect promotion setup. It contains an extra environment called QA (not the Middle Branch described above), but it will make sense below.
The environments:
Production: main branch and dedicated schema
Development: UAT branch and dedicated Dev schema
UAT: UAT branch and dedicated UAT schema. This is where all consolidated changes will live.
QA: UAT branch and dedicated QA schema. This is where the CI jobs run. It exists so you can see the details of the changes made and their impact using dbt’s advanced CI tools.
The UAT branch needs to stay in sync with the main branch (production) so that the feature changes are done based on up-to-date data. It also needs to be protected so it won’t be deleted.
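One way to think about "in sync" is that every commit on main should also be on the UAT branch. Here is a minimal Python sketch of that check, using plain lists of made-up commit ids rather than a real repository; the function name is hypothetical. In an actual repository, a git range such as uat..main expresses the same question.

```python
from typing import List


def missing_from_uat(main_commits: List[str], uat_commits: List[str]) -> List[str]:
    """Commits on main that the UAT branch does not contain yet, in main's order."""
    uat = set(uat_commits)
    return [commit for commit in main_commits if commit not in uat]


# Made-up commit ids: "d4" is a change that landed on main but never reached UAT.
main_history = ["a1", "b2", "c3", "d4"]
uat_history = ["a1", "b2", "c3", "e5"]

print(missing_from_uat(main_history, uat_history))  # ['d4']
```

An empty list means UAT already contains everything on main, so feature branches created from it start from up-to-date code.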
The workflow:
Create a feature branch off of your UAT branch to add your changes.
Generate a pull request from the feature branch into the UAT branch.
A CI job is triggered in the QA environment - this is where changes would be reviewed in the QA schema.
Once approved, changes would be merged into the UAT branch in the UAT environment through a CD job.
A periodic pull request carrying all the changes contained in the UAT branch would then be made to the main branch (Production). Here, a second round of CI and CD jobs takes place.
What are the advantages of Indirect Promotion?
These are the advantages of having a UAT branch rather than deploying changes directly into Production:
The UAT branch becomes an environment from which you could generate a dashboard to view the impact of the changes made to the code. This cannot be done in the Direct Promotion strategy, as the feature branch is merged directly onto the main branch (Production) with the rest of the data.
Teams can see the changes made by other teams in this UAT environment. This is not possible in Direct Promotion because there isn’t a consolidated environment with all the changes. They’d only be able to see a change made by other teams directly in the main branch.
By adding an extra environment for CI (QA), developers can make use of the advanced CI tools on dbt that allow us to see the details of the changes made (rows affected, etc.).
The UAT branch becomes an environment where changes can be tested before they hit Production.
Teams can have a schedule for making several changes public in one go rather than constantly adding new changes.
You have extra layers of Quality Assurance in this strategy, which might be necessary for high-stakes data consumption.
What are the disadvantages of Indirect Promotion?
It is more time-consuming and may not suit fast-paced businesses that prioritise flexibility over quality assurance.
You need more people who understand branch management to oversee the different parts of this process.
If the UAT branch isn’t in sync with the main branch (Production), then developers risk building their feature branches on out-of-date code and data.
Some BI tools don’t offer the ability to change schemas so easily, which would make reviewing changes in the QA environment, for instance, complicated.
You are working with more environments, branches, and jobs with their own configurations, and this adds complexity.