Add a CI/CD pipeline to test and run dbt jobs
Context
We want to develop a CI/CD pipeline that automatically tests, deploys and runs our dbt models, i.e. our data transformation pipelines. We want to apply standard best practices to our data transformations from the beginning.
Problem to solve
This repo contains the SQL models that transform raw data into clean, modelled data. dbt jobs run on a server and simply consist of triggering SQL queries against a PostgreSQL database. In our case, this database is hosted on a Linux virtual machine that is accessible via SSH.
The CI/CD pipeline should do the following:
- test SQL syntax and lint .sql files
- automatically deploy the dbt jobs to the machine where they run
- run the dbt jobs
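The three steps above could be expressed as the commands each pipeline stage runs. This is a minimal Python sketch, not a definitive setup, and it rests on several assumptions. sqlfluff is an assumed choice of SQL linter; linting dbt models with it usually also requires configuring its jinja or dbt templater. "Deploy" is assumed to mean pulling the latest code into the cloned repository. The project directory is a hypothetical parameter.

```python
import subprocess
from typing import Dict, List


def pipeline_stages(project_dir: str) -> Dict[str, List[List[str]]]:
    """Map each CI/CD stage to the commands it runs, in order.

    Assumptions (not stated in the issue): sqlfluff is the SQL linter and
    the dbt project is checked out at project_dir.
    """
    return {
        # 1. Test SQL syntax and lint .sql files: sqlfluff reports parsing
        #    errors and style violations and exits non-zero if it finds any.
        "lint": [
            ["sqlfluff", "lint", f"{project_dir}/models", "--dialect", "postgres"],
        ],
        # 2. Deploy: update the checked-out code, then install dbt packages.
        "deploy": [
            ["git", "-C", project_dir, "pull", "--ff-only"],
            ["dbt", "deps", "--project-dir", project_dir],
        ],
        # 3. Run the models, then run the dbt tests against what was built.
        "run": [
            ["dbt", "run", "--project-dir", project_dir],
            ["dbt", "test", "--project-dir", project_dir],
        ],
    }


def run_stage(commands: List[List[str]]) -> None:
    """Run a stage's commands, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Each CI job would then call run_stage(pipeline_stages(path)["lint"]), and likewise for the other stages. Any failing command fails the job, so a linting error stops the pipeline before any deployment happens.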
Further details
- We need to make sure that packages can be downloaded through the proxy, both with pip and with dbt (dbt deps).
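One way to route package downloads through the proxy is to set the standard proxy environment variables before installing. pip and dbt deps are expected to honour these variables, and git also honours them when dbt fetches git packages. This is a minimal Python sketch: the proxy URL and the requirements file name are hypothetical placeholders.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional

# Hypothetical proxy URL: replace it with the one provided by the network team.
PROXY_URL = "http://proxy.example.local:3128"


def proxied_env(proxy_url: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and point the usual proxy variables at proxy_url."""
    env = dict(os.environ if base is None else base)
    # Set both spellings: some tools only read the lowercase variables.
    for var in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        env[var] = proxy_url
    return env


def install_commands(requirements: str = "requirements.txt") -> List[List[str]]:
    """Commands that download packages: Python deps, then dbt packages."""
    return [
        [sys.executable, "-m", "pip", "install", "-r", requirements],
        ["dbt", "deps"],
    ]


def install_dependencies(proxy_url: str = PROXY_URL) -> None:
    env = proxied_env(proxy_url)
    for cmd in install_commands():
        # check=True surfaces proxy or whitelisting problems as a failed job.
        subprocess.run(cmd, env=env, check=True)
```

If these commands still time out with the variables set, the proxy may be blocking the package hosts, and the relevant addresses would need to be whitelisted.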
Proposal
Since the GitLab runner does not have direct access to the PostgreSQL database, we will build a pipeline that runs its commands over SSH, directly inside the virtual machine.
To run the dbt jobs, we need to create a dedicated user, gitlabci, and give it access to the cloned repository.
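The SSH-based approach could look like the following minimal Python sketch, executed from the GitLab runner. It builds an ssh command that logs in to the virtual machine as gitlabci, updates the cloned repository and runs a dbt command there. The host name, repository path and key file are hypothetical. The BatchMode=yes option makes ssh fail instead of prompting for a password, which suits non-interactive CI jobs.

```python
import os
import shlex
import subprocess
from typing import List, Sequence


def remote_dbt_command(host: str, repo_dir: str, dbt_args: Sequence[str],
                       user: str = "gitlabci",
                       key_file: str = "~/.ssh/id_ed25519") -> List[str]:
    """Build an ssh argv that runs `dbt <dbt_args>` in repo_dir on the VM."""
    # Quote everything that ends up in the remote shell command line.
    remote_script = (
        f"cd {shlex.quote(repo_dir)}"
        " && git pull --ff-only"           # deploy: fetch the latest models
        f" && dbt {shlex.join(dbt_args)}"  # run: e.g. dbt run / dbt test
    )
    return [
        "ssh",
        "-i", os.path.expanduser(key_file),
        "-o", "BatchMode=yes",  # never prompt for a password in CI
        f"{user}@{host}",
        remote_script,
    ]


def run_remote_dbt(host: str, repo_dir: str, dbt_args: Sequence[str]) -> None:
    # ssh exits with the remote command's status, so check=True makes the
    # CI job fail whenever git or dbt fails on the VM.
    subprocess.run(remote_dbt_command(host, repo_dir, dbt_args), check=True)
```

For example, run_remote_dbt("dbt-vm.example", "/home/gitlabci/dbt-project", ["run"]) would deploy and run the models on the VM. Both names in that call are placeholders. Keeping the private key in a protected CI variable, rather than in the repository, is the usual choice.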
We can get some inspiration from this blog.
Who can address the issue
The Mission Numerique developer and data engineer should address the issue. DNUM should also be involved in whitelisting the IP addresses needed to install packages.