A scheduler defines when, and an orchestrator defines how, to run your data pipelines

Schedulers define when to start your data pipeline; cron and Airflow's scheduler are examples.
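For illustration, here is what a cron-based schedule looks like: a crontab entry that starts a pipeline script every day at 02:00. The script path is hypothetical, standing in for whatever kicks off your pipeline:

```shell
# crontab entry: minute hour day-of-month month day-of-week command
# Runs a (hypothetical) pipeline entry point daily at 02:00,
# appending stdout and stderr to a log file.
0 2 * * * /opt/pipelines/run_daily.sh >> /var/log/pipeline.log 2>&1
```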

Orchestrators define the order in which the tasks of a data pipeline should run (for example, extract before transform), and handle complex branching logic and execution across multiple systems, such as Spark and Snowflake. Examples include dbt-core and Airflow.

Our Airflow, dbt, and capstone project infrastructure is in a separate folder to keep our setup simple. When you are in the project directory, stop any running containers as shown below.

data_engineering_for_beginners_code/> docker compose down
data_engineering_for_beginners_code/> cd airflow
data_engineering_for_beginners_code/airflow> make restart

You can open the Airflow UI at http://localhost:8080 and log in with airflow as both the username and password. In the Airflow UI, you can trigger the DAG.

After the DAG has run, run make dbt-docs in the terminal to have dbt serve its documentation, which you can view at http://localhost:8081.

You can stop the containers & return to the parent directory as shown below:

make down
cd ..

The Makefile contains a list of shortcuts for lengthy commands. Let’s look at our Makefile below.

####################################################################################################################
# Setup containers to run Airflow

docker-spin-up:
    docker compose build && docker compose up airflow-init && docker compose up --build -d 

perms:
    sudo mkdir -p logs plugins temp dags tests data visualization && sudo chmod -R u=rwx,g=rwx,o=rwx logs plugins temp dags tests data visualization tpch_analytics

do-sleep:
    sleep 30

up: perms docker-spin-up do-sleep

down:
    docker compose down

restart: down up

sh:
    docker exec -ti scheduler bash

dbt-docs:
    docker exec -d webserver bash -c "cd /opt/airflow/tpch_analytics && nohup dbt docs serve --host 0.0.0.0 --port 8081 > /tmp/dbt_docs.log 2>&1"

We can see how long, complex commands can be aliased to short make targets, which are run as make <target-name>.
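If you want to see what a target would run without actually executing it, make's dry-run flag prints the underlying commands. A minimal sketch using a throwaway Makefile (not the project's) to show the idea:

```shell
# Create a throwaway Makefile with one short target aliasing a longer
# command. Recipe lines in a Makefile must be tab-indented, hence printf.
printf 'hello:\n\techo "long command hidden behind a short target"\n' > /tmp/Makefile.demo

# -n (--dry-run) prints the recipe without executing it.
make -n -f /tmp/Makefile.demo hello
```

Running make -n up against the project's Makefile the same way shows every command that the up target would execute.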