Scheduler defines when & Orchestrator defines how to run your data pipelines

Schedulers, such as cron or Airflow's scheduler, define when to start your data pipeline.

Orchestrators define the order in which the tasks of a data pipeline should run: for example, extract before transform, complex branching logic, and execution across multiple systems such as Spark and Snowflake. Examples include dbt-core and Airflow.
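Conceptually, an orchestrator resolves task dependencies into an execution order. Here is a minimal sketch of that idea using Python's standard-library `graphlib`; the task names and dependencies are illustrative, not taken from any real pipeline:

```python
# Minimal sketch of what an orchestrator does: turn declared task
# dependencies into a valid execution order. Task names here are
# illustrative only.
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it runs.
dependencies = {
    "extract": set(),
    "transform": {"extract"},  # extract before transform
    "load": {"transform"},     # transform before load
}

# static_order() yields tasks in dependency-respecting order.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['extract', 'transform', 'load']
```

Real orchestrators like Airflow build on the same idea, adding scheduling, retries, branching, and execution across remote systems.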

You can open the Airflow UI at http://localhost:8080 and trigger the DAG from there.

After the DAG run completes, you can serve the dbt documentation by running the following command.

```bash
docker exec airflow-spark bash -c "cd /home/airflow/tpch_analytics && nohup uv run dbt docs serve --host 0.0.0.0 --port 8081 > /tmp/dbt_docs.log 2>&1 &"
```

Note the trailing `&`: without it, `dbt docs serve` runs in the foreground and the `docker exec` call never returns.

The dbt docs are then viewable at http://localhost:8081.