References
Data Engineering For Beginners
Start here
Use SQL to transform data
1
Read data, combine tables, and aggregate numbers to understand business performance
2
CTEs (Common Table Expressions) improve code readability and reduce repetition
3
Use window functions when you need to use values from other rows to compute a value for the current row
Python connects the different parts of your data pipeline
4
Manipulate data with standard libraries and co-locate code with classes and functions
5
Python has libraries to read and write data to (almost) any system
6
Python has libraries to tell the data processing engine (Spark, Trino, DuckDB, Polars, etc.) what to do
Data modeling is the process of getting data ready for analytics
7
A data warehouse contains historical data and is used to analyze business performance
8
Data warehouse modeling (Kimball) is based on two types of tables: facts and dimensions
9
Most companies use the multi-hop architecture
Working in a team
10
Docker recreates the same environment for your code on any machine
A scheduler defines when, and an orchestrator defines how, to run your data pipelines
11
dbt-core is an orchestrator that makes managing pipelines simpler
12
Airflow is both a scheduler and an orchestrator
13
Capstone Project
14
Topics Coming Soon
References