Use SQL to transform data

SQL is the foundation of data engineering. Most data pipelines consist of SQL scripts tied together. Knowing how to manipulate data with SQL carries over to other interfaces, such as DataFrames, because they perform similar processing through a different API.

In the data engineering context, SQL is used for:

  1. Analytical querying, which involves aggregating large amounts of data into metrics that describe how well the business has been performing (e.g., daily active users for a social media company) and that help predict how it will perform in the future.

  2. Data processing, which involves transforming the data from multiple systems into well-modelled datasets that can be used for analytics.
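
As a rough illustration of the analytical querying case, here is a minimal sketch that computes daily active users with a SQL aggregate. It runs the query through Python's built-in sqlite3 module only so that the example is self-contained. The table user_events, its columns, and the sample rows are hypothetical and are not taken from any real system.

```python
import sqlite3


def daily_active_users(events):
    """Return [(event_date, active_users)] for (user_id, event_date) rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical raw event table: one row per user action.
    conn.execute("CREATE TABLE user_events (user_id INTEGER, event_date TEXT)")
    conn.executemany("INSERT INTO user_events VALUES (?, ?)", events)
    rows = conn.execute(
        """
        SELECT event_date,
               COUNT(DISTINCT user_id) AS active_users  -- each user counted once per day
        FROM user_events
        GROUP BY event_date
        ORDER BY event_date
        """
    ).fetchall()
    conn.close()
    return rows


# Made-up sample data: user 1 is active twice on the first day.
sample_events = [
    (1, "2024-01-01"),
    (1, "2024-01-01"),
    (2, "2024-01-01"),
    (1, "2024-01-02"),
]
print(daily_active_users(sample_events))
```

COUNT(DISTINCT user_id) is what turns many raw events into one metric per day. In a real warehouse the same query shape would typically run over a much larger, already modelled events table.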

Knowing SQL in depth will enable you to build and maintain data systems effectively and troubleshoot any data issues.

In this section, we will explore how to use SQL to transform data and how to use window functions to perform complex computations within SQL.
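
To give a first taste of what a window function looks like, the sketch below computes a running total per customer. It is only an illustration under stated assumptions: the orders table, its columns, and the sample rows are hypothetical, and the query is run through Python's sqlite3 module, which needs an underlying SQLite version of 3.25 or newer for window function support.

```python
import sqlite3


def running_totals(orders):
    """Return (customer_id, order_date, amount, running_total) rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical orders table; amounts are whole numbers to keep the sums exact.
    conn.execute(
        "CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    rows = conn.execute(
        """
        SELECT customer_id,
               order_date,
               amount,
               -- Window function: sums amounts for the same customer up to
               -- and including the current row, without collapsing rows.
               SUM(amount) OVER (
                   PARTITION BY customer_id
                   ORDER BY order_date
               ) AS running_total
        FROM orders
        ORDER BY customer_id, order_date
        """
    ).fetchall()
    conn.close()
    return rows


# Made-up sample data for two customers.
sample_orders = [
    (1, "2024-01-01", 10),
    (1, "2024-01-02", 5),
    (2, "2024-01-01", 7),
    (1, "2024-01-03", 20),
]
for row in running_totals(sample_orders):
    print(row)
```

Unlike GROUP BY, the window function keeps every input row and adds the computed value alongside it, which is why it suits calculations such as running totals and rankings.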