In this lesson we will:

  • Discuss the approach that businesses are taking to transforming their data as part of Modern Data Platform solutions.

What Are Data Transformations?

Data transformations involve changing the format or structure of data to make it more suitable for analysis, visualization, or other applications. This process is commonly used in data preprocessing to ensure that the data is in a meaningful and usable format. Transformations can include activities such as scaling numerical data, converting categorical data into a numerical format, aggregating or summarizing data points, grouping continuous data into bins, creating new features from existing ones, handling missing data, and more. The goal is to prepare the data for further analysis by improving its quality, relevance, and compatibility with specific algorithms or tools.
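To make a few of these activities concrete, here is a minimal sketch in plain Python. The records, field names and age bands are hypothetical and chosen only for illustration; it shows filling a missing value with the mean, min-max scaling a numeric field, encoding a category as an integer, and grouping a continuous value into bins.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
records = [
    {"age": 23, "income": 30000.0, "segment": "retail"},
    {"age": 41, "income": None, "segment": "wholesale"},
    {"age": 67, "income": 90000.0, "segment": "retail"},
]


def age_band(age):
    """Group a continuous age into one of three illustrative bins."""
    if age < 30:
        return "under_30"
    if age < 60:
        return "30_to_59"
    return "60_plus"


def transform(rows):
    # Handle missing data: fill missing incomes with the mean of known ones.
    known = [r["income"] for r in rows if r["income"] is not None]
    fill = mean(known)
    lo, hi = min(known), max(known)
    span = (hi - lo) or 1.0  # avoid dividing by zero if all values match

    # Categorical to numeric: assign each segment a stable integer code.
    codes = {name: i for i, name in enumerate(sorted({r["segment"] for r in rows}))}

    out = []
    for r in rows:
        income = r["income"] if r["income"] is not None else fill
        out.append({
            "income_scaled": (income - lo) / span,  # min-max scaling to [0, 1]
            "segment_code": codes[r["segment"]],
            "age_band": age_band(r["age"]),
        })
    return out


transformed = transform(records)
```

In practice these steps are usually expressed in SQL or a dataframe library rather than hand-written loops, but the underlying operations are the same.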

From ETL to ELT

Historically, businesses used a process referred to as ETL to populate data warehouses. This involved Extracting data from source applications and databases, Transforming it into the required formats, and then Loading it into the warehouse for consumption.

In the modern data platform, we flip the last two steps around: the data is loaded first, and the transformations are then executed directly within the database or data warehouse. The process therefore becomes Extract, Load and Transform, or ELT.
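The ordering can be sketched with Python's built-in sqlite3 module standing in for a data warehouse. This is an illustration under stated assumptions, not a production pattern: the table names (raw_payments, payments) and the sample rows are hypothetical. The point to notice is that the raw data lands untouched, and the clean-up runs afterwards as SQL inside the database.

```python
import sqlite3

# Extract: rows as they might arrive from a source system (hypothetical data).
source_rows = [("1001", " 19.50 "), ("1002", "5.00"), ("1003", "42.50")]

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a warehouse

# Load: store the data as-is in a raw table, with no clean-up yet.
conn.execute("CREATE TABLE raw_payments (payment_id TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_payments VALUES (?, ?)", source_rows)

# Transform: runs inside the database, after loading.
conn.execute("""
    CREATE TABLE payments AS
    SELECT CAST(payment_id AS INTEGER) AS payment_id,
           CAST(TRIM(amount) AS REAL)  AS amount
    FROM raw_payments
""")

payments = conn.execute(
    "SELECT payment_id, amount FROM payments ORDER BY payment_id"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_payments").fetchone()[0]
```

Because the raw table is kept, the transformation can be changed and re-run later without extracting the data from the source again.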

dbt

dbt is a leading transformation tool for the Modern Data Platform, and it is increasingly being deployed within modern data stacks to carry out data transformation inside the data warehouse.

An example transformation requirement might be to take all of the incoming customer orders, clean up the data for consistency, and aggregate it into a “sales by region” summary table for our business users.
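The requirement above might look something like the following sketch. It uses Python's built-in sqlite3 module as a stand-in for the warehouse, and the table names, column names and sample orders are all hypothetical. In a dbt project, a SELECT statement like this would typically live in a model file rather than in Python; the sketch only illustrates the clean-up and aggregation logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a warehouse

# Hypothetical incoming orders with inconsistently formatted region names.
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [
        (1, "emea", 120.0),
        (2, " EMEA", 80.0),
        (3, "Apac ", 50.0),
        (4, "APAC", 25.0),
    ],
)

# Clean the region names for consistency, then aggregate per region.
SALES_BY_REGION_SQL = """
    SELECT UPPER(TRIM(region)) AS region,
           COUNT(*)            AS order_count,
           SUM(amount)         AS total_sales
    FROM raw_orders
    GROUP BY UPPER(TRIM(region))
"""

# Materialise the result as a summary table for business users.
conn.execute("CREATE TABLE sales_by_region AS " + SALES_BY_REGION_SQL)

sales_by_region = conn.execute(
    "SELECT region, order_count, total_sales FROM sales_by_region ORDER BY region"
).fetchall()
```

Without the TRIM and UPPER step, "emea" and " EMEA" would be counted as separate regions, which is the kind of inconsistency the clean-up stage exists to remove.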


© 2024 Ensemble. All Rights Reserved.