Course Overview
Data Lakehouse Architecture

Medallion Architecture

Lesson #3

In this lesson we will:

  • Introduce the concept of the Data Lakehouse;
  • Introduce the medallion architecture.

Organising The Data Lake

A data lake could be as simple as a collection of files stored in an object store such as S3.

As your data lake gets bigger, it makes sense to organise the data into folder structures.

Adding this layer of organisation is important for discoverability, access controls and general maintenance of your Data Lake.
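As a minimal sketch of what such a folder structure might look like, the snippet below builds object keys from a source system, a dataset name and a date. The layout (source/dataset/year=/month=/day=) and the function name object_key are assumptions made for illustration only; they are not prescribed by this lesson or by S3.

```python
from datetime import date
from pathlib import PurePosixPath


def object_key(source: str, dataset: str, day: date, filename: str) -> str:
    """Build an object key under a hypothetical source/dataset/date layout."""
    return str(
        PurePosixPath(source)
        / dataset
        / f"year={day.year:04d}"
        / f"month={day.month:02d}"
        / f"day={day.day:02d}"
        / filename
    )


# Example: where one day's orders file from a "sales" system would be stored.
key = object_key("sales", "orders", date(2024, 3, 5), "part-0001.json")
print(key)  # sales/orders/year=2024/month=03/day=05/part-0001.json
```

Grouping by source and dataset first keeps related files under one prefix, which makes it easier to find data and to grant access to a whole dataset at once.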

From Lake To Lakehouse

As we move from Data Lake to Data Lakehouse, the purpose of our Data Lake changes.

Where before it was simply a repository of files, it now serves two groups: people who want to access the files on an ad-hoc basis, and people who will be asking the lakehouse for pre-aggregated data.

To achieve this, a new level of indirection is added, with the data organised into three layers:

  • Bronze
  • Silver
  • Gold

Data Management

The core task is to move data from left to right, from bronze, to silver, to gold.

Databricks gives us a number of ways to achieve this. For instance, we could run ad-hoc notebooks.
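To make the flow from bronze to silver to gold concrete, here is a small sketch in plain Python. It assumes the usual reading of the medallion layers: bronze holds raw records as they arrived, silver holds cleaned and de-duplicated records, and gold holds pre-aggregated results. The sample records and the functions to_silver and to_gold are hypothetical. In a Databricks notebook the same steps would more likely be written against Spark DataFrames and tables, but the shape of the logic is the same.

```python
from collections import defaultdict

# Bronze: raw records exactly as they arrived (hypothetical sample data).
bronze = [
    {"order_id": "1", "country": "uk", "amount": "10.50"},
    {"order_id": "2", "country": "UK", "amount": "4.50"},
    {"order_id": "2", "country": "UK", "amount": "4.50"},  # duplicate row
    {"order_id": "3", "country": "de", "amount": "oops"},  # malformed amount
    {"order_id": "4", "country": "DE", "amount": "20.00"},
]


def to_silver(rows):
    """Clean bronze rows: parse types, normalise country, drop bad and duplicate rows."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        if row["order_id"] in seen_ids:
            continue  # skip rows we have already kept
        seen_ids.add(row["order_id"])
        cleaned.append(
            {
                "order_id": int(row["order_id"]),
                "country": row["country"].upper(),
                "amount": amount,
            }
        )
    return cleaned


def to_gold(rows):
    """Aggregate silver rows into total order value per country."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'UK': 15.0, 'DE': 20.0}
```

Each step reads only from the layer to its left, so the raw bronze data is never modified and the silver and gold layers can be rebuilt from it at any time.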

