Course Overview
ClickHouse Internals

How Clickhouse Stores Data On Disk

Lesson #1

In this lesson we will:

  • Learn how ClickHouse arranges data on disk;
  • Learn key logical abstractions such as parts and granules.

ClickHouse Storage Formats

ClickHouse will ultimately write data to some persistent store, either on a local disk, on shared storage, or an object store such as AWS S3.

Though ClickHouse will manage this process for us, it is useful for us as ClickHouse developers and administrators to know how this is stored and managed so we can design and implement our databases in the most efficient way.

File Structure

It is surprising easy to navigate around the ClickHouse file structure to see how our data is organised on disk.

By default, ClickHouse will create a data folder. Inside there, there will be a folder per database:

drwxr-x---@  4 benjaminwootton  staff  128 13 Nov 16:10 default
drwxr-x---@  2 benjaminwootton  staff   64 13 Nov 16:05 system

If we look inside the default database, we will see a folder for each table within the database:

lrwxr-x---@ 1 benjaminwootton  staff   99 13 Nov 16:07 orders -> /Users/benjaminwootton/development/tools/clickhouse/store/b37/b37d95ba-391d-4ed2-a5c3-758539c139aa/
lrwxr-x---@ 1 benjaminwootton  staff   99 13 Nov 16:10 uk_price_paid -> /Users/benjaminwootton/development/tools/clickhouse/store/4df/4df0a31b-3cb0-4b0a-b1e7-6d562a634e43/

Within the folder for a given table, we will see the table organised into parts:

drwxr-x---@  3 benjaminwootton  staff    96 13 Nov 16:10 ..
drwxr-x---@ 51 benjaminwootton  staff  1632 13 Nov 16:23 all_13_18_1
drwxr-x---@ 51 benjaminwootton  staff  1632 13 Nov 16:27 all_19_24_1
drwxr-x---@ 51 benjaminwootton  staff  1632 13 Nov 16:15 all_1_6_1
drwxr-x---@ 51 benjaminwootton  staff  1632 13 Nov 16:27 all_25_25_0
drwxr-x---@ 51 benjaminwootton  staff  1632 13 Nov 16:28 all_26_26_0
drwxr-x---@ 51 benjaminwootton  staff  1632 13 Nov 16:28 all_27_27_0
drwxr-x---@ 53 benjaminwootton  staff  1696 13 Nov 16:28 all_28_28_0
drwxr-x---@ 51 benjaminwootton  staff  1632 13 Nov 16:19 all_7_12_1
drwxr-x---@  2 benjaminwootton  staff    64 13 Nov 16:10 detached
-rw-r-----@  1 benjaminwootton  staff     1 13 Nov 16:10 format_version.txt

Within each part, will see a collection of files for each column. The actual data is stored in the .bin files, and the indexes (sometimes referred to as marks within ClickHouse) will be stored in the .cmrk2 files.

-rw-r-----@  1 benjaminwootton  staff  13875573 13 Nov 16:23 addr1.bin
-rw-r-----@  1 benjaminwootton  staff      2568 13 Nov 16:23 addr1.cmrk2
-rw-r-----@  1 benjaminwootton  staff   2579163 13 Nov 16:23 addr2.bin
-rw-r-----@  1 benjaminwootton  staff      2125 13 Nov 16:23 addr2.cmrk2
-rw-r-----@  1 benjaminwootton  staff      2155 13 Nov 16:23 checksums.txt
-rw-r-----@  1 benjaminwootton  staff       498 13 Nov 16:23 columns.txt
-rw-r-----@  1 benjaminwootton  staff         7 13 Nov 16:23 count.txt
-rw-r-----@  1 benjaminwootton  staff    159196 13 Nov 16:23 county.bin
-rw-r-----@  1 benjaminwootton  staff       629 13 Nov 16:23 county.cmrk2
-rw-r-----@  1 benjaminwootton  staff      1308 13 Nov 16:23 county.dict.bin
-rw-r-----@  1 benjaminwootton  staff        64 13 Nov 16:23 county.dict.cmrk2
-rw-r-----@  1 benjaminwootton  staff  12608940 13 Nov 16:23 date.bin
-rw-r-----@  1 benjaminwootton  staff      1496 13 Nov 16:23 date.cmrk2
-rw-r-----@  1 benjaminwootton  staff        10 13 Nov 16:23 default_compression_codec.txt
-rw-r-----@  1 benjaminwootton  staff    420862 13 Nov 16:23 district.bin
-rw-r-----@  1 benjaminwootton  staff       858 13 Nov 16:23 district.cmrk2
-rw-r-----@  1 benjaminwootton  staff      3817 13 Nov 16:23 district.dict.bin
-rw-r-----@  1 benjaminwootton  staff        64 13 Nov 16:23 district.dict.cmrk2
-rw-r-----@  1 benjaminwootton  staff   1224765 13 Nov 16:23 duration.bin
-rw-r-----@  1 benjaminwootton  staff      1033 13 Nov 16:23 duration.cmrk2
-rw-r-----@  1 benjaminwootton  staff    715167 13 Nov 16:23 is_new.bin
-rw-r-----@  1 benjaminwootton  staff      1006 13 Nov 16:23 is_new.cmrk2
-rw-r-----@  1 benjaminwootton  staff   3300789 13 Nov 16:23 locality.bin
-rw-r-----@  1 benjaminwootton  staff       982 13 Nov 16:23 locality.cmrk2
-rw-r-----@  1 benjaminwootton  staff    202171 13 Nov 16:23 locality.dict.bin
-rw-r-----@  1 benjaminwootton  staff        82 13 Nov 16:23 locality.dict.cmrk2
-rw-r-----@  1 benjaminwootton  staff         1 13 Nov 16:23 metadata_version.txt
-rw-r-----@  1 benjaminwootton  staff     69640 13 Nov 16:23 postcode1.bin
-rw-r-----@  1 benjaminwootton  staff       850 13 Nov 16:23 postcode1.cmrk2
-rw-r-----@  1 benjaminwootton  staff      9258 13 Nov 16:23 postcode1.dict.bin
-rw-r-----@  1 benjaminwootton  staff        64 13 Nov 16:23 postcode1.dict.cmrk2
-rw-r-----@  1 benjaminwootton  staff   4184651 13 Nov 16:23 postcode2.bin
-rw-r-----@  1 benjaminwootton  staff       948 13 Nov 16:23 postcode2.cmrk2
-rw-r-----@  1 benjaminwootton  staff     15886 13 Nov 16:23 postcode2.dict.bin
-rw-r-----@  1 benjaminwootton  staff        64 13 Nov 16:23 postcode2.dict.cmrk2
-rw-r-----@  1 benjaminwootton  staff  18202094 13 Nov 16:23 price.bin
-rw-r-----@  1 benjaminwootton  staff      2068 13 Nov 16:23 price.cmrk2
-rw-r-----@  1 benjaminwootton  staff      6641 13 Nov 16:23 primary.cidx
-rw-r-----@  1 benjaminwootton  staff       536 13 Nov 16:23 serialization.json
-rw-r-----@  1 benjaminwootton  staff   3697634 13 Nov 16:23 street.bin
-rw-r-----@  1 benjaminwootton  staff      1332 13 Nov 16:23 street.cmrk2
-rw-r-----@  1 benjaminwootton  staff   4477874 13 Nov 16:23 street.dict.bin
-rw-r-----@  1 benjaminwootton  staff       333 13 Nov 16:23 street.dict.cmrk2
-rw-r-----@  1 benjaminwootton  staff    134460 13 Nov 16:23 town.bin
-rw-r-----@  1 benjaminwootton  staff       840 13 Nov 16:23 town.cmrk2
-rw-r-----@  1 benjaminwootton  staff      9726 13 Nov 16:23 town.dict.bin
-rw-r-----@  1 benjaminwootton  staff        64 13 Nov 16:23 town.dict.cmrk2
-rw-r-----@  1 benjaminwootton  staff   3086166 13 Nov 16:23 type.bin
-rw-r-----@  1 benjaminwootton  staff      1058 13 Nov 16:23 type.cmrk2

Key Abstractions

There are two key abstractions to be aware of - granules and parts.

The terms "granule" and "part" refer to specific structures used for data storage and organization. Understanding these concepts is important for grasping how ClickHouse achieves its efficiency and speed.

Granule

A granule in ClickHouse is the smallest unit of data that can be read from a disk in a single operation. It refers to a set of rows in a specific column.

When ClickHouse processes a query, it reads data in granules. This approach is efficient because it minimizes disk I/O operations. By reading a batch of rows at once, ClickHouse can better utilize its columnar storage format.

The size of a granule is 8012 roles by default, but it can vary based on the table's settings and the data type of the column. It's usually configured to balance between read performance and memory usage.

Part

A part in ClickHouse is a directory on the file system that contains data for a specific table.

Each part contains data files for each column of the table, index files, and metadata files. These parts are the result of data being inserted into the table and subsequently merged and compacted for efficiency.

ClickHouse periodically merges smaller parts into larger ones in the background. This process is part of ClickHouse's data compaction and optimization strategy, which helps improve query performance and reduce storage space.

Partitions

In ClickHouse, data can be partitioned. Each partition can contain multiple parts. This partitioning is often based on a column (like a date) and is used for efficient data management and faster queries, especially in time-series data scenarios.

Granules and parts are fundamental components of how ClickHouse organizes and stores data. Granules represent the smallest batch of data read from disk, optimizing for read performance, while parts are larger storage units that encompass data files and indexes, playing a critical role in data management, compaction, and replication. Understanding these concepts is key to optimizing performance and managing data in ClickHouse effectively.

Join our mailing list for regular insights:

We help enterprise organisations deploy advanced data, analytics and AI enabled systems based on modern cloud-native technology.

© 2024 Ensemble. All Rights Reserved.