In this lesson we will:
- Learn how ClickHouse arranges data on disk;
- Learn key logical abstractions such as parts and granules.
ClickHouse Storage Formats
ClickHouse will ultimately write data to some persistent store, either on a local disk, on shared storage, or an object store such as AWS S3.
Though ClickHouse will manage this process for us, it is useful for us as ClickHouse developers and administrators to know how this is stored and managed so we can design and implement our databases in the most efficient way.
File Structure
It is surprising easy to navigate around the ClickHouse file structure to see how our data is organised on disk.
By default, ClickHouse will create a data folder. Inside there, there will be a folder per database:
drwxr-x---@ 4 benjaminwootton staff 128 13 Nov 16:10 default
drwxr-x---@ 2 benjaminwootton staff 64 13 Nov 16:05 system
If we look inside the default database, we will see a folder for each table within the database:
lrwxr-x---@ 1 benjaminwootton staff 99 13 Nov 16:07 orders -> /Users/benjaminwootton/development/tools/clickhouse/store/b37/b37d95ba-391d-4ed2-a5c3-758539c139aa/
lrwxr-x---@ 1 benjaminwootton staff 99 13 Nov 16:10 uk_price_paid -> /Users/benjaminwootton/development/tools/clickhouse/store/4df/4df0a31b-3cb0-4b0a-b1e7-6d562a634e43/
Within the folder for a given table, we will see the table organised into parts:
drwxr-x---@ 3 benjaminwootton staff 96 13 Nov 16:10 ..
drwxr-x---@ 51 benjaminwootton staff 1632 13 Nov 16:23 all_13_18_1
drwxr-x---@ 51 benjaminwootton staff 1632 13 Nov 16:27 all_19_24_1
drwxr-x---@ 51 benjaminwootton staff 1632 13 Nov 16:15 all_1_6_1
drwxr-x---@ 51 benjaminwootton staff 1632 13 Nov 16:27 all_25_25_0
drwxr-x---@ 51 benjaminwootton staff 1632 13 Nov 16:28 all_26_26_0
drwxr-x---@ 51 benjaminwootton staff 1632 13 Nov 16:28 all_27_27_0
drwxr-x---@ 53 benjaminwootton staff 1696 13 Nov 16:28 all_28_28_0
drwxr-x---@ 51 benjaminwootton staff 1632 13 Nov 16:19 all_7_12_1
drwxr-x---@ 2 benjaminwootton staff 64 13 Nov 16:10 detached
-rw-r-----@ 1 benjaminwootton staff 1 13 Nov 16:10 format_version.txt
Within each part, will see a collection of files for each column. The actual data is stored in the .bin files, and the indexes (sometimes referred to as marks within ClickHouse) will be stored in the .cmrk2 files.
-rw-r-----@ 1 benjaminwootton staff 13875573 13 Nov 16:23 addr1.bin
-rw-r-----@ 1 benjaminwootton staff 2568 13 Nov 16:23 addr1.cmrk2
-rw-r-----@ 1 benjaminwootton staff 2579163 13 Nov 16:23 addr2.bin
-rw-r-----@ 1 benjaminwootton staff 2125 13 Nov 16:23 addr2.cmrk2
-rw-r-----@ 1 benjaminwootton staff 2155 13 Nov 16:23 checksums.txt
-rw-r-----@ 1 benjaminwootton staff 498 13 Nov 16:23 columns.txt
-rw-r-----@ 1 benjaminwootton staff 7 13 Nov 16:23 count.txt
-rw-r-----@ 1 benjaminwootton staff 159196 13 Nov 16:23 county.bin
-rw-r-----@ 1 benjaminwootton staff 629 13 Nov 16:23 county.cmrk2
-rw-r-----@ 1 benjaminwootton staff 1308 13 Nov 16:23 county.dict.bin
-rw-r-----@ 1 benjaminwootton staff 64 13 Nov 16:23 county.dict.cmrk2
-rw-r-----@ 1 benjaminwootton staff 12608940 13 Nov 16:23 date.bin
-rw-r-----@ 1 benjaminwootton staff 1496 13 Nov 16:23 date.cmrk2
-rw-r-----@ 1 benjaminwootton staff 10 13 Nov 16:23 default_compression_codec.txt
-rw-r-----@ 1 benjaminwootton staff 420862 13 Nov 16:23 district.bin
-rw-r-----@ 1 benjaminwootton staff 858 13 Nov 16:23 district.cmrk2
-rw-r-----@ 1 benjaminwootton staff 3817 13 Nov 16:23 district.dict.bin
-rw-r-----@ 1 benjaminwootton staff 64 13 Nov 16:23 district.dict.cmrk2
-rw-r-----@ 1 benjaminwootton staff 1224765 13 Nov 16:23 duration.bin
-rw-r-----@ 1 benjaminwootton staff 1033 13 Nov 16:23 duration.cmrk2
-rw-r-----@ 1 benjaminwootton staff 715167 13 Nov 16:23 is_new.bin
-rw-r-----@ 1 benjaminwootton staff 1006 13 Nov 16:23 is_new.cmrk2
-rw-r-----@ 1 benjaminwootton staff 3300789 13 Nov 16:23 locality.bin
-rw-r-----@ 1 benjaminwootton staff 982 13 Nov 16:23 locality.cmrk2
-rw-r-----@ 1 benjaminwootton staff 202171 13 Nov 16:23 locality.dict.bin
-rw-r-----@ 1 benjaminwootton staff 82 13 Nov 16:23 locality.dict.cmrk2
-rw-r-----@ 1 benjaminwootton staff 1 13 Nov 16:23 metadata_version.txt
-rw-r-----@ 1 benjaminwootton staff 69640 13 Nov 16:23 postcode1.bin
-rw-r-----@ 1 benjaminwootton staff 850 13 Nov 16:23 postcode1.cmrk2
-rw-r-----@ 1 benjaminwootton staff 9258 13 Nov 16:23 postcode1.dict.bin
-rw-r-----@ 1 benjaminwootton staff 64 13 Nov 16:23 postcode1.dict.cmrk2
-rw-r-----@ 1 benjaminwootton staff 4184651 13 Nov 16:23 postcode2.bin
-rw-r-----@ 1 benjaminwootton staff 948 13 Nov 16:23 postcode2.cmrk2
-rw-r-----@ 1 benjaminwootton staff 15886 13 Nov 16:23 postcode2.dict.bin
-rw-r-----@ 1 benjaminwootton staff 64 13 Nov 16:23 postcode2.dict.cmrk2
-rw-r-----@ 1 benjaminwootton staff 18202094 13 Nov 16:23 price.bin
-rw-r-----@ 1 benjaminwootton staff 2068 13 Nov 16:23 price.cmrk2
-rw-r-----@ 1 benjaminwootton staff 6641 13 Nov 16:23 primary.cidx
-rw-r-----@ 1 benjaminwootton staff 536 13 Nov 16:23 serialization.json
-rw-r-----@ 1 benjaminwootton staff 3697634 13 Nov 16:23 street.bin
-rw-r-----@ 1 benjaminwootton staff 1332 13 Nov 16:23 street.cmrk2
-rw-r-----@ 1 benjaminwootton staff 4477874 13 Nov 16:23 street.dict.bin
-rw-r-----@ 1 benjaminwootton staff 333 13 Nov 16:23 street.dict.cmrk2
-rw-r-----@ 1 benjaminwootton staff 134460 13 Nov 16:23 town.bin
-rw-r-----@ 1 benjaminwootton staff 840 13 Nov 16:23 town.cmrk2
-rw-r-----@ 1 benjaminwootton staff 9726 13 Nov 16:23 town.dict.bin
-rw-r-----@ 1 benjaminwootton staff 64 13 Nov 16:23 town.dict.cmrk2
-rw-r-----@ 1 benjaminwootton staff 3086166 13 Nov 16:23 type.bin
-rw-r-----@ 1 benjaminwootton staff 1058 13 Nov 16:23 type.cmrk2
Key Abstractions
There are two key abstractions to be aware of - granules and parts.
The terms "granule" and "part" refer to specific structures used for data storage and organization. Understanding these concepts is important for grasping how ClickHouse achieves its efficiency and speed.
Granule
A granule in ClickHouse is the smallest unit of data that can be read from a disk in a single operation. It refers to a set of rows in a specific column.
When ClickHouse processes a query, it reads data in granules. This approach is efficient because it minimizes disk I/O operations. By reading a batch of rows at once, ClickHouse can better utilize its columnar storage format.
The size of a granule is 8012 roles by default, but it can vary based on the table's settings and the data type of the column. It's usually configured to balance between read performance and memory usage.
Part
A part in ClickHouse is a directory on the file system that contains data for a specific table.
Each part contains data files for each column of the table, index files, and metadata files. These parts are the result of data being inserted into the table and subsequently merged and compacted for efficiency.
ClickHouse periodically merges smaller parts into larger ones in the background. This process is part of ClickHouse's data compaction and optimization strategy, which helps improve query performance and reduce storage space.
Partitions
In ClickHouse, data can be partitioned. Each partition can contain multiple parts. This partitioning is often based on a column (like a date) and is used for efficient data management and faster queries, especially in time-series data scenarios.
Granules and parts are fundamental components of how ClickHouse organizes and stores data. Granules represent the smallest batch of data read from disk, optimizing for read performance, while parts are larger storage units that encompass data files and indexes, playing a critical role in data management, compaction, and replication. Understanding these concepts is key to optimizing performance and managing data in ClickHouse effectively.