Course Overview
ClickHouse Database Objects


Lesson #6

In this lesson we will:

  • Learn about database indexes;
  • Learn how database indexes are different in ClickHouse;
  • Learn how to create and alter database indexes in ClickHouse;
  • See how to monitor how database indexes are being used in our queries.

Introduction To Indexes

Indexes are database objects which can speed up queries by reducing the amount of data that has to be read from disk and processed.

They work in a similar way to indexes in a book. Instead of scanning the entire book (or database table) to find what you're looking for, you use the index to find the pages (or rows) where your data can be found and skip directly to them.

This can dramatically reduce the time taken to return query results and reduce the computational load on your database servers.

The Cost Of Indexes

Whilst speeding up our queries is of course desirable, there are some costs or downsides to using indexes.

Firstly, they require disk space as they are stored on disk as persistent objects. This can become significant if every row in multiple large tables is indexed.

Secondly, and more importantly, indexes slow down writes into your database because the indexees themslves have to be updated as rows are inserted or updated. As well as the extra wall-clock time for your writes, this also requires additional processing which could impact other concurrent users of the database.

Considering this, it is important to use indexes in a considered way and whilst being aware of their overhead. This is particularly true in analytical systems which typically contain large datasets with large batch updates.

Indexes In ClickHouse

ClickHouse has indexes, though they are slightly different to those used in other databases.

Sparse Indexes

ClickHouse indexes are sparse. This means that not every row in your database is indexed. Instead, it will index every granule, which is a fragment of your database that is typically 8192 rows long by default.

Sparse indexes are used over full indexes for two reasons.

Firstly, it means it keeps the index small enough to fit into memory. This makes lookups significant faster than needing to repeatedly visit the disk to traverse the index.

Secondly, the sparse index will need less frequent updates as data is inserted and as parts are merged.

Together, this means that the sparse index therefore strikes a balance between getting the benefits of an index but without the overhead.

No Secondary Indexes

Many database systems have the concept of a secondary or non-clustered index, which can be added to non-primary key fields to allow fast lookups.

This is not possible in ClickHouse due to it's column oriented nature. Instead, ClickHouse provides an alternate index structure known as the data skipping index.

Indexes In Relation To Parts and Granules

As we discuss in our course on ClickHouse internals, ClickHouse splits tables into parts. A ClickHouse part is a subset of

A ClickHouse part contains data for all of the columns in the table.

Each column is stored in a .bin file, whilst the corresponding index is stored in the .cmrk2 file. In the example below, we are showing the address column for a certain part of the uk_price_paid dataset.

-rw-r-----@  1 benjaminwootton  staff  13875573 13 Nov 16:23 addr1.bin
-rw-r-----@  1 benjaminwootton  staff      2568 13 Nov 16:23 addr1.cmrk2

Creating Indexes In ClickHouse

Indexes are created implicitly in ClickHouse when we create a table.

    id UInt64,
    name String,
    updated_at DateTime DEFAULT now()
ENGINE = MergeTree

Alternatively, if we don't specify a PRIMARY KEY clause, ClickHouse will use the ORDER BY clause:

    id UInt64,
    name String,
    updated_at DateTime DEFAULT now()
ENGINE = MergeTree

It is also possible to use a compound primary key or order by clause to order by mutliple fields in lexographical order.

Next Lesson:


In this lesson we will learn about ClickHouse dictionaries.

0h 5m

Work With The Experts In Real-Time Analytics & AI

we help enterprise organisations deploy powerful real-time Data, Analytics and AI solutions based on ClickHouse, the worlds fastest open-source database.

Join our mailing list for regular insights:

We help enterprise organisations deploy advanced data, analytics and AI enabled systems based on modern cloud-native technology.

© 2024 Ensemble. All Rights Reserved.