Course Overview
Querying Data In ClickHouse

Explaining Your Clickhouse Queries

Lesson #2

In this lesson we will:

  • Lesson contents 1
  • Lesson contents 2
  • Lesson contents 3

Explain Plan

When we issue a query to Clickhouse it needs to make a decision about how to most efficiently answer our query.

For instance, it needs to decide which order to complete steps, which indexes to use, when to apply joins, and how to parallelize the query across a cluster and across multiple CPU cores.

The EXPLAIN clause helps us to understand the plan which ClickHouse has chosen. Using this clause which can be useful in understanding and optimising performance.

The Process

Let us first understand the process which ClickHouse goes through in order to execute a submitted query.

When a query is issued to ClickHouse, the first thing it does is translate the text to an Abstract Syntax Tree which describes the computation as a series of logical steps. At this stage, ClickHouse will also verify that the query is syntactically correct.

The AST will then be passed through a series of steps in order to optimize the query for performance.

After this optimisation, ClickHouse will then build a query plan which describes how it will be executed. A pipeline is then built.

EXPLAIN Clauses

It is possible to interrogate each step of the above process:

  • EXPLAIN AST will show the Abstract Syntax tree
  • EXPLAIN SYNTAX will show
  • EXPLAN PLAN will show the calculated logical plan
  • EXPLAIN PIPELINE will show the pipelined plan which describes actual interactions

EXPLAIN AST shows us the abstract:

EXPLAIN AST SELECT
    town,
    district,
    count() AS c,
    round(avg(price)) AS price,
    bar(price, 0, 5000000, 100)
FROM uk_price_paid
WHERE date >= '2020-01-01'
GROUP BY
    town,
    district
HAVING c >= 100
ORDER BY price DESC
LIMIT 100

EXPLAIN AST
SELECT
    town,
    district,
    count() AS c,
    round(avg(price)) AS price,
    bar(price, 0, 5000000, 100)
FROM uk_price_paid
WHERE date >= '2020-01-01'
GROUP BY
    town,
    district
HAVING c >= 100
ORDER BY price DESC
LIMIT 100

Query id: 7d60473b-1ec0-4a78-8de3-9ea23ce48a7c

┌─explain───────────────────────────────────────┐
│ SelectWithUnionQuery (children 1)             │
│  ExpressionList (children 1)                  │
│   SelectQuery (children 7)                    │
│    ExpressionList (children 5)                │
│     Identifier town                           │
│     Identifier district                       │
│     Function count (alias c) (children 1)     │
│      ExpressionList                           │
│     Function round (alias price) (children 1) │
│      ExpressionList (children 1)              │
│       Function avg (children 1)               │
│        ExpressionList (children 1)            │
│         Identifier price                      │
│     Function bar (children 1)                 │
│      ExpressionList (children 4)              │
│       Identifier price                        │
│       Literal UInt64_0                        │
│       Literal UInt64_5000000                  │
│       Literal UInt64_100                      │
│    TablesInSelectQuery (children 1)           │
│     TablesInSelectQueryElement (children 1)   │
│      TableExpression (children 1)             │
│       TableIdentifier uk_price_paid           │
│    Function greaterOrEquals (children 1)      │
│     ExpressionList (children 2)               │
│      Identifier date                          │
│      Literal '2020-01-01'                     │
│    ExpressionList (children 2)                │
│     Identifier town                           │
│     Identifier district                       │
│    Function greaterOrEquals (children 1)      │
│     ExpressionList (children 2)               │
│      Identifier c                             │
│      Literal UInt64_100                       │
│    ExpressionList (children 1)                │
│     OrderByElement (children 1)               │
│      Identifier price                         │
│    Literal UInt64_100                         │
└───────────────────────────────────────────────┘

38 rows in set. Elapsed: 0.002 sec.

EXPLAIN PLAN is perhaps the most useful view. This gives us a logcal view of the steps we will take in order.

EXPLAIN PLAN SELECT
    town,
    district,
    count() AS c,
    round(avg(price)) AS price,
    bar(price, 0, 5000000, 100)
FROM uk_price_paid
WHERE date >= '2020-01-01'
GROUP BY
    town,
    district
HAVING c >= 100
ORDER BY price DESC
LIMIT 100

EXPLAIN
SELECT
    town,
    district,
    count() AS c,
    round(avg(price)) AS price,
    bar(price, 0, 5000000, 100)
FROM uk_price_paid
WHERE date >= '2020-01-01'
GROUP BY
    town,
    district
HAVING c >= 100
ORDER BY price DESC
LIMIT 100

Query id: 483e3cac-f4f8-462a-a61d-452e09d9a1b8

┌─explain──────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY [lifted up part])) │
│   Limit (preliminary LIMIT (without OFFSET))                 │
│     Sorting (Sorting for ORDER BY)                           │
│       Expression (Before ORDER BY)                           │
│         Filter (HAVING)                                      │
│           Aggregating                                        │
│             Expression (Before GROUP BY)                     │
│               ReadFromMergeTree (default.uk_price_paid)      │
└──────────────────────────────────────────────────────────────┘

8 rows in set. Elapsed: 0.016 sec.

EXPLAIN PIPELINE goes a layer deeper and includes some of the physical interactions. For instance, we can see that there is a MergeTreeSelect option, and that 8 cores will be used for multiple steps in the execution.

EXPLAIN PIPELINE SELECT
    town,
    district,
    count() AS c,
    round(avg(price)) AS price,
    bar(price, 0, 5000000, 100)
FROM uk_price_paid
WHERE date >= '2020-01-01'
GROUP BY
    town,
    district
HAVING c >= 100
ORDER BY price DESC
LIMIT 100

EXPLAIN PIPELINE
SELECT
    town,
    district,
    count() AS c,
    round(avg(price)) AS price,
    bar(price, 0, 5000000, 100)
FROM uk_price_paid
WHERE date >= '2020-01-01'
GROUP BY
    town,
    district
HAVING c >= 100
ORDER BY price DESC
LIMIT 100

Query id: f6177cf3-fe62-4fe3-a9c9-b19d28a4d307

┌─explain──────────────────────────────────────────────────────────────────────────────┐
│ (Expression)                                                                         │
│ ExpressionTransform                                                                  │
│   (Limit)                                                                            │
│   Limit                                                                              │
│     (Sorting)                                                                        │
│     MergingSortedTransform 8 → 1                                                     │
│       MergeSortingTransform × 8                                                      │
│         LimitsCheckingTransform × 8                                                  │
│           PartialSortingTransform × 8                                                │
│             (Expression)                                                             │
│             ExpressionTransform × 8                                                  │
│               (Filter)                                                               │
│               FilterTransform × 8                                                    │
│                 (Aggregating)                                                        │
│                 Resize 8 → 8                                                         │
│                   AggregatingTransform × 8                                           │
│                     StrictResize 8 → 8                                               │
│                       (Expression)                                                   │
│                       ExpressionTransform × 8                                        │
│                         (ReadFromMergeTree)                                          │
│                         MergeTreeSelect(pool: ReadPool, algorithm: Thread) × 8 0 → 1 │
└──────────────────────────────────────────────────────────────────────────────────────┘

21 rows in set. Elapsed: 0.011 sec.

Query Optimisation

Rule And Cost Based Optimisation

There are two types of query optimizer which are referred to as cost-based optimizer (CBO) and a rule-based optimizer (RBO).

The main difference between a cost-based optimizer and a rule-based optimizer is how they make decisions about query optimization.

Cost-based optimizers use statistical information and cost estimates to select the most efficient execution plan for each query, while rule-based optimizers rely on a predefined set of rules and heuristics.

Cost-based optimizers are generally more sophisticated and adaptable, leading to better query performance in most cases, but they require more computational resources to calculate and compare execution plans. Rule-based optimizers, on the other hand, are more static and may not adapt well to changing database conditions which means they not always find the best query plan.

Next Lesson:
02

Querying External Files

In this lesson we will learn about querying data that is stored outside of ClickHouse in formats such as Parquet.

0h 15m




Work With The Experts In Real-Time Analytics & AI

we help enterprise organisations deploy powerful real-time Data, Analytics and AI solutions based on ClickHouse, the worlds fastest open-source database.

Join our mailing list for regular insights:

We help enterprise organisations deploy advanced data, analytics and AI enabled systems based on modern cloud-native technology.

© 2024 Ensemble. All Rights Reserved.