WITH Clause
ClickHouse supports Common Table Expressions (CTE), Common Scalar Expressions and Recursive Queries.
Common Table Expressions
Common Table Expressions represent named subqueries.
They can be referenced by name anywhere in a SELECT query where a table expression is allowed.
Named subqueries can be referenced by name in the scope of the current query or in the scopes of child subqueries.
Every reference to a Common Table Expression in a SELECT query is replaced by the subquery from its definition.
Recursion is prevented by hiding the current CTE from the identifier resolution process.
Note that a CTE does not guarantee the same results everywhere it is referenced, because the subquery is re-executed for each reference.
Syntax
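The general shape of a CTE declaration is sketched below. The names in angle brackets are placeholders, not literal syntax.

```sql
WITH <identifier> AS <subquery expression>
```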
Example
An example of when a subquery is re-executed:
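A minimal sketch of such a query, assuming the generateRandom table function as the source of random rows. The name cte_numbers is the one used in the discussion that follows; the column name num is illustrative.

```sql
WITH cte_numbers AS
(
    -- One million random UInt64 values; a NULL seed means a new random sequence per execution.
    SELECT num
    FROM generateRandom('num UInt64', NULL)
    LIMIT 1000000
)
SELECT count()
FROM cte_numbers
-- Second reference: the CTE body is substituted and executed again here.
WHERE num IN (SELECT num FROM cte_numbers);
```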
If CTEs passed along their exact results rather than just a piece of code, you would always see 1000000. However, because cte_numbers is referenced twice, random numbers are generated each time, so we see different results: 280501, 392454, 261636, 196227, and so on.
Common Scalar Expressions
ClickHouse allows you to declare aliases for arbitrary scalar expressions in the WITH clause.
Common scalar expressions can be referenced in any place in the query.
If a common scalar expression references anything other than a constant literal, it may contain free variables. ClickHouse resolves each identifier in the closest possible scope, so free variables can bind to unexpected entities when names clash, or can produce a correlated subquery. For more predictable identifier resolution, define the common scalar expression as a lambda function that binds all the identifiers it uses (possible only with the analyzer enabled).
Syntax
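The declaration order is the reverse of the CTE form: the expression comes first, then the alias. The names in angle brackets are placeholders.

```sql
WITH <expression> AS <identifier>
```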
Examples
Example 1: Using constant expression as "variable"
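A sketch of this pattern, assuming a hypothetical table hits with EventDate and EventTime columns. The alias ts_upper_bound is likewise illustrative.

```sql
-- The string literal is declared once and reused like a variable.
WITH '2019-08-01 15:23:00' AS ts_upper_bound
SELECT *
FROM hits
WHERE
    EventDate = toDate(ts_upper_bound) AND
    EventTime <= ts_upper_bound;
```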
Example 2: Using higher-order functions to bind the identifiers
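A possible illustration, with the analyzer enabled. The names extension and gen_name are illustrative. Because extension is a parameter of the lambda, the call should use the argument passed in ('.sql') rather than the outer alias ('.txt').

```sql
WITH
    '.txt' AS extension,
    -- Both identifiers used in the body are bound as lambda parameters.
    (id, extension) -> concat(lower(id), extension) AS gen_name
SELECT gen_name('test', '.sql') AS file_name;
```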
Example 3: Using higher-order functions with free variables
The following example queries show that unbound identifiers resolve into an entity in the closest scope.
Here, extension is not bound in the gen_name lambda function body.
Although extension is defined as '.txt' by a common scalar expression in the scope where generated_names is defined and used, it is resolved to a column of the table extension_list, because that column is available in the generated_names subquery.
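A sketch consistent with the description above, with the analyzer enabled. The table definition and its single row '.sql' are assumptions made for illustration. Per the explanation, the result is expected to be built from the table column rather than from '.txt'.

```sql
-- Hypothetical table whose column name clashes with the scalar alias.
CREATE TABLE extension_list
(
    extension String
)
ENGINE = MergeTree
ORDER BY extension;

INSERT INTO extension_list VALUES ('.sql');

WITH
    '.txt' AS extension,
    generated_names AS (
        WITH
            -- extension is a free variable here: it is not a lambda parameter.
            (id) -> concat(lower(id), extension) AS gen_name
        SELECT gen_name('test') AS file_name FROM extension_list
    )
SELECT file_name FROM generated_names;
```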
Example 4: Evicting a sum(bytes) expression result from the SELECT clause column list
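One way this could look, using the system.parts table. The aggregate is named s in the WITH clause, so it can be used for formatting and for ordering without appearing as a raw column of its own.

```sql
WITH sum(bytes) AS s
SELECT
    formatReadableSize(s),
    table
FROM system.parts
GROUP BY table
ORDER BY s;
```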
Example 5: Using results of a scalar subquery
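A sketch that computes each table's share of total disk usage, again over system.parts. The alias names total_disk_usage and table_disk_usage are illustrative.

```sql
-- Lists the ten tables with the largest share of disk usage.
WITH
    (
        -- Scalar subquery: evaluates to a single value.
        SELECT sum(bytes)
        FROM system.parts
        WHERE active
    ) AS total_disk_usage
SELECT
    (sum(bytes) / total_disk_usage) * 100 AS table_disk_usage,
    table
FROM system.parts
GROUP BY table
ORDER BY table_disk_usage DESC
LIMIT 10;
```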
Example 6: Reusing expression in a subquery
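A tentative sketch, assuming a hypothetical table test1 with numeric columns i and j. Because a CTE is hidden from identifier resolution inside its own definition, the inner test1 should refer to the table, while the outer test1 refers to the CTE.

```sql
-- Hypothetical source table.
CREATE TABLE test1 (i UInt64, j UInt64) ENGINE = Memory;
INSERT INTO test1 VALUES (1, 2), (3, 4);

-- Inner test1: the table. Outer test1: the CTE defined here.
WITH test1 AS (SELECT i + 1, j + 1 FROM test1)
SELECT * FROM test1;
```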
Recursive Queries
The optional RECURSIVE modifier allows a WITH query to refer to its own output.
Example: Sum integers from 1 through 100
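A minimal sketch of such a query. The names test_table and number are illustrative.

```sql
WITH RECURSIVE test_table AS (
    -- Non-recursive term: the starting row.
    SELECT 1 AS number
UNION ALL
    -- Recursive term: extends the previous rows until 100 is reached.
    SELECT number + 1 FROM test_table WHERE number < 100
)
SELECT sum(number) FROM test_table;
```

The sum of the integers 1 through 100 is 5050.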
Recursive CTEs rely on the new query analyzer introduced in version 24.3. If you're using version 24.3+ and encounter an (UNKNOWN_TABLE) or (UNSUPPORTED_METHOD) exception, this suggests that the new analyzer is disabled on your instance, role, or profile. To activate the analyzer, enable the setting allow_experimental_analyzer or update the compatibility setting to a more recent version.
Starting from version 24.8, the new analyzer has been fully promoted to production, and the setting allow_experimental_analyzer has been renamed to enable_analyzer.
The general form of a recursive WITH query is always a non-recursive term, then UNION ALL, then a recursive term, where only the recursive term can contain a reference to the query's own output. A recursive CTE query is executed as follows:
- Evaluate the non-recursive term. Place the result of the non-recursive term query in a temporary working table.
- As long as the working table is not empty, repeat these steps:
  - Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. Place the result of the recursive term query in a temporary intermediate table.
  - Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.
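The steps above can be traced on a small case. The following sketch uses an illustrative CTE named counter that counts to 3; the comments show the working table at each iteration.

```sql
WITH RECURSIVE counter AS (
    -- Non-recursive term: working table = {1}
    SELECT 1 AS number
UNION ALL
    -- Iteration 1: working table {1} -> intermediate table {2}
    -- Iteration 2: working table {2} -> intermediate table {3}
    -- Iteration 3: working table {3} -> intermediate table is empty, so evaluation stops
    SELECT number + 1 FROM counter WHERE number < 3
)
-- The result accumulates the rows of every step: 1, 2, 3
SELECT number FROM counter;
```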
Recursive queries are typically used to work with hierarchical or tree-structured data. For example, we can write a query that performs tree traversal:
Example: Tree traversal
First, let's create a tree table:
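A possible definition, where the column names and sample rows are assumptions chosen for illustration. The root row has a NULL parent.

```sql
DROP TABLE IF EXISTS tree;

CREATE TABLE tree
(
    id UInt64,
    parent_id Nullable(UInt64),  -- NULL for the root node
    data String
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO tree VALUES
    (0, NULL, 'ROOT'),
    (1, 0, 'Child_1'),
    (2, 0, 'Child_2'),
    (3, 1, 'Child_1_1');
```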
We can traverse this tree with the following query:
Example: Tree traversal
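A sketch of the traversal, assuming a table tree with columns id, parent_id and data, whose root node has id 0. The CTE name search_tree is illustrative.

```sql
WITH RECURSIVE search_tree AS (
    -- Non-recursive term: start at the root.
    SELECT id, parent_id, data
    FROM tree t
    WHERE t.id = 0
UNION ALL
    -- Recursive term: children of the rows found in the previous step.
    SELECT t.id, t.parent_id, t.data
    FROM tree t, search_tree st
    WHERE t.parent_id = st.id
)
SELECT * FROM search_tree;
```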
Search order
To create a depth-first order, we compute for each result row an array of rows that we have already visited:
Example: Tree traversal depth-first order
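A sketch under the same assumptions (a table tree with columns id, parent_id and data, rooted at id 0). The path column is an illustrative name; it holds the ids from the root to the current row, and ordering by it places each node directly before its descendants.

```sql
WITH RECURSIVE search_tree AS (
    SELECT id, parent_id, data, [t.id] AS path
    FROM tree t
    WHERE t.id = 0
UNION ALL
    -- Append the current id to the path inherited from the parent row.
    SELECT t.id, t.parent_id, t.data, arrayConcat(st.path, [t.id])
    FROM tree t, search_tree st
    WHERE t.parent_id = st.id
)
SELECT * FROM search_tree ORDER BY path;
```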
To create a breadth-first order, the standard approach is to add a column that tracks the depth of the search:
Example: Tree traversal breadth-first order
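A sketch under the same assumptions about the tree table. The depth column is an illustrative name. It is cast to UInt64 in the non-recursive term so that depth + 1 in the recursive term keeps the same type.

```sql
WITH RECURSIVE search_tree AS (
    SELECT id, parent_id, data, [t.id] AS path, toUInt64(0) AS depth
    FROM tree t
    WHERE t.id = 0
UNION ALL
    -- Each level of children is one step deeper than its parent.
    SELECT t.id, t.parent_id, t.data, arrayConcat(st.path, [t.id]), st.depth + 1
    FROM tree t, search_tree st
    WHERE t.parent_id = st.id
)
SELECT * FROM search_tree ORDER BY depth;
```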
Cycle detection
First, let's create a graph table:
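A possible definition of an edge list. The column names src and dst and the sample edges are assumptions chosen for illustration; the sample graph is acyclic.

```sql
DROP TABLE IF EXISTS graph;

CREATE TABLE graph
(
    src UInt64,    -- edge start node
    dst UInt64,    -- edge end node
    label String
)
ENGINE = MergeTree
ORDER BY (src, dst);

INSERT INTO graph VALUES
    (1, 2, '1 -> 2'),
    (1, 3, '1 -> 3'),
    (2, 3, '2 -> 3'),
    (1, 4, '1 -> 4'),
    (4, 5, '4 -> 5');
```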
We can traverse this graph with the following query:
Example: Graph traversal without cycle detection
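A sketch assuming a table graph with hypothetical columns src, dst and label, one row per edge. It terminates only when the graph has no cycles.

```sql
WITH RECURSIVE search_graph AS (
    -- Non-recursive term: every edge is a starting point.
    SELECT src, dst, label FROM graph g
UNION ALL
    -- Recursive term: follow edges that start where the previous edge ended.
    SELECT g.src, g.dst, g.label
    FROM graph g, search_graph sg
    WHERE g.src = sg.dst
)
SELECT DISTINCT * FROM search_graph ORDER BY src;
```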
But if we add a cycle to that graph, the previous query will fail with a Maximum recursive CTE evaluation depth error:
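For instance, assuming the graph table has hypothetical columns src, dst and label and already contains the edges 1 -> 4 and 4 -> 5, a single extra edge back to node 1 closes a cycle:

```sql
-- Creates the cycle 1 -> 4 -> 5 -> 1.
INSERT INTO graph VALUES (5, 1, '5 -> 1');
```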
The standard method for handling cycles is to compute an array of the already visited nodes:
Example: Graph traversal with cycle detection
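A sketch of this technique, assuming a table graph with hypothetical columns src, dst and label. The path and is_cycle columns are illustrative names: path records the edges visited so far, and is_cycle flags a row whose edge already appears in that path, which stops further expansion from that row.

```sql
WITH RECURSIVE search_graph AS (
    SELECT src, dst, label, false AS is_cycle, [tuple(g.src, g.dst)] AS path
    FROM graph g
UNION ALL
    SELECT
        g.src, g.dst, g.label,
        has(sg.path, tuple(g.src, g.dst)),            -- edge seen before: cycle
        arrayConcat(sg.path, [tuple(g.src, g.dst)])   -- extend the visited path
    FROM graph g, search_graph sg
    -- Do not continue from rows already flagged as closing a cycle.
    WHERE g.src = sg.dst AND NOT sg.is_cycle
)
SELECT * FROM search_graph WHERE is_cycle ORDER BY src;
```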
Infinite queries
It is also possible to use infinite recursive CTE queries if LIMIT is used in the outer query:
Example: Infinite recursive CTE query
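A minimal sketch with illustrative names. The recursive term has no stop condition, so the CTE alone would never finish; the LIMIT in the query that reads from it bounds the evaluation.

```sql
WITH RECURSIVE test_table AS (
    SELECT 1 AS number
UNION ALL
    -- No WHERE clause: this term would produce rows indefinitely.
    SELECT number + 1 FROM test_table
)
-- Only the first 100 rows are requested from the CTE.
SELECT sum(number) FROM (SELECT number FROM test_table LIMIT 100);
```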