ClickHouse and Friends (14) Storage-Compute Separation Solution and Implementation
Last updated: 2020-09-28
If multiple ClickHouse servers can mount the same data (distributed storage, etc.), and each server can write, what benefits would this bring?
First, replication can be delegated to the distributed storage layer, keeping the upper-layer architecture simple and lean;
Second, clickhouse-server instances can be added to or removed from any machine, so both storage and compute capacity can be fully utilized.
This article explores ClickHouse’s storage-compute separation solution. The implementation is not complex.
1. Problem
ClickHouse runtime data consists of two parts: in-memory metadata and disk data.
Let’s first look at the write flow:
w1. Start writing data
w2. Write the part data to disk
w3. Generate the part's metadata and add it to the in-memory part metadata list
Then look at the read flow:
r1. Locate the parts to read from the part metadata
r2. Read those parts' data from disk
r3. Return the result to the upper layer
This way, if server1 writes a piece of data, it only updates its own in-memory part metadata. Other servers are unaware, so they cannot query the just-written data.
For storage-compute separation, the first problem to solve is keeping this in-memory state in sync.
In ClickHouse's case, that means synchronizing the in-memory part metadata across servers.
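To make this concrete, here is a tiny illustration (invented names, not ClickHouse code) of the two kinds of state behind the flows above: the per-server in-memory part list, and the on-disk part data that shared storage would let every server see.

```cpp
#include <set>
#include <string>

// Illustration only (invented names, not ClickHouse types): the two kinds of
// runtime state behind the write/read flows above.
struct ServerState
{
    // In-memory, per server: the parts this server knows about and will scan (r1).
    std::set<std::string> part_metadata;

    // On disk: part directories under the server's data path (w2 / r2). With shared
    // storage these directories are the same for every server, but each server still
    // answers queries only from its own part_metadata.
    std::string data_path = "/home/bohu/work/cluster/d1/datas/";
};

int main()
{
    ServerState server1, server2;

    server1.part_metadata.insert("all_1_1_0");   // server1 ingests a part (w3)

    // server2.part_metadata is still empty: even though the part's files are
    // already visible under the shared data_path, server2 cannot query them.
    return server2.part_metadata.count("all_1_1_0");   // 0
}
```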
2. In-Memory Data Synchronization
As covered in the previous post on ReplicatedMergeTree's synchronization mechanism, replication works in two steps: first synchronize the metadata, then fetch the corresponding part data based on that metadata.
Here we borrow ReplicatedMergeTree's synchronization channel and then subtract from it: after the metadata has been synchronized, the part-data synchronization step is skipped, because the on-disk data only needs to be updated by one server (this requires fsync semantics).
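Below is a minimal, self-contained sketch of that "subtraction", with invented names (`Replica`, `replication_log`, `shared_disk`) standing in for real ClickHouse machinery: a shared log carries only part metadata, every replica applies the log to its in-memory part list, and no part data is ever copied because the bytes already sit on storage that all replicas mount.

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy sketch, not ClickHouse code: `replication_log` stands in for the
// ReplicatedMergeTree log in ZooKeeper, `shared_disk` for storage every
// server mounts.
std::map<std::string, std::string> shared_disk;     // part name -> part data
std::vector<std::string> replication_log;           // committed part names

struct Replica
{
    std::string name;
    std::set<std::string> part_metadata;   // in-memory part list of this replica
    std::size_t log_pos = 0;               // how far this replica has applied the log

    // Writing a part: the data goes to the shared disk exactly once, and the
    // part's metadata is published to the shared log.
    void write(const std::string & part, const std::string & data)
    {
        shared_disk[part] = data;
        replication_log.push_back(part);
    }

    // ReplicatedMergeTree would now fetch the part's data from the source
    // replica. With shared storage we "subtract" that step: apply the metadata
    // from the log and stop -- no data is transferred.
    void sync()
    {
        for (; log_pos < replication_log.size(); ++log_pos)
            part_metadata.insert(replication_log[log_pos]);
    }

    void query() const
    {
        for (const auto & part : part_metadata)
            std::cout << name << ": " << part << " -> " << shared_disk.at(part) << '\n';
    }
};

int main()
{
    Replica replica1{"replica1"}, replica2{"replica2"};

    replica1.write("all_1_1_0", "(111, 3333)");
    replica1.sync();
    replica2.sync();   // replica2 learns about the part without copying any bytes

    replica1.query();
    replica2.query();
}
```

Running it prints the part once per replica, even though the "disk" was written exactly once; that is the whole benefit of leaving replication to the storage layer.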
Core code:
MergeTreeData::renameTempPartAndReplace
if (!share_storage)
    ...
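The surviving line only shows the guard itself. Below is a rough, self-contained sketch of the intent as I read it, not the actual patch: `commitPart`, the `is_writer` parameter, and the paths are all invented for illustration, with `share_storage` standing in for the prototype's flag. The point is that with shared storage only one server performs the on-disk rename of the temporary part, while every server still records the part in its in-memory metadata.

```cpp
#include <filesystem>
#include <iostream>
#include <set>
#include <string>

namespace fs = std::filesystem;

// Sketch only, not the actual prototype patch: a stand-in for the tail of
// MergeTreeData::renameTempPartAndReplace. `share_storage` plays the role of
// the flag above; `is_writer` is an invented parameter marking the one server
// responsible for the on-disk write.
void commitPart(std::set<std::string> & part_metadata,
                const fs::path & tmp_dir, const fs::path & final_dir,
                bool share_storage, bool is_writer)
{
    // Touch the disk only when we are not on shared storage, or when we are
    // the single server that produced the temporary part. Other replicas must
    // not repeat (or race on) the rename of the same shared directory.
    if (!share_storage || is_writer)
        fs::rename(tmp_dir, final_dir);

    // In-memory bookkeeping happens on every server, so queries can find the part.
    part_metadata.insert(final_dir.filename().string());
}

int main()
{
    std::set<std::string> parts;

    fs::remove_all("datas");
    fs::create_directories("datas/tmp_all_1_1_0");

    // The writer renames tmp_all_1_1_0 -> all_1_1_0 and registers the part.
    commitPart(parts, "datas/tmp_all_1_1_0", "datas/all_1_1_0",
               /*share_storage=*/true, /*is_writer=*/true);

    // A shared-storage replica skips the disk work and only registers the part.
    commitPart(parts, "datas/tmp_all_1_1_0", "datas/all_1_1_0",
               /*share_storage=*/true, /*is_writer=*/false);

    std::cout << "known parts: " << parts.size() << '\n';   // prints 1
}
```

Whether the real prototype distinguishes the writer this way or routes replicas through a different code path cannot be told from the snippet above; the sketch only captures the division of labor.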
3. Demo
Script:
- First start 2 clickhouse-server instances, both mounting the same data path: <path>/home/bohu/work/cluster/d1/datas/</path>
- Write a record through clickhouse-server1 (port 9101): (111, 3333)
- Query through clickhouse-server2 (port 9102) works normally
- Truncate table through clickhouse-server2 (port 9102)
- Query through clickhouse-server1 (port 9101) works normally
4. Code Implementation
Prototype
Note that the prototype only implements write-path data synchronization, and in a rather hacky way.
Since DDL is not implemented, registration in ZooKeeper is also hacky: the replicas in the demo are all registered manually.
5. Summary
This article offers an idea, a rough proposal thrown out to spark discussion, in the hope of a more systematic engineering implementation down the road.
ClickHouse does not yet support Distributed Query execution. If that capability lands, ClickHouse with storage-compute separation would be an incredibly powerful little hydrogen bomb.