ClickHouse and Friends (14) Storage-Compute Separation Solution and Implementation
Last updated: 2020-09-28
If multiple ClickHouse servers can mount the same data (distributed storage, etc.), and each server can write, what benefits would this bring?
First, replication can be delegated to the distributed storage layer, keeping the upper-layer architecture simple and lean;
Second, clickhouse-server instances can be added to or removed from any machine, so both storage and compute capacity can be fully utilized.
This article explores ClickHouse’s storage-compute separation solution. The implementation is not complex.
1. Problem
ClickHouse runtime data consists of two parts: in-memory metadata and disk data.
Let’s first look at the write flow:
w1. Start writing data
w2. Write the part data to disk
w3. Generate the part's metadata and add it to the in-memory part metadata list
Then look at the read flow:
r1. Locate the parts to read from the part metadata
r2. Read those parts' data from disk
r3. Return the result to the upper layer
This way, if server1 writes a piece of data, it only updates its own in-memory part metadata. Other servers are unaware, so they cannot query the just-written data.
For storage-compute separation, the first problem to solve is keeping this in-memory state in sync.
In ClickHouse's case, that means synchronizing the in-memory part metadata across servers.
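To make this concrete, here is a tiny illustration (invented names, not ClickHouse code) of the two kinds of state behind the flows above: the per-server in-memory part list, and the on-disk part data that shared storage would let every server see.

```cpp
#include <set>
#include <string>

// Illustration only (invented names, not ClickHouse types): the two kinds of
// runtime state behind the write/read flows above.
struct ServerState
{
    // In-memory, per server: the parts this server knows about and will scan (r1).
    std::set<std::string> part_metadata;

    // On disk: part directories under the server's data path (w2 / r2). With shared
    // storage these directories are the same for every server, but each server still
    // answers queries only from its own part_metadata.
    std::string data_path = "/home/bohu/work/cluster/d1/datas/";
};

int main()
{
    ServerState server1, server2;

    server1.part_metadata.insert("all_1_1_0");   // server1 ingests a part (w3)

    // server2.part_metadata is still empty: even though the part's files are
    // already visible under the shared data_path, server2 cannot query them.
    return server2.part_metadata.count("all_1_1_0");   // 0
}
```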
2. In-Memory Data Synchronization
As covered in the previous post on ReplicatedMergeTree's synchronization mechanism, replication works in two steps: first synchronize the metadata, then fetch the corresponding part data based on that metadata.
Here we borrow ReplicatedMergeTree's synchronization channel and then subtract from it: after the metadata has been synchronized, the part-data synchronization step is skipped, because the on-disk data only needs to be updated by one server (this requires fsync semantics).
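Below is a minimal, self-contained sketch of that "subtraction", with invented names (`Replica`, `replication_log`, `shared_disk`) standing in for real ClickHouse machinery: a shared log carries only part metadata, every replica applies the log to its in-memory part list, and no part data is ever copied because the bytes already sit on storage that all replicas mount.

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy sketch, not ClickHouse code: `replication_log` stands in for the
// ReplicatedMergeTree log in ZooKeeper, `shared_disk` for storage every
// server mounts.
std::map<std::string, std::string> shared_disk;     // part name -> part data
std::vector<std::string> replication_log;           // committed part names

struct Replica
{
    std::string name;
    std::set<std::string> part_metadata;   // in-memory part list of this replica
    std::size_t log_pos = 0;               // how far this replica has applied the log

    // Writing a part: the data goes to the shared disk exactly once, and the
    // part's metadata is published to the shared log.
    void write(const std::string & part, const std::string & data)
    {
        shared_disk[part] = data;
        replication_log.push_back(part);
    }

    // ReplicatedMergeTree would now fetch the part's data from the source
    // replica. With shared storage we "subtract" that step: apply the metadata
    // from the log and stop -- no data is transferred.
    void sync()
    {
        for (; log_pos < replication_log.size(); ++log_pos)
            part_metadata.insert(replication_log[log_pos]);
    }

    void query() const
    {
        for (const auto & part : part_metadata)
            std::cout << name << ": " << part << " -> " << shared_disk.at(part) << '\n';
    }
};

int main()
{
    Replica replica1{"replica1"}, replica2{"replica2"};

    replica1.write("all_1_1_0", "(111, 3333)");
    replica1.sync();
    replica2.sync();   // replica2 learns about the part without copying any bytes

    replica1.query();
    replica2.query();
}
```

Running it prints the part once per replica, even though the "disk" was written exactly once; that is the whole benefit of leaving replication to the storage layer.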
Core code:
MergeTreeData::renameTempPartAndReplace
if (!share_storage)
    ...
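The surviving line only shows the guard itself. Below is a rough, self-contained sketch of the intent as I read it, not the actual patch: `commitPart`, the `is_writer` parameter, and the paths are all invented for illustration, with `share_storage` standing in for the prototype's flag. The point is that with shared storage only one server performs the on-disk rename of the temporary part, while every server still records the part in its in-memory metadata.

```cpp
#include <filesystem>
#include <iostream>
#include <set>
#include <string>

namespace fs = std::filesystem;

// Sketch only, not the actual prototype patch: a stand-in for the tail of
// MergeTreeData::renameTempPartAndReplace. `share_storage` plays the role of
// the flag above; `is_writer` is an invented parameter marking the one server
// responsible for the on-disk write.
void commitPart(std::set<std::string> & part_metadata,
                const fs::path & tmp_dir, const fs::path & final_dir,
                bool share_storage, bool is_writer)
{
    // Touch the disk only when we are not on shared storage, or when we are
    // the single server that produced the temporary part. Other replicas must
    // not repeat (or race on) the rename of the same shared directory.
    if (!share_storage || is_writer)
        fs::rename(tmp_dir, final_dir);

    // In-memory bookkeeping happens on every server, so queries can find the part.
    part_metadata.insert(final_dir.filename().string());
}

int main()
{
    std::set<std::string> parts;

    fs::remove_all("datas");
    fs::create_directories("datas/tmp_all_1_1_0");

    // The writer renames tmp_all_1_1_0 -> all_1_1_0 and registers the part.
    commitPart(parts, "datas/tmp_all_1_1_0", "datas/all_1_1_0",
               /*share_storage=*/true, /*is_writer=*/true);

    // A shared-storage replica skips the disk work and only registers the part.
    commitPart(parts, "datas/tmp_all_1_1_0", "datas/all_1_1_0",
               /*share_storage=*/true, /*is_writer=*/false);

    std::cout << "known parts: " << parts.size() << '\n';   // prints 1
}
```

Whether the real prototype distinguishes the writer this way or routes replicas through a different code path cannot be told from the snippet above; the sketch only captures the division of labor.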
3. Demo
Script:
- First start 2 clickhouse-server instances, both mounting the same data path: <path>/home/bohu/work/cluster/d1/datas/</path>
- Write a record through clickhouse-server1 (port 9101): (111, 3333)
- Query through clickhouse-server2 (port 9102) works normally
- Truncate table through clickhouse-server2 (port 9102)
- Query through clickhouse-server1 (port 9101) works normally
4. Code Implementation
Prototype
Note that the prototype only implements write-path data synchronization, and in a rather hacky way.
Since DDL is not implemented, registration in ZooKeeper is also hacky: the replicas in the demo are all registered manually.
5. Summary
This article offers an idea, a rough proposal thrown out to spark discussion, in the hope of a more systematic engineering implementation down the road.
ClickHouse does not yet support Distributed Query execution. If that capability lands, ClickHouse with storage-compute separation would be an incredibly powerful little hydrogen bomb.