Data Strategy

How are you handling pre-aggregation in ClickHouse at scale? AggregatingMergeTree vs ReplacingMergeTree

Reddit r/BusinessIntelligence

Summary

For those running ClickHouse in production: how are you approaching pre-aggregation on high-throughput streaming data? Are you using AggregatingMergeTree + materialized views instead of querying raw tables? Aggregation state gets stored and merged incrementally, so repeated GROUP BY queries on billions of rows stay fast. The surprise was deduplication. ReplacingMergeTree feels like the obvious pick for idempotency, but deduplication only happens at merge time, and merges run in the background at unpredictable times, so you can ...
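To make the two behaviours concrete, here is a rough toy model in plain Python. It is not ClickHouse code and does not use any ClickHouse API; the function names, the (sum, count) state, and the sample rows are all made up for illustration. It only sketches the idea that partial aggregate states can be merged incrementally and still match a full scan, and that duplicates in a ReplacingMergeTree-style table stay visible until a merge actually runs.

```python
from collections import defaultdict

# Toy model only (hypothetical names, not ClickHouse internals):
# each "part" is the list of rows written by one insert.

def partial_state(rows):
    """Build a per-key (sum, count) state for one inserted block."""
    state = defaultdict(lambda: (0, 0))
    for key, value in rows:
        s, c = state[key]
        state[key] = (s + value, c + 1)
    return dict(state)

def merge_states(a, b):
    """Combine two partial states key by key (associative, so merge order is irrelevant)."""
    merged = dict(a)
    for key, (s, c) in b.items():
        ms, mc = merged.get(key, (0, 0))
        merged[key] = (ms + s, mc + c)
    return merged

def replacing_merge(parts):
    """Collapse all parts into one, keeping the latest row per key (highest version, last insert wins ties)."""
    latest = {}
    for part in parts:
        for key, version, payload in part:
            if key not in latest or version >= latest[key][0]:
                latest[key] = (version, payload)
    return [[(k, v, p) for k, (v, p) in sorted(latest.items())]]

def count_rows(parts):
    """Number of rows a naive query would see across all parts."""
    return sum(len(part) for part in parts)

# Pre-aggregation: merging per-block states gives the same answer as scanning raw rows.
block_1 = [("page_a", 10), ("page_b", 5)]
block_2 = [("page_a", 30), ("page_a", 20)]
merged_state = merge_states(partial_state(block_1), partial_state(block_2))
full_scan = partial_state(block_1 + block_2)

# Deduplication: a retried insert lands in its own part and stays visible until a merge.
parts = [[("evt-1", 1, "first")], [("evt-1", 1, "retry")]]
rows_before_merge = count_rows(parts)
rows_after_merge = count_rows(replacing_merge(parts))
```

In this sketch `merged_state` equals `full_scan`, while `rows_before_merge` is larger than `rows_after_merge`: any query that runs before the merge still counts the retried event twice.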
