Summary
Long story short, I have a dataset with a few dozen billion rows, which comes to roughly 500 GB deserialized. The client wants to be able to run range queries on this dataset, like this one:

    SELECT id, col_a, col_b, col_c
    FROM data
    WHERE id = 'xyz'
      AND date BETWEEN '2025-01-01' AND '2026-01-01'

There are 100 million unique IDs, each with a daily entry, and the client wants results returned in under 1 second. Of col_a, col_b and col_c, two are numeric(14,4) and the third one is an int. Id is a ...