VertiPaq compression — two columns with identical cardinality (~5,000) had 395× different Data sizes

Reddit r/PowerBI

Summary

An experiment shared on r/PowerBI shows that two Power BI columns with nearly the same cardinality can differ enormously in the space VertiPaq needs to store them.

VertiPaq Compression: What’s Happening

A post on r/PowerBI reports that Power BI's VertiPaq storage engine can store two columns with nearly matching cardinalities at very different sizes. In a controlled experiment on a model with 3.4 million rows and 234 columns, column A (cardinality 5,701) occupied 6.77 MB, while column B (cardinality 5,033) needed only 0.02 MB, even though both columns had similar data types and encoding.
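This summary does not say what caused the gap. One factor often cited for columnar engines, VertiPaq included, is run-length encoding, where the number of stored runs depends on how values are ordered across rows rather than on how many distinct values exist. The Python sketch below only illustrates that general idea and is an assumption, not the cause identified in the post and not VertiPaq's actual algorithm. The helper `rle_run_count`, the row count, and the cardinality are hypothetical values chosen for the example.

```python
import random
from itertools import groupby


def rle_run_count(values):
    """Count the (value, run_length) pairs a simple run-length encoder would store."""
    return sum(1 for _ in groupby(values))


random.seed(0)
n_rows = 100_000      # hypothetical row count, far smaller than the 3.4M in the post
cardinality = 5_000   # hypothetical, roughly matching the ~5,000 in the post

# Column whose values are scattered randomly across rows.
scattered = [random.randrange(cardinality) for _ in range(n_rows)]

# Same values, but arranged so equal values sit next to each other.
clustered = sorted(scattered)

distinct_scattered = len(set(scattered))
distinct_clustered = len(set(clustered))
runs_scattered = rle_run_count(scattered)
runs_clustered = rle_run_count(clustered)

print("distinct values:", distinct_scattered, distinct_clustered)
print("runs to store:  ", runs_scattered, runs_clustered)
```

Both lists hold exactly the same values, so their cardinality is identical, yet the clustered list needs only one run per distinct value while the scattered list needs close to one run per row. Whether this is what happened in the reported experiment is not established by the summary.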

Why This Matters

These findings matter to BI professionals using Power BI because they show that storage size can vary widely between columns that look alike by cardinality, data type, and encoding. As organizations manage ever larger datasets, that makes per-column analysis and optimization more important. Competitors such as Tableau and Qlik are also exploring ways to improve storage and performance, but insight into VertiPaq's internals gives Power BI developers a specific opportunity for efficiency, especially in large data models.

Concrete Takeaway

BI professionals should treat storage efficiency as a design concern when building data models in Power BI. Cardinality alone does not predict how much space a column takes, so check the actual size of each column rather than assuming that columns with similar distinct counts cost the same.
