Summary
Spark lineage is an important yet often overlooked aspect of data engineering. This article outlines how a solution for building Spark lineage in data lakes was developed, giving data professionals clearer insight into data flows and dependencies.