Data Engineering Concepts
In-depth guides to every core concept — architecture patterns, modeling techniques, streaming systems, and more.
ETL vs ELT
The fundamental architectural choice in modern data pipelines. When to transform before loading vs after.
Change Data Capture (CDC)
Capture only the rows that changed in your source database using transaction logs — no full scans.
Slowly Changing Dimensions (SCD)
How to handle historical tracking when dimension attributes change over time. Types 1, 2, 3, and 6.
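As a rough illustration of Type 2 tracking, the sketch below closes the current row and appends a new version rather than overwriting the attribute. The `DimRow` fields, the `apply_scd2` helper, and the sample customer are all hypothetical names chosen for this example, not part of any particular warehouse or library.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    # Hypothetical customer dimension with validity dates.
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date]  # None means "still current"


def apply_scd2(history: List[DimRow], customer_id: int,
               new_city: str, change_date: date) -> List[DimRow]:
    """Return a new history list with a Type 2 change applied."""
    result = []
    changed = False
    for row in history:
        is_current = row.customer_id == customer_id and row.valid_to is None
        if is_current and row.city != new_city:
            # Close the current version instead of overwriting it.
            result.append(replace(row, valid_to=change_date))
            changed = True
        else:
            result.append(row)
    if changed:
        # Append the new version as the open-ended current row.
        result.append(DimRow(customer_id, new_city, change_date, None))
    return result


history = [DimRow(1, "Berlin", date(2020, 1, 1), None)]
history = apply_scd2(history, 1, "Munich", date(2023, 6, 1))
```

After the call, `history` holds two rows for customer 1: the closed Berlin version and the open Munich version. A Type 1 change would instead overwrite `city` in place and keep a single row.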
Data Lakehouse Architecture
Combining data lake flexibility with warehouse ACID guarantees using Delta Lake or Apache Iceberg.
Stream Processing
Real-time data processing with event-time semantics, watermarks, and stateful computations.
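To make event time and watermarks concrete, here is a minimal sketch of tumbling-window sums in plain Python. It assumes a simple heuristic watermark (maximum event time seen minus a fixed lateness allowance); the constants `WINDOW` and `ALLOWED_LATENESS` and the `process` function are illustrative choices, not the API of any streaming engine.

```python
from collections import defaultdict

WINDOW = 10           # tumbling window size in seconds (assumed)
ALLOWED_LATENESS = 5  # how far the watermark trails the max event time (assumed)


def process(events):
    """Sum values per event-time window.

    events: iterable of (event_time, value) pairs in arrival order.
    Returns (emitted, dropped), where emitted is a list of
    (window_start, total) in the order the windows were closed.
    """
    open_windows = defaultdict(int)  # per-window state: running sum
    emitted, dropped = [], []
    max_event_time = float("-inf")

    for event_time, value in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - ALLOWED_LATENESS
        window_start = (event_time // WINDOW) * WINDOW

        # An event is too late if its window already ended at or before the watermark.
        if window_start + WINDOW <= watermark:
            dropped.append((event_time, value))
            continue

        open_windows[window_start] += value

        # Close every window whose end the watermark has passed.
        for start in sorted(open_windows):
            if start + WINDOW <= watermark:
                emitted.append((start, open_windows.pop(start)))

    return emitted, dropped


events = [(1, 1), (4, 2), (12, 5), (3, 10), (16, 1), (2, 100), (27, 3)]
emitted, dropped = process(events)
```

In this run the out-of-order event at time 3 still lands in the first window because the watermark has not yet passed it, while the event at time 2 arrives after that window closed and is dropped.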
Data Modeling for Analytics
Star schema, snowflake schema, Data Vault 2.0, and One Big Table — when to use each approach.
Data Quality & Observability
Completeness, accuracy, and consistency checks, plus anomaly detection, schema monitoring, and lineage.
Data Mesh
Decentralized data architecture where domain teams own their data products end-to-end.
Pipeline Orchestration
DAGs, scheduling, and dependency management, with a practical comparison of Airflow, Prefect, and Dagster.
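The core idea behind DAG-based orchestration, running each task only after its dependencies, can be sketched with Python's standard-library `graphlib` (Python 3.9+). This is not how Airflow, Prefect, or Dagster are configured; the task names and the `run_order` helper are hypothetical and only illustrate dependency resolution.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_join": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform_join"},
}


def run_order(dag):
    """Return one valid execution order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


order = run_order(pipeline)

# A cycle means no valid order exists, so the "DAG" is rejected.
try:
    run_order({"a": {"b"}, "b": {"a"}})
    has_cycle = False
except CycleError:
    has_cycle = True
```

The two extract tasks have no dependency on each other, so either may come first; real orchestrators use that freedom to run independent tasks in parallel.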