// concepts & guides
Data Engineering Concepts
In-depth guides to every core concept — architecture patterns, modeling techniques, streaming systems, and more.
architecture
ETL vs ELT
The fundamental architectural choice in modern data pipelines.
ETL, ELT
ingestion
Change Data Capture (CDC)
Capture only changed rows using transaction logs — no full scans.
CDC, Debezium, Kafka
modeling
Slowly Changing Dimensions
Handle historical tracking when dimension attributes change. Types 1, 2, 3.
SCD, Type 2
architecture
Data Lakehouse Architecture
Data lake flexibility with warehouse ACID guarantees via Delta Lake or Iceberg.
lakehouse, Delta Lake
streaming
Stream Processing
Real-time processing with event-time semantics, watermarks, and stateful computations.
Kafka, Flink, Spark
modeling
Data Modeling for Analytics
Star schema, snowflake schema, Data Vault 2.0, and One Big Table.
star schema, Kimball
quality
Data Quality & Observability
Completeness, accuracy, and consistency checks, plus anomaly detection.
Great Expectations, dbt tests
architecture
Data Mesh
Decentralized architecture where domain teams own their data products.
data mesh, domain ownership
orchestration
Pipeline Orchestration
DAGs, scheduling, dependency management. Airflow vs Prefect vs Dagster.
Airflow, Prefect, Dagster