DataFlow Hub
Data Engineering Tools
Deep dives into every major tool — concepts, architecture, code examples, and interview questions.
🔌 Ingestion
Airbyte (freemium): Open-source ELT platform with 300+ connectors. Tags: ELT, connectors, open-source
Debezium (free): Open-source CDC platform built on Apache Kafka. Tags: CDC, Kafka, real-time
⚙️ Transformation
AWS Glue (paid): Serverless ETL service on Amazon Web Services. Tags: serverless, AWS, Spark
dbt (data build tool) (freemium): SQL-first data transformation framework. Tags: SQL, ELT, transformation
⚡ Processing
Apache Spark (free): Unified engine for large-scale data processing. Tags: big data, PySpark, batch
Databricks (paid): Unified analytics platform built on Apache Spark. Tags: Spark, Delta Lake, ML
🎛️ Orchestration
Apache Airflow (free): Platform to programmatically author, schedule and monitor workflows. Tags: DAG, orchestration, scheduling
Azure Data Factory (paid): Cloud-based data integration service by Microsoft. Tags: Azure, ETL, orchestration
🏛️ Data Warehouse
Snowflake (paid): Cloud data warehouse with separated compute and storage. Tags: DWH, cloud, SQL
📡 Streaming
Apache Kafka (free): Distributed event streaming platform. Tags: streaming, real-time, messaging
🗄️ Databases
Oracle Database / Oracle Data Integrator (enterprise): Enterprise RDBMS and ETL platform. Tags: SQL, RDBMS, ODI
SQL Server / SSIS (paid): Microsoft SQL Server and Integration Services. Tags: SQL, SSIS, Microsoft
🔗 ETL / Integration
IBM DataStage (enterprise): Enterprise-grade ETL and data integration. Tags: ETL, IBM, enterprise
Talend (freemium): Enterprise data integration and ETL platform. Tags: ETL, GUI, enterprise
💻 Programming
Python for Data Engineering (free): The universal language of modern data pipelines. Tags: ETL, scripting, pandas