// learning path
Data Engineer Roadmap 2026
A structured, honest guide from SQL beginner to senior data engineer — with the tools, skills, and projects that actually matter.
🌱
01
Foundation
0–3 months
Master the fundamentals before touching any ETL tool. Every senior data engineer will test you on these in interviews.
Skills to Learn
▸
SQL (advanced)
Window functions, CTEs, query optimization
▸
Python basics
Functions, file I/O, pandas, list comprehensions
▸
Relational databases
PostgreSQL or MySQL hands-on
▸
Linux command line
SSH, bash scripts, cron jobs
▸
Git & version control
Branching, pull requests, .gitignore
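The window functions and CTEs listed above are worth trying hands-on early. A minimal sketch using Python's built-in sqlite3 as a stand-in for PostgreSQL (the SQL is the same shape; window functions require SQLite 3.25 or newer, bundled with modern Python):

```python
import sqlite3

# In-memory database stands in for PostgreSQL; table and data are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL, ordered_at TEXT);
    INSERT INTO orders VALUES
        (1, 'acme',   100.0, '2026-01-05'),
        (2, 'acme',   250.0, '2026-01-12'),
        (3, 'globex',  80.0, '2026-01-07'),
        (4, 'acme',    50.0, '2026-01-20');
""")

# CTE + window function: rank each customer's orders by amount, keep the largest.
rows = conn.execute("""
    WITH ranked AS (
        SELECT customer, amount,
               ROW_NUMBER() OVER (PARTITION BY customer ORDER BY amount DESC) AS rn
        FROM orders
    )
    SELECT customer, amount FROM ranked WHERE rn = 1 ORDER BY customer
""").fetchall()
print(rows)  # [('acme', 250.0), ('globex', 80.0)]
```

`ROW_NUMBER() OVER (PARTITION BY ...)` is the pattern behind most "top-N per group" interview questions; the same query runs unchanged on PostgreSQL.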
Practice Projects
◎Build a Python script that pulls data from a public API and stores it in PostgreSQL
◎Write 20 SQL queries against a sample dataset (e.g. Northwind)
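The first project above boils down to fetch, reshape, load. A hedged sketch of that loop, with an inline JSON payload standing in for a real API response and sqlite3 standing in for PostgreSQL (with psycopg2 the pattern is identical: connect, executemany with placeholders, commit):

```python
import json
import sqlite3

# Hypothetical payload; in the real project this comes from
# something like requests.get(API_URL).json() against a public API.
payload = json.loads('[{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]')

# sqlite3 stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(row["id"], row["name"]) for row in payload],  # parameterized, never f-strings
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count)  # 2
```

The habit worth building here is parameterized inserts rather than string-formatted SQL, which carries over directly to production pipelines.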
🔧
02
ETL & Data Modeling
3–6 months
Learn how to move and model data. These are the core skills for any junior data engineering role.
Skills to Learn
▸
dbt (data build tool)
Models, tests, documentation, incremental builds
▸
Apache Airflow
DAGs, operators, scheduling, XComs
▸
Dimensional modeling
Star schema, SCD types, fact vs dimension
▸
Cloud storage
AWS S3 or Azure ADLS — read/write Parquet files
▸
Snowflake or BigQuery
Warehousing concepts, query performance
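A star schema is easier to internalize from a tiny working example than from diagrams. A minimal sketch (illustrative tables only, sqlite3 standing in for a warehouse): one fact table keyed to two dimensions, queried the way a BI tool would:

```python
import sqlite3

# Minimal star schema: a sales fact joined to customer and date dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, iso_date TEXT, month TEXT);
    CREATE TABLE fact_sales   (customer_key INTEGER, date_key INTEGER, amount REAL);

    INSERT INTO dim_customer VALUES (1, 'acme', 'EU'), (2, 'globex', 'US');
    INSERT INTO dim_date VALUES (20260101, '2026-01-01', '2026-01'),
                                (20260102, '2026-01-02', '2026-01');
    INSERT INTO fact_sales VALUES (1, 20260101, 100.0), (1, 20260102, 50.0),
                                  (2, 20260101, 80.0);
""")

# The canonical star-schema query: aggregate the fact, slice by dimension attributes.
result = conn.execute("""
    SELECT c.region, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    JOIN dim_date d     ON d.date_key     = f.date_key
    GROUP BY c.region, d.month
    ORDER BY c.region
""").fetchall()
print(result)  # [('EU', '2026-01', 150.0), ('US', '2026-01', 80.0)]
```

Facts hold additive measures and foreign keys; dimensions hold the descriptive attributes you group and filter by. Every query against the schema follows this join-then-aggregate shape.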
Practice Projects
◎Build a full ELT pipeline: source API → S3 → Snowflake → dbt models → dashboard
◎Implement SCD Type 2 for a customer dimension table
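The SCD Type 2 project above has a small core: when a tracked attribute changes, close the current dimension row and open a new one, preserving history. A sketch of that mechanic (sqlite3 as the stand-in warehouse; table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_id INTEGER,
        city        TEXT,
        valid_from  TEXT,
        valid_to    TEXT,      -- NULL means "still current"
        is_current  INTEGER
    );
    INSERT INTO dim_customer VALUES (1, 'Berlin', '2026-01-01', NULL, 1);
""")

def apply_scd2(conn, customer_id, new_city, change_date):
    """Close the current row and open a new one if the tracked attribute changed."""
    cur = conn.execute(
        "SELECT city FROM dim_customer WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if cur and cur[0] == new_city:
        return  # no change: SCD2 only versions on actual attribute changes
    conn.execute(
        "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (change_date, customer_id),
    )
    conn.execute(
        "INSERT INTO dim_customer VALUES (?, ?, ?, NULL, 1)",
        (customer_id, new_city, change_date),
    )
    conn.commit()

apply_scd2(conn, 1, "Munich", "2026-03-01")
history = conn.execute(
    "SELECT city, valid_from, valid_to, is_current FROM dim_customer ORDER BY valid_from"
).fetchall()
print(history)  # [('Berlin', '2026-01-01', '2026-03-01', 0), ('Munich', '2026-03-01', None, 1)]
```

In dbt you would express the same logic declaratively with snapshots rather than hand-written UPDATE/INSERT, but the row-versioning idea is identical.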
⚡
03
Big Data & Cloud
6–12 months
Scale up to distributed systems. This is where most mid-level roles focus.
Skills to Learn
▸
Apache Spark / PySpark
DataFrames, partitioning, optimization, Spark UI
▸
Apache Kafka
Topics, consumers, producers, Kafka Connect
▸
AWS Glue or Azure ADF
Managed ETL services, crawlers, triggers
▸
Delta Lake / Iceberg
ACID transactions, time travel, schema evolution
▸
Docker & containers
Dockerfile, docker-compose for local dev
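Partitioning is the concept that unlocks most of Spark. A conceptual sketch in plain Python of what hash partitioning does during a shuffle (illustrative only: Spark uses Murmur3 hashing across executors, not Python's `hash()` in one process):

```python
from collections import defaultdict

def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition by hashing its key — the same idea
    Spark applies when it shuffles data for a groupBy or join."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions

rows = [{"user": u, "amount": a}
        for u, a in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]]
parts = hash_partition(rows, "user", 4)

# Every row with the same key lands in the same partition, so a per-key
# aggregation never needs data from another partition (or another machine).
key_to_parts = {}
for pid, chunk in parts.items():
    for row in chunk:
        key_to_parts.setdefault(row["user"], set()).add(pid)
print(key_to_parts)  # each key maps to exactly one partition id
```

This is also why skewed keys hurt: one hot key means one oversized partition, and one straggling task in the Spark UI.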
Practice Projects
◎Process a 10 GB dataset with PySpark on Databricks Community Edition
◎Build a real-time pipeline: Kafka → Spark Streaming → Delta Lake
🏗️
04
Platform Engineering
12–24 months
Design and build complete data platforms. Senior roles require this breadth.
Skills to Learn
▸
Debezium & CDC
Log-based replication, at-least-once delivery, deduplication
▸
Data quality frameworks
Great Expectations, dbt tests, Soda
▸
Data cataloging
DataHub, Amundsen — lineage and discovery
▸
Infrastructure as Code
Terraform for cloud resources
▸
CI/CD for pipelines
GitHub Actions, dbt Slim CI, automated testing
Practice Projects
◎Design and build a full lakehouse architecture for a fictional e-commerce company
◎Implement an end-to-end data quality monitoring system with alerting
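The core pattern behind Great Expectations, dbt tests, and Soda (and the monitoring project above) is small: named checks run against data, with failures collected for alerting. A hand-rolled sketch of that pattern, not any framework's actual API, with made-up rows and checks:

```python
# Sample rows with deliberate defects; in production these would stream
# from a warehouse table or a staged file.
rows = [
    {"order_id": 1, "amount": 120.0, "country": "DE"},
    {"order_id": 2, "amount": -5.0,  "country": "DE"},   # fails amount check
    {"order_id": 3, "amount": 60.0,  "country": None},   # fails not-null check
]

# Named expectations, analogous to dbt's not_null / accepted_range tests.
checks = {
    "amount_non_negative": lambda r: r["amount"] >= 0,
    "country_not_null":    lambda r: r["country"] is not None,
}

failures = []
for row in rows:
    for name, check in checks.items():
        if not check(row):
            failures.append((name, row["order_id"]))

if failures:
    # A real system would page, open a ticket, or post to Slack here.
    print(f"{len(failures)} quality check failure(s): {failures}")
```

The frameworks add the parts worth not rebuilding yourself: declarative configs, per-run result stores, and documentation generated from the checks.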
🚀
05
Senior / Staff Level
2+ years
Lead architecture decisions, mentor others, and bridge between engineering and business.
Skills to Learn
▸
Data Mesh principles
Domain ownership, data products, self-serve platform
▸
Data contracts
Schema registries, SLAs, producer-consumer agreements
▸
Cost optimization
Query tuning, partition pruning, storage tiering
▸
On-call & incident response
Runbooks, SLOs, alerting strategies
▸
Technical leadership
RFCs, cross-team alignment, mentoring junior engineers
Practice Projects
◎Lead a data platform migration (e.g. on-prem Hadoop → cloud lakehouse)
◎Define and implement data contracts across 3+ teams
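A data contract, stripped to its essence, is a schema the producer declares and the consumer enforces at the boundary. Real deployments use a schema registry with Avro or Protobuf; this is only a sketch of the idea, with hypothetical field names:

```python
# A producer-declared contract: field names and expected types.
CONTRACT = {
    "name": "orders.v1",
    "fields": {
        "order_id": int,
        "customer": str,
        "amount":   float,
    },
}

def validate(record, contract):
    """Return a list of contract violations for one record (empty list = valid)."""
    errors = []
    for field, ftype in contract["fields"].items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], ftype):
            errors.append(f"wrong type for {field}: expected {ftype.__name__}")
    return errors

ok = validate({"order_id": 7, "customer": "acme", "amount": 99.5}, CONTRACT)
bad = validate({"order_id": "7", "customer": "acme"}, CONTRACT)
print(ok)   # []
print(bad)  # type violation on order_id, plus a missing amount field
```

The organizational work, agreeing SLAs and who owns breakage, is harder than the validation code; but having an enforced schema at the boundary is what makes those agreements testable.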
Ready to start? Begin with the interview questions for your target tool.