Become a data engineer
Week 0 - Helpful resources
Python
Scala
Docker
PyCharm and IntelliJ setup examples
Apache Kafka 101
Apache Spark 101
Apache Airflow 101
Doing your homework
Apache Spark Kata exercises
Checklist
Troubleshooting
Week 1 - Introduction
Introduction
About me
About data engineering
About the course
About the pet project
Homework
Week 2 - Basic concepts
Introduction
Data pipeline
Batch processing
Streaming processing
Data architectures - Lambda
Data architectures - Kappa
Data architectures - Zeta
Data architectures - SMACK
Data architectures - data lake, lake house, data mesh
Data quality
Data stores
Data consistency
Distributed processing
Data knowledge
Tools used in the course
Homework
Week 3 - Kappa architecture
Introduction
Problem statement
API Gateway - introduction
API Gateway - technical aspects
Data governance
Data migration
Delivery semantics
Idempotency
Fault-tolerance
Replication
Partitioning - introduction
Partitioning - methods
Partitioning and bucketing - demo
ACID file formats - introduction
ACID file formats - Delta Lake demo
Batch layer
Homework
Week 4 - Data cleansing
Introduction
Problem statement
Data enrichment
Data anonymization
Late data
Deduplication
Metadata
Schema
Schema registry
Schema registry - demo
Schema evolution - demo
Schema management for semi-structured data
Schema management for semi-structured data - demo
Serialization
Monitoring and alerting
Monitoring and alerting - demo
Data validation
Data validation - demo
Homework
Week 5 - Stateful processing
Introduction
Problem statement
Stateful processing - introduction
Stateful processing - window
Stateful processing - arbitrary stateful processing
Stateful processing - state store implementations
Stateful processing - stateful logic
Stateful processing - window demo
Stateful processing - arbitrary stateful processing demo
Shuffle
Late data
Scalability
Elasticity
Fault-tolerance
Idempotency
Idempotency - demo
Reprocessing
Complex Event Processing - theory
Complex Event Processing - framework
Complex Event Processing - demo
Messaging patterns
Debugging - tips
Homework
Week 6 - ETL
Introduction
Problem statement
Data pipeline steps
Staging area
ETL vs ELT
Patterns
Orchestration framework
Alerting
Idempotency
Idempotency and small data
Idempotency and append data
Idempotency and append data - demo
Idempotency and immutable data (versioning)
Idempotency and immutable data (versioning) - demo
Data reprocessing
Triggers
Task examples
Best practices
Data lineage
Data lineage - demo
Homework
Week 7 - Analytics
Introduction
Problem statement
SQL
JOINs
Execution plans
Approximate algorithms
Data warehousing
Columnar format vs row format
Encoding
Data modeling - normalized data
Data modeling - dimensional models
Data modeling - data vault
Data modeling - denormalization
Data modeling - pipeline integration
Real-time SQL - Structured Streaming
Real-time SQL - Kafka SQL
Data security
Data security - credentials and permissions demo
Data security - data encryption demo
Data security - data versioning demo
Homework
Week 8 - Data visualization
Introduction
Problem statement
Visualization types
Data exploration
Data exploration - Jupyter example
Data visualization - JavaScript frameworks
Data visualization - Python frameworks
Data visualization - Reporting tools
Reporting tools - intermediate storage
Data catalog
Data mart
Data visualization and batch processing
Data visualization and streaming processing
Best practices
Homework
Week 9 - Data exposition - REST API
Introduction
Problem statement
Polyglot persistence
Asynchronous communication - theory and Scala example
Asynchronous communication - Python example
Compression
Bulk operations
Data mutation
Window-based processing
Time-series
Time-series - demo
RESTful web services
RESTful web services - demo
Homework
Week 10 - Machine Learning
Introduction
Problem statement
Main concepts
ML workflow
Compute environment
ML workflow - Notebook demo
ML workflow - automation demo (ETL)
Online learning
Online learning - demo
Model quality
Serving layer
Rendezvous architecture
ML engineer
ML workflow platform
ML workflow platform - demo
Homework
Week 11 - Going further
Introduction
Problem statement
Cloud computing - why
Cloud computing - introduction
Cloud computing - data services typology
AWS cloud data services
GCP cloud data services
Azure cloud data services
Docker
Kubernetes
Software engineering best practices - Scala example
Software engineering best practices - Python example
Tests and data processing - demo
DevOps - introduction
DevOps - components
DevOps - data example
DevOps - Github Actions demo
Data processing frameworks - going distributed
Data processing frameworks - going distributed Hadoop YARN demo
Data processing frameworks - going distributed Kubernetes demo
Data processing frameworks - going distributed - tips
Serverless
Not covered frameworks and libs
Homework
Week 12 - Summary
Introduction
Data processing
Data stores
Data systems
Data concepts
Data engineering tasks
Exercises
Resources
See you!
Homework
Homework exercises - examples of solutions
Week 3
Week 4
Week 5
Week 6
Week 7
Week 8
Week 9
Week 10
Software engineering best practices - Python example