Data Engineering Bootcamp – From Zero to Hero

🚀 Data Engineering Bootcamp

From Zero to Hero

Become a hands-on Data Engineer, ready for the workplace.

🎯 Why choose this Bootcamp?

This program was designed by experienced Data Engineers and focuses on what the job market actually requires.
It takes you step by step to a professional level: Senior Ready.

✔ What you will master

Industry tools: Spark 3.x, Kafka, Flink, Docker, Kubernetes, dbt, Lakehouse (Delta/Iceberg).
Modern architectures: Medallion (Bronze/Silver/Gold), Kappa, Lambda, Data Lakehouse, Data Mesh.
Engineering skills: optimization, governance, orchestration, CI/CD, cloud.
Industrialization: building fully production-ready batch and streaming pipelines.
Technical leadership: RFCs, Design Reviews, ADRs, architectural thinking.
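The Medallion layering listed above can be illustrated with a minimal sketch, assuming plain Python lists and dicts stand in for tables (in a real Lakehouse each layer would be a Delta or Iceberg table processed with Spark). The source name `orders_api` and the fields `order_id`, `amount` and `country` are hypothetical:

```python
from collections import defaultdict

# Hypothetical raw records, as they might land from a source system.
RAW_EVENTS = [
    {"order_id": "1", "amount": "19.90", "country": "FR"},
    {"order_id": "2", "amount": "5.00", "country": "fr"},
    {"order_id": "2", "amount": "5.00", "country": "fr"},  # duplicate
    {"order_id": "3", "amount": "not_a_number", "country": "DE"},  # invalid
    {"order_id": "4", "amount": "10.10", "country": "DE"},
]


def to_bronze(raw):
    """Bronze: keep records as-is, only tagging their (hypothetical) origin."""
    return [dict(record, _source="orders_api") for record in raw]


def to_silver(bronze):
    """Silver: cast types, normalize values, drop invalid rows and duplicates."""
    seen = set()
    silver = []
    for record in bronze:
        try:
            amount = float(record["amount"])
        except ValueError:
            continue  # reject rows that fail type validation
        if record["order_id"] in seen:
            continue  # deduplicate on the business key
        seen.add(record["order_id"])
        silver.append({
            "order_id": record["order_id"],
            "amount": amount,
            "country": record["country"].upper(),
        })
    return silver


def to_gold(silver):
    """Gold: business-level aggregate (revenue per country)."""
    revenue = defaultdict(float)
    for record in silver:
        revenue[record["country"]] += record["amount"]
    return {country: round(total, 2) for country, total in revenue.items()}


if __name__ == "__main__":
    print(to_gold(to_silver(to_bronze(RAW_EVENTS))))
```

Each layer only reads from the one below it, so if a Silver rule turns out to be wrong it can be fixed and replayed from Bronze without re-ingesting the source.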

👥 Who is this Bootcamp for?

Students / Developers who want to break into Big Data.
BI Analysts looking to move into modern Data Engineering.
Experienced professionals who want to structure a complete Data platform.
Architects & Managers who want to understand modern Data technologies.

🛠️ Your Data Engineering Toolbox

Domain	Tools & Technologies
Big Data Processing	PySpark, Spark SQL, Scala, Polars, Pandas
Lakehouse	Delta Lake, Apache Iceberg, Hudi
Streaming	Kafka, Spark Streaming, Flink, Debezium CDC
DevOps / Infra	Docker, Kubernetes, Helm, ArgoCD, Spark Operator
Monitoring	Prometheus, Grafana, Alertmanager
Cloud	AWS S3, GCP GCS, Azure Blob, IAM, MinIO
Quality & ETL	dbt, Great Expectations, Data Contracts
MLOps	MLflow, Feast (Feature Store), Model Monitoring
Governance	DataHub, Unity Catalog, RLS/CLS, GDPR

📘 Full Program Structure (3 Levels)

The program is organized as a progressive learning path:

🟦 Level 1: Beginner – Foundations & First Pipelines

🎯 Objective

Build solid foundations in Python, SQL and distributed systems, and build your first pipelines.

📚 Modules

#	Module	Key Topics
01	Introduction to Data Engineering	Role of the Data Engineer, ecosystem, architectures (Lambda, Kappa, Lakehouse)
02	Linux & Bash Fundamentals	Essential commands, scripting, cron, permissions
03	Git & Versioning	Branches, merge, rebase, collaborative workflows
04	Python Fundamentals	Syntax, data structures, files, exceptions
05	Python for Data Processing	OOP, decorators, generators, context managers
06	Introduction to Relational Databases	DBMS concepts, data modeling, normalization
07	SQL for Data Engineers	Queries, joins, window functions, optimization
08	Introduction to Big Data	Hadoop, HDFS, MapReduce, distributed ecosystem
09	MongoDB for Data Engineers	NoSQL, CRUD, aggregations, indexing
10	Elasticsearch	Full-text search, indexing, Query DSL
11	Introduction to PySpark	RDDs, DataFrames, transformations, actions
12	Pipeline Orchestration	Concepts, scheduling, dependencies
13	Bonus: FastAPI	REST APIs to expose your data

🎮 Capstone Project: Video Games Analytics Platform

End-to-end pipeline: Kaggle CSV → Web Scraping → DuckDB + Elasticsearch → PySpark → FastAPI → Streamlit Dashboard

🚀 Go to the project

🎯 Outcome: you can build an end-to-end data pipeline, from ingestion to dashboard.

🟩 Level 2: Intermediate – Industrialization & Lakehouse

🎯 Objective

Master the technologies companies rely on: Docker, Kubernetes, advanced Spark, Lakehouse, streaming, orchestration.

📚 Modules

#	Module	Key Topics
14	Docker for Data Engineers	Images, containers, volumes, Dockerfile, Compose
15	Kubernetes Fundamentals	Pods, Deployments, Services, ConfigMaps, Secrets
16	K8s for Data Workloads	StatefulSets, Jobs, CronJobs, persistent volumes
17	Polars for Data Engineering	Polars API, lazy evaluation, comparison with Pandas/Spark
18	High Performance Python	Profiling, optimization, multiprocessing, async
19	Advanced PySpark	Partitioning, caching, broadcast, UDFs, optimization
20	Spark SQL Deep Dive	Catalyst, execution plans, tuning, Adaptive Query Execution
21	Spark on Kubernetes	Spark Operator, SparkApplication, scaling, monitoring
22	Cloud Object Storage	S3, GCS, Azure Blob, MinIO, IAM, performance
23	Table Formats (Delta, Iceberg)	ACID, Time Travel, Schema Evolution, MERGE INTO
24	Kafka & Streaming	Producers, Consumers, Topics, Partitions, Consumer Groups
25	dbt & Data Quality	Models, Tests, Documentation, Great Expectations

📦 Capstone Project: Olist E-commerce Pipeline

Lakehouse pipeline: Kafka → Spark Streaming → Delta Lake → dbt → Dashboard

🚀 Go to the project

🎯 Outcome: you can deploy a Spark job on Kubernetes, store data in Delta, and orchestrate & monitor the pipeline.
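In a pipeline like the Olist one (Kafka → Spark Streaming → Delta Lake), Spark Structured Streaming processes the topic in micro-batches, and each micro-batch is typically upserted into the Delta table (for example with `foreachBatch` and a Delta `MERGE`), so that replaying a batch does not create duplicates. The sketch below models only that upsert logic in plain Python; it is not Spark or Delta Lake code. It assumes a dict keyed by `order_id` stands in for the Delta table, and that each event carries a hypothetical `updated_at` version number:

```python
from typing import Dict, Iterable

Event = Dict[str, object]


def merge_micro_batch(table: Dict[str, Event], batch: Iterable[Event]) -> Dict[str, int]:
    """Upsert one micro-batch into `table`, mimicking MERGE INTO semantics:
    update a matching key if the event is newer, insert an unknown key."""
    stats = {"inserted": 0, "updated": 0, "skipped": 0}
    for event in batch:
        key = event["order_id"]
        current = table.get(key)
        if current is None:
            table[key] = dict(event)  # WHEN NOT MATCHED THEN INSERT
            stats["inserted"] += 1
        elif event["updated_at"] > current["updated_at"]:
            table[key] = dict(event)  # WHEN MATCHED AND newer THEN UPDATE
            stats["updated"] += 1
        else:
            stats["skipped"] += 1  # stale or replayed event: no change
    return stats


if __name__ == "__main__":
    orders: Dict[str, Event] = {}
    batch_1 = [
        {"order_id": "A", "status": "created", "updated_at": 1},
        {"order_id": "B", "status": "created", "updated_at": 1},
    ]
    batch_2 = [{"order_id": "A", "status": "shipped", "updated_at": 2}]
    print(merge_micro_batch(orders, batch_1))
    print(merge_micro_batch(orders, batch_2))
    print(merge_micro_batch(orders, batch_2))  # replaying the same batch is a no-op
    print(orders["A"]["status"])
```

Unlike this sequential loop, a real Delta `MERGE` can fail when several source rows match the same target row, so each micro-batch is usually deduplicated on the key before merging.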

🟥 Level 3: Advanced – Architecture, Optimization & Seniority

🎯 Objective

Reach the “Senior Data Engineer / Data Architect” level, with a command of distributed systems, architecture and technical leadership.

📚 Modules

#	Module	Key Topics
27	Kubernetes Deep Dive	Operators, CRDs, advanced Helm, GitOps, troubleshooting
28	Advanced Orchestration	Airflow 2.x, dynamic DAGs, KubernetesPodOperator, Dagster
29	Distributed Messaging	Kafka internals, Pulsar, RabbitMQ, messaging patterns
30	Spark & Scala Deep Dive	Spark internals, Catalyst, Tungsten, low-level optimization
31	Data Engineering for ML	Feature Stores, ML pipelines, MLflow, model serving
32	Data Mesh & Contracts	Data Products, Domain Ownership, Data Contracts, APIs
33	Realtime OLAP & Dashboards	ClickHouse, Apache Druid, Pinot, real-time dashboards
34	Architecture Patterns & Decisions	ADRs, RFCs, trade-offs, design reviews, documentation
35	Leadership & Trade-offs	Technical communication, mentoring, tech project management

🎯 Outcome: you can design and defend a complete Data architecture, run design reviews, and guide a technical team.

📊 Learning Path Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                           COMPLETE LEARNING PATH                            │
│                                                                             │
│   🟦 LEVEL 1: BEGINNER                                                      │
│   ════════════════════                                                      │
│   Modules 01-13                                                             │
│   Python → SQL → Big Data → MongoDB → Elasticsearch → PySpark               │
│   🎮 Project: Video Games Analytics Platform                                │
│                          │                                                  │
│                          ▼                                                  │
│   🟩 LEVEL 2: INTERMEDIATE                                                  │
│   ════════════════════════                                                  │
│   Modules 14-25                                                             │
│   Docker → K8s → Advanced Spark → Lakehouse → Streaming → dbt               │
│   📦 Project: Olist E-commerce Pipeline                                     │
│                          │                                                  │
│                          ▼                                                  │
│   🟥 LEVEL 3: ADVANCED                                                      │
│   ════════════════════                                                      │
│   Modules 27-35                                                             │
│   K8s Deep Dive → Orchestration → Messaging → Spark/Scala → ML              │
│   Data Mesh → OLAP → Architecture → Leadership                              │
│                                                                             │
│                          ▼                                                  │
│                    🏆 SENIOR READY                                          │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

🏆 Skills Acquired by Level

Skill	Beginner	Intermediate	Advanced
Linux & Bash	✅✅	✅✅✅	✅✅✅
Python & SQL	✅	✅✅	✅✅✅
Git & Versioning	✅✅	✅✅	✅✅
PySpark	✅	✅✅	✅✅✅
Spark Scala	-	-	✅✅
Docker & Kubernetes	-	✅✅	✅✅✅
Lakehouse (Delta/Iceberg)	-	✅✅	✅✅✅
Streaming (Kafka)	-	✅✅	✅✅✅
Orchestration (Airflow)	-	✅	✅✅✅
dbt & Data Quality	-	✅✅	✅✅
OLAP & Realtime Analytics	-	-	✅✅
MLOps & Feature Stores	-	-	✅✅
Data Mesh & Contracts	-	-	✅✅✅
Governance & Security	-	✅	✅✅✅
Architecture & Design	-	✅	✅✅✅
Technical leadership	-	-	✅✅✅

🚀 Start the Bootcamp

👉 Use the left-hand menu to open the first module, or choose your entry point:

🟦 Beginner Level 🟩 Intermediate Level 🟥 Advanced Level

💡 Tips for Success

Follow the order: the modules are designed to be taken sequentially.
Practice: complete all the exercises and capstone projects.
Experiment: modify the code, break things, learn from your mistakes.
Document: take notes and build your own documentation.
Build your portfolio: the capstone projects can be presented in job interviews.

🎉 Happy upskilling!

You are now ready to progress step by step to the Senior Data Engineer level.
