📖 Documentation

Official resources and documentation by level

Tip 📌 Organized by level

This page lists official documentation, organized by bootcamp level. Consult each resource as you progress.


🟦 Level 1: Beginner

🐍 Python & Data Processing

| Technology | Documentation | Associated module |
| --- | --- | --- |
| Python | docs.python.org | 04 - Python Fundamentals |
| Pandas | pandas.pydata.org/docs | 05 - Python Data Processing |
| NumPy | numpy.org/doc | 05 - Python Data Processing |
| BeautifulSoup | beautiful-soup-4.readthedocs.io | 🎮 Beginner Project |

🗄️ Databases & SQL

| Technology | Documentation | Associated module |
| --- | --- | --- |
| PostgreSQL | postgresql.org/docs | 06 - Intro to Databases |
| SQL Tutorial | w3schools.com/sql | 07 - SQL |
| DuckDB | duckdb.org/docs | 🎮 Beginner Project |
| MongoDB | mongodb.com/docs | 09 - MongoDB |
| Elasticsearch | elastic.co/guide | 10 - Elasticsearch |

🔧 Core Tools

| Technology | Documentation | Associated module |
| --- | --- | --- |
| Git | git-scm.com/doc | 03 - Git & Versioning |
| Bash | gnu.org/software/bash/manual | 02 - Linux & Bash |
| FastAPI | fastapi.tiangolo.com | 13 - FastAPI |
| Streamlit | docs.streamlit.io | 🎮 Beginner Project |

⚡ Introduction to Spark

| Resource | Link | Associated module |
| --- | --- | --- |
| PySpark Quickstart | spark.apache.org/docs/latest/api/python/getting_started | 11 - Intro PySpark |
| DataFrame Guide | spark.apache.org/docs/latest/sql-programming-guide | 11 - Intro PySpark |

🟩 Level 2: Intermediate

🐳 Containers & Kubernetes

| Technology | Documentation | Cheatsheet | Associated module |
| --- | --- | --- | --- |
| Docker | docs.docker.com | Cheatsheet | 14 - Docker |
| Kubernetes | kubernetes.io/docs | kubectl Cheatsheet | 15 - K8s Fundamentals |
| Spark Operator | github.com/kubeflow/spark-operator | - | 21 - Spark on K8s |

🚀 High Performance Python

| Technology | Documentation | Associated module |
| --- | --- | --- |
| Polars | docs.pola.rs | 17 - Polars |
| PyArrow | arrow.apache.org/docs/python | 17 - Polars |

⚡ Advanced Apache Spark

| Resource | Link | Associated module |
| --- | --- | --- |
| Spark Documentation | spark.apache.org/docs/latest | 19 - Advanced PySpark |
| PySpark API Reference | spark.apache.org/docs/latest/api/python | 19 - Advanced PySpark |
| Spark SQL Guide | spark.apache.org/docs/latest/sql-ref | 20 - Spark SQL Deep Dive |
| Performance Tuning | spark.apache.org/docs/latest/sql-performance-tuning | 20 - Spark SQL Deep Dive |

🏠 Lakehouse & Table Formats

| Format | Documentation | GitHub | Associated module |
| --- | --- | --- | --- |
| Delta Lake | docs.delta.io | delta-io/delta | 23 - Table Formats |
| Apache Iceberg | iceberg.apache.org/docs | apache/iceberg | 23 - Table Formats |
| Apache Hudi | hudi.apache.org/docs | apache/hudi | - |

📨 Streaming & Messaging

| Technology | Documentation | Associated module |
| --- | --- | --- |
| Apache Kafka | kafka.apache.org/documentation | 24 - Kafka & Streaming |
| Confluent Platform | docs.confluent.io | 24 - Kafka & Streaming |
| kafka-python | kafka-python.readthedocs.io | 24 - Kafka & Streaming |

☁️ Cloud Object Storage

| Provider | Documentation | Associated module |
| --- | --- | --- |
| AWS S3 | docs.aws.amazon.com/s3 | 22 - Cloud Storage |
| GCP GCS | cloud.google.com/storage/docs | 22 - Cloud Storage |
| MinIO | min.io/docs | 22 - Cloud Storage |

🔧 Data Quality & Transformation

| Technology | Documentation | Associated module |
| --- | --- | --- |
| dbt | docs.getdbt.com | 25 - dbt & Data Quality |
| Great Expectations | docs.greatexpectations.io | 25 - dbt & Data Quality |

🟥 Level 3: Advanced

☸️ Advanced Infrastructure

| Technology | Documentation | Associated module |
| --- | --- | --- |
| Helm | helm.sh/docs | 27 - K8s Deep Dive |
| ArgoCD | argo-cd.readthedocs.io | 27 - K8s Deep Dive |
| Kubernetes Operators | kubernetes.io/docs/concepts/extend-kubernetes/operator | 27 - K8s Deep Dive |

🔄 Advanced Orchestration

| Technology | Documentation | Associated module |
| --- | --- | --- |
| Apache Airflow | airflow.apache.org/docs | 28 - Advanced Orchestration |
| Dagster | docs.dagster.io | 28 - Advanced Orchestration |
| Prefect | docs.prefect.io | - |

📨 Distributed Messaging

| Technology | Documentation | Associated module |
| --- | --- | --- |
| Apache Pulsar | pulsar.apache.org/docs | 29 - Distributed Messaging |
| RabbitMQ | rabbitmq.com/documentation | 29 - Distributed Messaging |
| Debezium | debezium.io/documentation | 29 - Distributed Messaging |

⚡ Spark & Scala

| Resource | Link | Associated module |
| --- | --- | --- |
| Scala Documentation | docs.scala-lang.org | 30 - Spark & Scala |
| Spark Internals | spark.apache.org/docs/latest/rdd-programming-guide | 30 - Spark & Scala |
| Catalyst Optimizer | databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer | 30 - Spark & Scala |

🤖 MLOps & Feature Engineering

| Technology | Documentation | Associated module |
| --- | --- | --- |
| MLflow | mlflow.org/docs/latest | 31 - DE for ML |
| Feast | docs.feast.dev | 31 - DE for ML |
| Tecton | docs.tecton.ai | 31 - DE for ML |

🏛️ Data Mesh & Governance

| Resource | Link | Associated module |
| --- | --- | --- |
| Data Mesh Principles | martinfowler.com/articles/data-mesh-principles | 32 - Data Mesh |
| Data Contracts | datacontract.com | 32 - Data Mesh |
| DataHub | datahubproject.io/docs | 32 - Data Mesh |
| OpenMetadata | docs.open-metadata.org | 32 - Data Mesh |

📊 OLAP & Real-time Analytics

| Technology | Documentation | Associated module |
| --- | --- | --- |
| ClickHouse | clickhouse.com/docs | 33 - Realtime OLAP |
| Apache Druid | druid.apache.org/docs/latest | 33 - Realtime OLAP |
| Apache Pinot | docs.pinot.apache.org | 33 - Realtime OLAP |

📊 Monitoring & Observability

| Tool | Documentation | Usage |
| --- | --- | --- |
| Prometheus | prometheus.io/docs | Metrics |
| Grafana | grafana.com/docs | Dashboards |
| OpenTelemetry | opentelemetry.io/docs | Tracing |

📚 Cross-Cutting Resources

📖 Essential Books

Tip 📌 Top 3 must-reads
  1. Designing Data-Intensive Applications, by Martin Kleppmann
  2. Fundamentals of Data Engineering, by Joe Reis & Matt Housley
  3. Learning Spark, 2nd Edition, by Damji et al.

| Book | Author | Level | Topic |
| --- | --- | --- | --- |
| Designing Data-Intensive Applications | Martin Kleppmann | 🟩🟥 | Distributed architecture |
| Fundamentals of Data Engineering | Joe Reis & Matt Housley | 🟦🟩 | DE overview |
| Learning Spark, 2nd Edition | Damji et al. | 🟦🟩 | PySpark |
| Spark: The Definitive Guide | Chambers & Zaharia | 🟩🟥 | Advanced Spark |
| Data Pipelines Pocket Reference | James Densmore | 🟦🟩 | Pipelines |
| Streaming Systems | Akidau et al. | 🟥 | Advanced streaming |
| The Data Warehouse Toolkit | Ralph Kimball | 🟩🟥 | Dimensional modeling |
| Building Microservices | Sam Newman | 🟥 | Architecture |

🎓 Free Courses

| Platform | Courses | Level |
| --- | --- | --- |
| dbt Learn | courses.getdbt.com | 🟩 |
| Databricks Academy | databricks.com/learn | 🟦🟩🟥 |
| Confluent Developer | developer.confluent.io | 🟩🟥 |
| Data Talks Club | datatalks.club | 🟦🟩 |
| Coursera - GCP DE | coursera.org/…gcp-data-engineering | 🟩 |

📝 Blogs & Newsletters

| Resource | Focus | Frequency |
| --- | --- | --- |
| Databricks Blog | Spark, Delta Lake, Lakehouse | Weekly |
| Confluent Blog | Kafka, Streaming | Weekly |
| dbt Blog | Analytics Engineering | Weekly |
| DuckDB Blog | In-process analytics | Monthly |
| Data Engineering Weekly | DE newsletter | Weekly |
| Seattle Data Guy | Careers, tutorials | Weekly |
| Start Data Engineering | Hands-on tutorials | Monthly |

🛠️ IDEs & Tools

| Tool | Usage | Link | Free |
| --- | --- | --- | --- |
| VS Code | General-purpose IDE | code.visualstudio.com | |
| PyCharm | Python IDE | jetbrains.com/pycharm | ✅ Community |
| DataGrip | SQL IDE | jetbrains.com/datagrip | 30-day trial |
| DBeaver | Free SQL tool | dbeaver.io | |
| Postman | API testing | postman.com | |
| k9s | Terminal UI for K8s | k9scli.io | |

🔗 Communities

| Community | Platform | Focus |
| --- | --- | --- |
| r/dataengineering | Reddit | General discussion |
| dbt Community | Slack | dbt, Analytics Engineering |
| Data Talks Club | Slack | Courses, networking |
| Apache Slack | Slack | Apache projects |
| Locally Optimistic | Slack | Data leaders |
