📖 Documentation
Official resources and documentation by level
Tip 📌 Organized by level
This page lists the official documentation, organized by bootcamp level. Consult each resource as you progress.
🟦 Level 1: Beginner
🐍 Python & Data Processing
| Technology | Documentation | Related module |
|---|---|---|
| Python | docs.python.org | 04 - Python Fundamentals |
| Pandas | pandas.pydata.org/docs | 05 - Python Data Processing |
| NumPy | numpy.org/doc | 05 - Python Data Processing |
| BeautifulSoup | beautiful-soup-4.readthedocs.io | 🎮 Beginner Project |
🗄️ Databases & SQL
| Technology | Documentation | Related module |
|---|---|---|
| PostgreSQL | postgresql.org/docs | 06 - Intro to Databases |
| SQL Tutorial | w3schools.com/sql | 07 - SQL |
| DuckDB | duckdb.org/docs | 🎮 Beginner Project |
| MongoDB | mongodb.com/docs | 09 - MongoDB |
| Elasticsearch | elastic.co/guide | 10 - Elasticsearch |
🔧 Fundamental Tools
| Technology | Documentation | Related module |
|---|---|---|
| Git | git-scm.com/doc | 03 - Git & Versioning |
| Bash | gnu.org/software/bash/manual | 02 - Linux & Bash |
| FastAPI | fastapi.tiangolo.com | 13 - FastAPI |
| Streamlit | docs.streamlit.io | 🎮 Beginner Project |
⚡ Introduction to Spark
| Resource | Link | Related module |
|---|---|---|
| PySpark Quickstart | spark.apache.org/docs/latest/api/python/getting_started | 11 - Intro PySpark |
| DataFrame Guide | spark.apache.org/docs/latest/sql-programming-guide | 11 - Intro PySpark |
🟩 Level 2: Intermediate
🐳 Containers & Kubernetes
| Technology | Documentation | Cheatsheet | Related module |
|---|---|---|---|
| Docker | docs.docker.com | Cheatsheet | 14 - Docker |
| Kubernetes | kubernetes.io/docs | kubectl Cheatsheet | 15 - K8s Fundamentals |
| Spark Operator | github.com/kubeflow/spark-operator | - | 21 - Spark on K8s |
🚀 High Performance Python
| Technology | Documentation | Related module |
|---|---|---|
| Polars | docs.pola.rs | 17 - Polars |
| PyArrow | arrow.apache.org/docs/python | 17 - Polars |
⚡ Advanced Apache Spark
| Resource | Link | Related module |
|---|---|---|
| Spark Documentation | spark.apache.org/docs/latest | 19 - Advanced PySpark |
| PySpark API Reference | spark.apache.org/docs/latest/api/python | 19 - Advanced PySpark |
| Spark SQL Guide | spark.apache.org/docs/latest/sql-ref | 20 - Spark SQL Deep Dive |
| Performance Tuning | spark.apache.org/docs/latest/sql-performance-tuning | 20 - Spark SQL Deep Dive |
🏠 Lakehouse & Table Formats
| Format | Documentation | GitHub | Related module |
|---|---|---|---|
| Delta Lake | docs.delta.io | delta-io/delta | 23 - Table Formats |
| Apache Iceberg | iceberg.apache.org/docs | apache/iceberg | 23 - Table Formats |
| Apache Hudi | hudi.apache.org/docs | apache/hudi | - |
📨 Streaming & Messaging
| Technology | Documentation | Related module |
|---|---|---|
| Apache Kafka | kafka.apache.org/documentation | 24 - Kafka & Streaming |
| Confluent Platform | docs.confluent.io | 24 - Kafka & Streaming |
| kafka-python | kafka-python.readthedocs.io | 24 - Kafka & Streaming |
☁️ Cloud Object Storage
| Provider | Documentation | Related module |
|---|---|---|
| AWS S3 | docs.aws.amazon.com/s3 | 22 - Cloud Storage |
| GCP GCS | cloud.google.com/storage/docs | 22 - Cloud Storage |
| MinIO | min.io/docs | 22 - Cloud Storage |
🔧 Data Quality & Transformation
| Technology | Documentation | Related module |
|---|---|---|
| dbt | docs.getdbt.com | 25 - dbt & Data Quality |
| Great Expectations | docs.greatexpectations.io | 25 - dbt & Data Quality |
🟥 Level 3: Advanced
☸️ Advanced Infrastructure
| Technology | Documentation | Related module |
|---|---|---|
| Helm | helm.sh/docs | 27 - K8s Deep Dive |
| ArgoCD | argo-cd.readthedocs.io | 27 - K8s Deep Dive |
| Kubernetes Operators | kubernetes.io/docs/concepts/extend-kubernetes/operator | 27 - K8s Deep Dive |
🔄 Advanced Orchestration
| Technology | Documentation | Related module |
|---|---|---|
| Apache Airflow | airflow.apache.org/docs | 28 - Advanced Orchestration |
| Dagster | docs.dagster.io | 28 - Advanced Orchestration |
| Prefect | docs.prefect.io | - |
📨 Distributed Messaging
| Technology | Documentation | Related module |
|---|---|---|
| Apache Pulsar | pulsar.apache.org/docs | 29 - Distributed Messaging |
| RabbitMQ | rabbitmq.com/documentation | 29 - Distributed Messaging |
| Debezium | debezium.io/documentation | 29 - Distributed Messaging |
⚡ Spark & Scala
| Resource | Link | Related module |
|---|---|---|
| Scala Documentation | docs.scala-lang.org | 30 - Spark & Scala |
| Spark Internals | spark.apache.org/docs/latest/rdd-programming-guide | 30 - Spark & Scala |
| Catalyst Optimizer | databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer | 30 - Spark & Scala |
🤖 MLOps & Feature Engineering
| Technology | Documentation | Related module |
|---|---|---|
| MLflow | mlflow.org/docs/latest | 31 - DE for ML |
| Feast | docs.feast.dev | 31 - DE for ML |
| Tecton | docs.tecton.ai | 31 - DE for ML |
🏛️ Data Mesh & Governance
| Resource | Link | Related module |
|---|---|---|
| Data Mesh Principles | martinfowler.com/articles/data-mesh-principles | 32 - Data Mesh |
| Data Contracts | datacontract.com | 32 - Data Mesh |
| DataHub | datahubproject.io/docs | 32 - Data Mesh |
| OpenMetadata | docs.open-metadata.org | 32 - Data Mesh |
📊 OLAP & Real-time Analytics
| Technology | Documentation | Related module |
|---|---|---|
| ClickHouse | clickhouse.com/docs | 33 - Realtime OLAP |
| Apache Druid | druid.apache.org/docs/latest | 33 - Realtime OLAP |
| Apache Pinot | docs.pinot.apache.org | 33 - Realtime OLAP |
📊 Monitoring & Observability
| Tool | Documentation | Usage |
|---|---|---|
| Prometheus | prometheus.io/docs | Metrics |
| Grafana | grafana.com/docs | Dashboards |
| OpenTelemetry | opentelemetry.io/docs | Tracing |
📚 Cross-Level Resources
📖 Essential Books
Tip 📌 Top 3 must-reads
- Designing Data-Intensive Applications — Martin Kleppmann
- Fundamentals of Data Engineering — Joe Reis & Matt Housley
- Learning Spark, 2nd Edition — Damji et al.
| Book | Author | Level | Topic |
|---|---|---|---|
| Designing Data-Intensive Applications | Martin Kleppmann | 🟩🟥 | Distributed architecture |
| Fundamentals of Data Engineering | Joe Reis & Matt Housley | 🟦🟩 | Data engineering overview |
| Learning Spark, 2nd Edition | Damji et al. | 🟦🟩 | PySpark |
| Spark: The Definitive Guide | Chambers & Zaharia | 🟩🟥 | Advanced Spark |
| Data Pipelines Pocket Reference | James Densmore | 🟦🟩 | Pipelines |
| Streaming Systems | Akidau et al. | 🟥 | Advanced streaming |
| The Data Warehouse Toolkit | Ralph Kimball | 🟩🟥 | Dimensional modeling |
| Building Microservices | Sam Newman | 🟥 | Architecture |
🎓 Free Courses
| Platform | Course | Level |
|---|---|---|
| dbt Learn | courses.getdbt.com | 🟩 |
| Databricks Academy | databricks.com/learn | 🟦🟩🟥 |
| Confluent Developer | developer.confluent.io | 🟩🟥 |
| Data Talks Club | datatalks.club | 🟦🟩 |
| Coursera - GCP DE | coursera.org/…gcp-data-engineering | 🟩 |
🛠️ IDEs & Tools
| Tool | Usage | Link | Free |
|---|---|---|---|
| VS Code | General-purpose IDE | code.visualstudio.com | ✅ |
| PyCharm | Python IDE | jetbrains.com/pycharm | ✅ Community |
| DataGrip | SQL IDE | jetbrains.com/datagrip | 30-day trial |
| DBeaver | Free SQL client | dbeaver.io | ✅ |
| Postman | API testing | postman.com | ✅ |
| k9s | Terminal UI K8s | k9scli.io | ✅ |
🔗 Communities
| Community | Platform | Focus |
|---|---|---|
| r/dataengineering | Reddit | General discussion |
| dbt Community | Slack | dbt, Analytics Engineering |
| Data Talks Club | Slack | Cours, networking |
| Apache Slack | Slack | Projets Apache |
| Locally Optimistic | Slack | Data leaders |