🔗 Useful Links
Recommended external resources, tools, and platforms
💡 Tip
Bookmark this page! It gathers all the useful links for your Data Engineering journey.
🛠️ Development Tools
IDEs & Editors
| Tool | Usage | Free | Link |
|---|---|---|---|
| VS Code | General-purpose IDE (recommended) | ✅ | code.visualstudio.com |
| PyCharm | Python IDE | ✅ Community | jetbrains.com/pycharm |
| IntelliJ IDEA | Scala/Java IDE | ✅ Community | jetbrains.com/idea |
| DataGrip | Powerful SQL IDE | 30-day trial | jetbrains.com/datagrip |
| DBeaver | Universal SQL client | ✅ | dbeaver.io |
| Postman | REST API testing | ✅ | postman.com |
Analytical Databases
| Tool | Description | Free | Link |
|---|---|---|---|
| DuckDB | In-process analytical SQL database | ✅ | duckdb.org |
| SQLite | Embedded SQL database | ✅ | sqlite.org |
| ClickHouse | Real-time OLAP | ✅ | clickhouse.com |
Dashboards & Visualization
| Tool | Description | Free | Link |
|---|---|---|---|
| Streamlit | Python dashboards (recommended) | ✅ | streamlit.io |
| Gradio | Fast ML UI | ✅ | gradio.app |
| Apache Superset | Open-source BI | ✅ | superset.apache.org |
| Metabase | Simple BI | ✅ | metabase.com |
| Plotly Dash | Interactive dashboards | ✅ | dash.plotly.com |
Terminal & Shell
| Tool | Description | OS | Link |
|---|---|---|---|
| iTerm2 | Advanced terminal | macOS | iterm2.com |
| Windows Terminal | Modern terminal | Windows | Microsoft Store |
| Oh My Zsh | Zsh framework | All | ohmyz.sh |
| Starship | Fast, good-looking prompt | All | starship.rs |
| tmux | Session multiplexer | All | github.com/tmux |
| fzf | Fuzzy finder | All | github.com/junegunn/fzf |
Git & Collaboration
| Tool | Description | Link |
|---|---|---|
| GitHub | Git hosting + CI/CD | github.com |
| GitLab | Self-hosted alternative | gitlab.com |
| GitKraken | Visual Git client | gitkraken.com |
| Sourcetree | Atlassian Git client | sourcetreeapp.com |
| lazygit | Fast Git TUI | github.com/jesseduffield/lazygit |
📊 Datasets for Practice
Bootcamp Datasets
| Dataset | Used in | Link |
|---|---|---|
| Video Game Sales | 🎮 Beginner Project | kaggle.com/rush4ratio |
| Olist E-commerce | 📦 Intermediate Project | kaggle.com/olistbr |
Popular Datasets
| Dataset | Description | Size | Link |
|---|---|---|---|
| NYC Taxi | New York taxi trips | ~100 GB | nyc.gov/tlc |
| Spotify Dataset | Tracks & audio features | ~1 GB | kaggle.com/spotify |
| Stack Overflow | Developer Q&A | ~50 GB | archive.org/stackoverflow |
| Wikipedia | Full dumps | ~100 GB | dumps.wikimedia.org |
| Common Crawl | Web crawl | Petabytes | commoncrawl.org |
Open Data
| Source | Description | Link |
|---|---|---|
| data.gouv.fr | Open Data France | data.gouv.fr |
| data.europa.eu | Open Data EU | data.europa.eu |
| NYC Open Data | New York City data | opendata.cityofnewyork.us |
| World Bank | Global economic data | data.worldbank.org |
| Awesome Public Datasets | Curated GitHub list | github.com/awesomedata |
☁️ Cloud & Platforms
Cloud Providers
| Provider | Console | Free Tier | Data Services |
|---|---|---|---|
| AWS | console.aws.amazon.com | 12 months | S3, EMR, Glue, Redshift, Kinesis |
| Google Cloud | console.cloud.google.com | $300 in credits | GCS, Dataproc, BigQuery, Pub/Sub |
| Azure | portal.azure.com | $200 in credits | Blob, Synapse, Data Factory |
Data Platforms
| Platform | Type | Free Tier | Link |
|---|---|---|---|
| Databricks | Lakehouse | Community Edition ✅ | databricks.com |
| Snowflake | Data Warehouse | 30-day trial | snowflake.com |
| Confluent Cloud | Managed Kafka | $400 in credits | confluent.cloud |
| dbt Cloud | Transformation | Free Developer ✅ | getdbt.com |
| Fivetran | Ingestion | 14-day trial | fivetran.com |
| Airbyte | Open-source ingestion | ✅ Self-hosted | airbyte.com |
Registries & Hubs
| Tool | Description | Link |
|---|---|---|
| Docker Hub | Docker images | hub.docker.com |
| Artifact Hub | Helm charts | artifacthub.io |
| OperatorHub | Kubernetes operators | operatorhub.io |
| PyPI | Python packages | pypi.org |
🧪 Playgrounds & Labs
SQL
| Tool | Engines | Link |
|---|---|---|
| DB Fiddle | PostgreSQL, MySQL, SQLite | db-fiddle.com |
| SQL Fiddle | Multi-DB | sqlfiddle.com |
| DuckDB Shell | DuckDB online | shell.duckdb.org |
| Mode SQL Tutorial | Interactive course | mode.com/sql-tutorial |
| SQLZoo | Progressive tutorials | sqlzoo.net |
Python & Notebooks
| Tool | Description | Link |
|---|---|---|
| Google Colab | Free notebooks + GPU | colab.research.google.com |
| Databricks Community | Free Spark | community.cloud.databricks.com |
| Kaggle Notebooks | Datasets + compute | kaggle.com/code |
| Deepnote | Collaborative notebooks | deepnote.com |
| Replit | Online IDE | replit.com |
Kubernetes
| Tool | Description | Link |
|---|---|---|
| Killercoda | Interactive K8s labs | killercoda.com |
| Play with Kubernetes | Ephemeral cluster | labs.play-with-k8s.com |
| Play with Docker | Docker online | labs.play-with-docker.com |
📚 Learning
Free Courses
| Platform | Content | Certification | Link |
|---|---|---|---|
| dbt Learn | dbt fundamentals | ✅ Free | courses.getdbt.com |
| Databricks Academy | Spark, Delta Lake | ✅ Some | databricks.com/learn |
| Confluent Developer | Kafka, streaming | ✅ Some | developer.confluent.io |
| Data Talks Club | DE Zoomcamp | ✅ Free | datatalks.club |
| freeCodeCamp | Data Analysis with Python | ✅ Free | freecodecamp.org |
Paid Courses
| Platform | Focus | Link |
|---|---|---|
| Coursera | Cloud certifications | coursera.org |
| DataCamp | Interactive data skills | datacamp.com |
| Udemy | Varied courses | udemy.com |
| Pluralsight | Enterprise tech | pluralsight.com |
| O’Reilly | Books + videos | oreilly.com |
YouTube Channels
| Channel | Focus | Link |
|---|---|---|
| Databricks | Spark, Delta Lake, Lakehouse | youtube.com/@Databricks |
| Confluent | Kafka, Event Streaming | youtube.com/@Confluent |
| Seattle Data Guy | DE careers, tutorials | youtube.com/@SeattleDataGuy |
| Data with Zach | DE concepts | youtube.com/@datawithzach |
| Andreas Kretz | Data architectures | youtube.com/@andreaskayy |
| TechWorld with Nana | DevOps, K8s | youtube.com/@TechWorldwithNana |
👥 Communities
Forums & Slack
| Community | Platform | Link |
|---|---|---|
| r/dataengineering | Reddit | reddit.com/r/dataengineering |
| dbt Community | Slack (~50k) | community.getdbt.com |
| Data Talks Club | Slack (~40k) | datatalks.club/slack |
| Apache Slack | Slack | the-asf.slack.com |
| MLOps Community | Slack | mlops.community |
| Locally Optimistic | Slack | locallyoptimistic.com |
Conferences
| Event | Focus | Link |
|---|---|---|
| Data + AI Summit | Databricks, Spark | databricks.com/dataaisummit |
| Kafka Summit | Streaming | kafka-summit.org |
| dbt Coalesce | Analytics Engineering | coalesce.getdbt.com |
| Data Council | Data Engineering | datacouncil.ai |
| QCon | Software Architecture | qconferences.com |
🎯 Interview Preparation
Coding Practice
| Site | Focus | Level | Link |
|---|---|---|---|
| LeetCode | Algo + SQL | All | leetcode.com |
| DataLemur | SQL for DE | 🟦🟩 | datalemur.com |
| StrataScratch | Realistic SQL | 🟩🟥 | stratascratch.com |
| HackerRank | Varied challenges | All | hackerrank.com |
| SQLPad | Interactive SQL | 🟦 | sqlpad.io |
System Design
| Resource | Type | Link |
|---|---|---|
| System Design Primer | Free GitHub repo | github.com/donnemartin |
| ByteByteGo | Newsletter + book | bytebytego.com |
| Grokking System Design | Paid course | educative.io |
| Designing Data-Intensive Apps | Book | O’Reilly |
Career Resources
| Resource | Type | Link |
|---|---|---|
| levels.fyi | Tech salaries | levels.fyi |
| Glassdoor | Company reviews | glassdoor.com |
| Blind | Anonymous discussions | teamblind.com |
🔖 Quick Bookmarks
📌 Essential links to bookmark
📁 Data Engineering Bookmarks/
│
├── 📁 Documentation/
│   ├── spark.apache.org/docs
│   ├── kafka.apache.org/documentation
│   ├── docs.delta.io
│   ├── docs.getdbt.com
│   ├── duckdb.org/docs
│   └── kubernetes.io/docs
│
├── 📁 Playgrounds/
│   ├── community.cloud.databricks.com (free Spark)
│   ├── shell.duckdb.org (DuckDB online)
│   ├── db-fiddle.com (SQL)
│   └── killercoda.com (K8s)
│
├── 📁 Learning/
│   ├── courses.getdbt.com (free dbt)
│   ├── developer.confluent.io (Kafka)
│   └── datatalks.club (DE Zoomcamp)
│
├── 📁 Communities/
│   ├── reddit.com/r/dataengineering
│   ├── community.getdbt.com (Slack)
│   └── datatalks.club/slack
│
└── 📁 Interviews/
    ├── datalemur.com (SQL)
    ├── leetcode.com
    └── github.com/donnemartin/system-design-primer