    DataTalksClub/data-engineering-zoomcamp

    Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here 👇🏼

    education
    data-engineering
    devops
    course
    dbt
    docker
    free
    kafka
    kestra
    spark
    Jupyter Notebook
    38.1K stars
    7.7K forks
    38.1K watching
    Updated 3/5/2026

    Health Score

    75

    Weekly Growth

    +0

    +0.0% this week

    Contributors

    1

    Total contributors

    Open Issues

    7

    Generated Insights

    About data-engineering-zoomcamp

    Data Engineering Zoomcamp Overview

    Data Engineering Zoomcamp: A Free 9-Week Course on Data Engineering Fundamentals

    Master the fundamentals of data engineering by building an end-to-end data pipeline from scratch. Gain hands-on experience with industry-standard tools and best practices.

    Join Slack • #course-data-engineering Channel • Telegram Announcements • Course Playlist • FAQ

    How to Enroll

    2026 Cohort

    • Start Date: 12 January 2026
    • Register Here: Sign up

    Self-Paced Learning

    All course materials are freely available for independent study. Follow these steps:

    1. Watch the course videos.
    2. Join the Slack community.
    3. Refer to the FAQ document for guidance.

    Syllabus Overview

    The course consists of structured modules, hands-on workshops, and a final project to reinforce your learning.

    Prerequisites

    To get the most out of this course, you should have:

    • Basic coding experience
    • Familiarity with SQL
    • Experience with Python (helpful but not required)

    No prior data engineering experience is necessary.

    Modules

    Module 1: Containerization and Infrastructure as Code

    • Introduction to GCP
    • Docker and Docker Compose
    • Running PostgreSQL with Docker
    • Infrastructure setup with Terraform
    • Homework

    Module 2: Workflow Orchestration

    • Data Lakes and Workflow Orchestration
    • Workflow orchestration with Kestra
    • Homework

    Workshop 1: Data Ingestion

    • API reading and pipeline scalability
    • Data normalization and incremental loading
    • Homework

    Module 3: Data Warehousing

    • Introduction to BigQuery
    • Partitioning, clustering, and best practices
    • Machine learning in BigQuery

    Module 4: Analytics Engineering

    • dbt (data build tool) with DuckDB & BigQuery
    • Testing, documentation, and deployment
    • Data visualization with Streamlit & Looker Studio

    Module 5: Batch Processing

    • Introduction to Apache Spark
    • DataFrames and SQL
    • Internals of GroupBy and Joins

    Module 6: Streaming

    • Introduction to Kafka
    • Kafka Streams and KSQL
    • Schema management with Avro

    Final Project

    • Apply all concepts learned in a real-world scenario
    • Peer review and feedback process

    Testimonials

    Thank you for what you do! The Data Engineering Zoomcamp gave me skills that helped me land my first tech job.

    - Tim Claytor (Source)

    Three months might seem like a long time, but the growth and learning during this period are truly remarkable. It was a great experience with a lot of learning, connecting with like-minded people from all around the world, and having fun. I must admit, this was really hard. But the feeling of accomplishment and learning made it all worthwhile. And I would do it again!

    - Nevenka Lukic (Source)

    One of the significant things I inferred from the Zoomcamp is to prioritize fundamentals and principles over ever-evolving tools and tech stacks. Hugely grateful to Alexey Grigorev for putting together this incredible course and offering it for free.

    - Siddhartha Gogoi (Source)

    Such a fun deep dive into data engineering, cloud automation, and orchestration. I learned so much along the way. Big shoutout to Alexey Grigorev and the DataTalksClub team for the opportunity and guidance throughout the 3 months of the free course.

    - Assitan NIARE (Source)

    If you're serious about breaking into data engineering, start here. The repo's structure, community, and hands-on focus make it unparalleled.

    - Wady Osama (Source)

    Community & Support

    Getting Help on Slack

    Join the #course-data-engineering channel on DataTalks.Club Slack for discussions, troubleshooting, and networking.

    To keep discussions organized:

    Meet the Instructors

    Past instructors:

    Sponsors & Supporters

    A special thanks to our course sponsors for making this initiative possible!

    Interested in supporting our community? Reach out to [email protected].

    About DataTalks.Club

    DataTalks.Club

    DataTalks.Club is a global online community of data enthusiasts. It's a place to discuss data, learn, share knowledge, ask and answer questions, and support each other.

    Website • Join Slack Community • Newsletter • Upcoming Events • YouTube • GitHub • LinkedIn • Twitter

    Most activity at DataTalks.Club happens on Slack. We post updates there and discuss different aspects of data, career questions, and more.

    At DataTalks.Club, we organize online events, community activities, and free courses. You can learn more about what we do at DataTalksClub Community Navigation.
