
Data Lakehouse Engineering With Apache Iceberg
Published 5/2025
Created by Neetu Bhushan
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz, 2 Ch
Level: Beginner | Genre: eLearning | Language: English | Duration: 35 Lectures (3h 31m) | Size: 1.3 GB



Design scalable, versioned, and ACID-compliant data lakehouse solutions using Apache Iceberg from the ground up.
What you'll learn
Gain a deep understanding of Apache Iceberg's architecture, its role in the modern data lakehouse ecosystem, and why it outperforms traditional table formats like Hive.
Learn how to create, manage, and query Iceberg tables using Python (PyIceberg), SQL interfaces, and metadata catalogs - with practical examples from real-world datasets (see the short PyIceberg sketch after this list).
Build high-performance batch and streaming data pipelines by integrating Iceberg with leading engines like Apache Spark, Apache Flink, Trino, and DuckDB.
Explore how to use cloud-native storage with AWS S3, and design scalable Iceberg tables that support large-scale, distributed analytics.
Apply performance tuning techniques such as file compaction, partition pruning, and metadata caching to optimize query speed and reduce compute costs.
Work with modern Python analytics tools like Polars and DuckDB for fast in-memory processing, enabling rapid exploration, testing, and data validation workflows.
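
For a flavour of the hands-on material, here is a minimal PyIceberg sketch that creates a small Iceberg table and reads it back. It assumes a locally running REST catalog and an S3 warehouse; the endpoint, bucket, and the analytics.events table name are placeholders, not materials from the course.

Code:
from pyiceberg.catalog import load_catalog
from pyiceberg.schema import Schema
from pyiceberg.types import LongType, NestedField, StringType
import pyarrow as pa

# Connect to a REST catalog; the URI and warehouse location are placeholders.
catalog = load_catalog(
    "demo",
    **{
        "type": "rest",
        "uri": "http://localhost:8181",
        "warehouse": "s3://example-bucket/warehouse",
    },
)

# Define a small schema, then create the namespace and table
# (both calls raise if the objects already exist).
schema = Schema(
    NestedField(1, "event_id", LongType(), required=False),
    NestedField(2, "user_name", StringType(), required=False),
)
catalog.create_namespace("analytics")
table = catalog.create_table("analytics.events", schema=schema)

# Append a small PyArrow batch and read the table back.
table.append(pa.table({"event_id": [1, 2, 3], "user_name": ["a", "b", "c"]}))
print(table.scan().to_arrow())

Because the table lives in an open format behind a catalog, the same data can then be queried from Spark, Trino, DuckDB, or Polars - the multi-engine interoperability the course is built around.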
Requirements
Basic knowledge of Python, SQL, and data concepts is helpful, but no prior experience with Apache Iceberg or cloud tools is required.
Description
Welcome to Data Lakehouse Engineering with Apache Iceberg: From Basics to Best Practices - your complete guide to mastering the next generation of open table formats for analytics at scale.

As the data world moves beyond traditional data lakes and expensive warehouses, Apache Iceberg is rapidly becoming the cornerstone of modern data architecture. Built for petabyte-scale datasets, Iceberg brings ACID transactions, schema evolution, time travel, partition pruning, and compatibility across multiple engines - all in an open, vendor-agnostic format.

In this hands-on course, you'll go far beyond the basics. You'll build real-world data lakehouse pipelines using powerful tools like:
PyIceberg - programmatic access to Iceberg tables in Python
Polars - lightning-fast DataFrame library for in-memory transformations
DuckDB - local SQL powerhouse for interactive development
Apache Spark and Apache Flink - for large-scale batch and streaming processing
Trino - query Iceberg with federated SQL
AWS S3 - cloud-native object storage for Iceberg tables
And many more: SQL, Parquet, Glue, Athena, and modern open-source utilities

What Makes This Course Special?
Hands-on & tool-rich: not just Spark! Learn to use Iceberg with modern engines like Polars, DuckDB, Flink, and Trino.
Cloud-ready architecture: learn how to store and manage your Iceberg tables on AWS S3, enabling scalable and cost-effective deployments.
Concepts + practical projects: understand table formats, catalog management, and schema evolution, then apply them using real datasets.
Open-source focused: no vendor lock-in. You'll build interoperable pipelines using open, community-driven tools.

What You'll Learn:
The why and how of Apache Iceberg and its role in the data lakehouse ecosystem
Designing Iceberg tables with schema evolution, partitioning, and metadata management
How to query and manipulate Iceberg tables using Python (PyIceberg), SQL, and Spark
Real-world integration with Trino, Flink, DuckDB, and Polars
Using S3 object storage for cloud-native Iceberg tables
Performing time travel, incremental reads, and snapshot-based rollbacks
Optimizing performance with file compaction, statistics, and clustering
Building reproducible, scalable, and maintainable data pipelines

Who Is This Course For?
Data engineers and architects building modern lakehouse systems
Python developers working with large-scale datasets and analytics
Cloud professionals using AWS S3 for data lakes
Analysts or engineers moving from Hive, Delta Lake, or traditional warehouses
Anyone passionate about data engineering, analytics, and open-source innovation

Tools & Technologies You'll Use:
Apache Iceberg, PyIceberg, Spark, Flink, Trino
DuckDB, Polars, Pandas, SQL, AWS S3, Parquet
Integration with metastores/catalogs (REST, Glue)
Hands-on work with Jupyter notebooks, the CLI, and script-based workflows

By the end of this course, you'll be able to design, deploy, and scale data lakehouse solutions using Apache Iceberg and a rich ecosystem of open-source tools - confidently and efficiently.
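
As a small taste of the DuckDB side of the course, the sketch below queries an existing Iceberg table and lists its snapshots, which are the basis for time travel and rollback. It assumes DuckDB with its iceberg extension and a table that has already been written; the path is a placeholder, not course material.

Code:
import duckdb

con = duckdb.connect()

# The iceberg extension is distributed as an installable DuckDB extension.
con.execute("INSTALL iceberg")
con.execute("LOAD iceberg")

# Placeholder path: point it at the root directory of an existing Iceberg table
# (local filesystem or s3://... with the appropriate credentials configured).
table_path = "warehouse/analytics/events"

# Inspect the snapshot history kept in the table metadata...
print(con.execute(f"SELECT * FROM iceberg_snapshots('{table_path}')").fetchall())

# ...and run plain SQL against the current state of the table.
print(con.execute(f"SELECT count(*) FROM iceberg_scan('{table_path}')").fetchall())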
Who this course is for
This course is for data professionals and beginners who want to build scalable, modern data lakehouse solutions using Apache Iceberg and open-source tools.

