Sail 0.4: Native Apache Iceberg Support

The latest Sail 0.4 release advances our lakehouse architecture with native support for Apache Iceberg and major improvements to our Delta Lake integration. Iceberg’s open and flexible table format has become a cornerstone of modern data architecture, and Sail now brings it directly into its distributed query engine. With both Iceberg and Delta Lake supported natively, Sail takes another step toward a unified, high-performance compute layer that can operate seamlessly across open formats.

Enhancing Delta Lake Integration

Since our initial release of Delta Lake support, we have substantially reworked the internals of the integration. Delta Lake operations are now broken down into separate physical execution nodes, each with a single, well-defined responsibility, such as scanning files, writing files, or committing a new table version. Combined with standard nodes such as filters, these components form complete physical execution plans that support advanced DML statements, including conditional deletes.

This design makes the system more modular and transparent, and it also lays the groundwork for distributed execution of lakehouse table operations. We’re working on making that capability a reality. Stay tuned!
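To make the node decomposition concrete, here is a minimal, hypothetical Python sketch of a conditional delete expressed as a chain of single-purpose nodes: scan, filter, write, and commit. The class names (`ScanFiles`, `Filter`, `WriteFiles`, `Commit`) and the in-memory `Table` are illustrative assumptions, not Sail’s actual Rust types. The sketch also simplifies by rewriting every data file, whereas a real engine would typically rewrite only the files that contain matching rows.

```python
# Hypothetical sketch only: these classes are NOT Sail's API. They illustrate
# how a DELETE can be composed from nodes that each do exactly one thing.
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Protocol

Row = Dict[str, object]


class RowSource(Protocol):
    def execute(self) -> Iterator[Row]: ...


@dataclass
class Table:
    # version number -> list of data "files"; each file is a list of rows
    versions: Dict[int, List[List[Row]]] = field(default_factory=dict)

    @property
    def latest(self) -> int:
        return max(self.versions)


@dataclass
class ScanFiles:
    """Reads every row from the data files of the latest table version."""
    table: Table

    def execute(self) -> Iterator[Row]:
        for data_file in self.table.versions[self.table.latest]:
            yield from data_file


@dataclass
class Filter:
    """A standard, format-agnostic node: keeps rows matching the predicate."""
    child: RowSource
    predicate: Callable[[Row], bool]

    def execute(self) -> Iterator[Row]:
        return (row for row in self.child.execute() if self.predicate(row))


@dataclass
class WriteFiles:
    """Groups the incoming rows into new data files of a fixed size."""
    child: RowSource
    rows_per_file: int = 2

    def execute(self) -> List[List[Row]]:
        rows = list(self.child.execute())
        return [rows[i:i + self.rows_per_file] for i in range(0, len(rows), self.rows_per_file)]


@dataclass
class Commit:
    """Publishes the newly written files as the next table version."""
    child: WriteFiles
    table: Table

    def execute(self) -> int:
        new_files = self.child.execute()
        new_version = self.table.latest + 1
        self.table.versions[new_version] = new_files
        return new_version


def delete_where(table: Table, condition: Callable[[Row], bool]) -> int:
    # DELETE keeps exactly the rows that do NOT match the condition.
    plan = Commit(WriteFiles(Filter(ScanFiles(table), lambda r: not condition(r))), table)
    return plan.execute()
```

For example, `delete_where(table, lambda r: r["id"] == 2)` on a table whose version 0 holds rows with ids 1, 2, and 3 produces version 1 containing only ids 1 and 3, while version 0 stays intact. Because the filter node knows nothing about the table format, the same node can be reused in plans for other operations.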

Native Iceberg Integration

After refining the Delta Lake internals, we turned our attention to Iceberg and applied the same modular approach to its integration. Building on the Iceberg specification and the utilities from the iceberg-rust project, we connected Iceberg seamlessly with Sail’s internal systems for SQL parsing, query planning, catalog management, storage handling, and physical execution.

In Sail 0.4, the integration goes beyond basic table format support: it also works natively with the Iceberg REST Catalog, an API implemented by catalog services such as Apache Polaris and Cloudflare R2 Data Catalog. This lets Sail resolve table metadata through a standard API.
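As an illustration of what "resolving table metadata through a standard API" involves, the sketch below builds the URL for the REST Catalog’s load-table endpoint (`GET /v1/{prefix}/namespaces/{namespace}/tables/{table}`), where the parts of a multi-level namespace are joined with the `0x1F` unit separator before percent-encoding. The function name `load_table_url` and the example host are hypothetical. The optional `prefix` is assumed to come from the catalog’s config endpoint and is used as-is. This is a sketch of the protocol, not of Sail’s internal client.

```python
# Hypothetical helper, not part of Sail: builds an Iceberg REST Catalog
# "load table" URL following the REST spec's path layout.
from typing import Optional, Sequence
from urllib.parse import quote


def load_table_url(
    base_uri: str,
    namespace: Sequence[str],
    table: str,
    prefix: Optional[str] = None,
) -> str:
    if not namespace:
        raise ValueError("namespace must have at least one level")
    # Multi-level namespaces are joined with the unit separator (0x1F),
    # then percent-encoded so the separator travels as %1F.
    encoded_ns = quote("\x1f".join(namespace), safe="")
    segments = [base_uri.rstrip("/"), "v1"]
    if prefix:
        # Assumed to be the prefix advertised by the catalog's /v1/config response.
        segments.append(prefix.strip("/"))
    segments += ["namespaces", encoded_ns, "tables", quote(table, safe="")]
    return "/".join(segments)
```

A client would then issue an authenticated `GET` against this URL and read the `metadata-location` and `metadata` fields of the JSON response to locate the table’s current metadata. Any service that implements the same REST API can be addressed the same way.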

Together, these layers form a native Rust integration that runs directly within Sail’s query engine. Iceberg tables go through the same planning, optimization, and execution processes as other data sources, and catalog operations fit into Sail’s distributed query execution model. The result is a unified execution path across open table formats, with consistent semantics and predictable performance.

Building Toward a Unified Lakehouse

A key theme in Sail’s mission is unification. We believe the best way to achieve this is by developing a coherent foundation for integrating data formats, catalogs, and storage systems with our distributed multimodal query engine. This will allow us to simplify the architecture, improve maintainability, and accelerate support for new lakehouse standards.

For instance, as we worked on the Delta Lake and Iceberg integrations, we noticed recurring structures and design patterns across both implementations. Each format relies on similar abstractions for file scanning, transactional updates, and version management. Recognizing this, we decided to define a shared foundation that can serve as common ground for all future lakehouse integrations.
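The kind of shared foundation described above can be sketched, under heavy simplification, as a single interface for listing a version’s data files and committing file-level changes, with one implementation per format. Everything here (`LakehouseTable`, `DeltaLikeTable`, `IcebergLikeTable`, `replace_files`) is hypothetical and only loosely mirrors each format: the Delta-like toy replays add/remove actions from a log, while the Iceberg-like toy stores the full set of live files per snapshot and uses sequential numbers instead of real snapshot IDs.

```python
# Hypothetical sketch of a format-agnostic table interface; not Sail's code.
from typing import Dict, List, Protocol, Set


class LakehouseTable(Protocol):
    def current_version(self) -> int: ...
    def data_files(self, version: int) -> Set[str]: ...
    def commit(self, added: Set[str], removed: Set[str]) -> int: ...


class DeltaLikeTable:
    """Toy model: table state is rebuilt by replaying add/remove actions."""

    def __init__(self) -> None:
        self.log: List[Dict[str, Set[str]]] = [{"add": set(), "remove": set()}]

    def current_version(self) -> int:
        return len(self.log) - 1

    def data_files(self, version: int) -> Set[str]:
        files: Set[str] = set()
        for entry in self.log[: version + 1]:
            files -= entry["remove"]
            files |= entry["add"]
        return files

    def commit(self, added: Set[str], removed: Set[str]) -> int:
        self.log.append({"add": set(added), "remove": set(removed)})
        return self.current_version()


class IcebergLikeTable:
    """Toy model: every snapshot records the complete set of live files."""

    def __init__(self) -> None:
        self.snapshots: List[Set[str]] = [set()]

    def current_version(self) -> int:
        return len(self.snapshots) - 1

    def data_files(self, version: int) -> Set[str]:
        return set(self.snapshots[version])

    def commit(self, added: Set[str], removed: Set[str]) -> int:
        self.snapshots.append((self.snapshots[-1] - set(removed)) | set(added))
        return self.current_version()


def replace_files(table: LakehouseTable, old: Set[str], new: Set[str]) -> int:
    """Written once against the shared interface; works for either format."""
    live = table.data_files(table.current_version())
    missing = old - live
    if missing:
        raise ValueError(f"files not in current version: {sorted(missing)}")
    return table.commit(added=new, removed=old)
```

The point of the design is that operations like `replace_files` (the core of a delete or compaction) are written once, while each format only supplies its own way of reading and recording versions.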

Version 0.4 and Beyond

To reflect the significance of these developments and the features planned ahead, we are bumping Sail’s version to 0.4. In the past, we bumped the minor version only when there was a breaking change in the Sail CLI or Python library. Starting with 0.4, the minor version will increase more regularly to mark meaningful architectural and feature milestones. We believe this approach gives the community clearer visibility into our progress and better aligns our versioning with the pace of engineering work.

Community Contributors

As always, Sail wouldn’t have its current shape without support from our growing community. Huge thanks to @SparkApplicationMaster, @davidlghellin, and @zemin-piao (first-time contributor) for their contributions to this release!

As an open-source, community-driven project, Sail welcomes contributions of all kinds. Whether you want to report a bug, suggest a feature, or contribute code, you’re welcome to join the community on Slack or submit issues and pull requests on GitHub.

Getting Started

Sail was specifically designed to be simple to adopt. Many teams have already moved their Spark workloads to Sail with minimal changes and have seen dramatic reductions in compute costs.

You can get started in just a few minutes. Follow the Getting Started guide to install Sail with pip, start the Sail server, and connect to it from your existing PySpark client with no code rewrites required. If you’ve already been using Sail, let us know and we’ll send you some free merch!

Delta Lake™ is a trademark of LF Projects, LLC. Apache® and Apache Iceberg™ are trademarks (or registered trademarks) of The Apache Software Foundation. LakeSail, Inc. is an independent company and is not affiliated with, endorsed by, or sponsored by the Delta Lake project, LF Projects, LLC, or The Apache Software Foundation.
