Sail 0.3.2: Start the Journey from Your Lakehouse

The LakeSail Team

August 8, 2025

4 min read

With 0.3.2, Sail now supports reading and writing Delta Lake tables—one of the most requested features from our users. That means you can point Sail at an existing Delta dataset (on S3, Cloudflare R2, Azure, GCS, or file systems) and start querying right away using familiar Spark-compatible syntax.
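As a minimal sketch of what this looks like in practice (the server address and bucket paths below are illustrative, and this assumes a PySpark session connected to a running Sail server):

```python
from pyspark.sql import SparkSession

# Connect to a running Sail Spark Connect server (address is illustrative).
spark = SparkSession.builder.remote("sc://localhost:50051").getOrCreate()

# Read an existing Delta table from object storage (path is illustrative).
df = spark.read.format("delta").load("s3://my-bucket/events")
df.filter(df["status"] == "active").show()

# Write the filtered result back out as a new Delta table.
df.filter(df["status"] == "active") \
  .write.format("delta").mode("overwrite") \
  .save("s3://my-bucket/events_active")
```

Because this is the standard Spark DataFrame API, existing Delta read/write code should carry over without changes.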

Additionally, to celebrate our community and progress, we’re sending out free LakeSail merch to anyone currently using Sail! Whether you’re using it in production, on a side project, or exploring it internally—fill out this short form, and we’ll send something your way. That includes all our contributors!

Delta Lake Integration

Delta Lake is a leading open-source storage format for lakehouse systems. It provides ACID transactions, schema evolution, and versioning on top of cloud object stores. Its adoption has grown rapidly across data teams who want warehouse guarantees without warehouse lock-in.

In this release, we added support for reading and writing Delta Lake tables natively in Sail. This integration enables direct interoperability with existing Delta datasets and aligns with our vision for supporting structured data formats in a distributed, cloud-native environment.

We didn’t take any shortcuts. Instead of integrating with only the high-level APIs of delta-rs or delta-kernel-rs, we dug into the internals and built against lower-level APIs. This approach took considerably more work, but it allowed us to have long-lasting confidence in high read/write performance and opened up possibilities to run table operations in Sail’s distributed setup.

All of this groundwork forms a solid foundation toward full Delta feature coverage in Sail in the future, including MERGE, DELETE, table maintenance, and streaming. Support for additional lakehouse formats (such as Apache Iceberg) is also on our roadmap.

Expanded Object Storage Support Across Azure, Google, and AWS

In addition to Delta Lake integration, Sail now supports a broader range of cloud object storage backends, with native read and write support for Cloudflare R2, Azure, and Google Cloud Storage (GCS). We've also expanded our Amazon S3 support to recognize HTTPS-based endpoints for S3 Express One Zone and S3 Transfer Acceleration, enabling higher throughput and lower latency for demanding workloads.
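Credentials for these backends are typically supplied through the environment. The variable names below follow common cloud-SDK conventions and are illustrative only; consult Sail's documentation for the exact configuration it reads:

```shell
# Illustrative credential setup; variable names may differ from
# what Sail actually reads -- check the documentation.

# Amazon S3 (or S3-compatible endpoints such as Cloudflare R2)
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_ENDPOINT_URL=https://<account-id>.r2.cloudflarestorage.com

# Azure Blob Storage
export AZURE_STORAGE_ACCOUNT_NAME=...
export AZURE_STORAGE_ACCOUNT_KEY=...

# Google Cloud Storage
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
```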

These additions bring Sail closer to true cloud-agnostic processing, allowing you to query and transform data wherever it lives, using a unified, high-performance engine.

Check out our documentation for more details.

Catalogs and Write Operations

As you may know, Sail already supports temporary views and partially supports CREATE TABLE via its in-memory catalog. In Sail 0.3.2, however, we reworked our catalog management logic. While these changes are largely under the hood, the new abstractions we've defined lay the groundwork for integrating with remote catalog services more easily, making it possible in the near future to persist table and view definitions across sessions.
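For example, a temporary view registered against the in-memory catalog can be queried with SQL in the same session (a sketch, assuming a PySpark session connected to a running Sail server; note that the view does not persist across sessions today):

```python
from pyspark.sql import SparkSession

# Connect to a running Sail server (address is illustrative).
spark = SparkSession.builder.remote("sc://localhost:50051").getOrCreate()

# Register a temporary view in the in-memory catalog.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.createOrReplaceTempView("events")

# Query it with Spark-compatible SQL.
spark.sql("SELECT COUNT(*) AS n FROM events").show()
```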

We also consolidated the write operation implementations across:

  • DataFrame.write (Spark DataFrame writer v1 API)
  • DataFrame.writeTo() (Spark DataFrame writer v2 API)
  • INSERT INTO and INSERT OVERWRITE DIRECTORY SQL statements

This consolidation fixes several known bugs in the write behavior and makes it easier to support more lakehouse formats and operations in the future (e.g. conditional delete and update).
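All three write paths now share one implementation underneath. A hedged sketch of what they look like from the user's side (table names and paths are illustrative, assuming a PySpark session connected to Sail):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:50051").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# DataFrame writer v1 API: write to a path.
df.write.format("delta").mode("append").save("s3://my-bucket/t1")

# DataFrame writer v2 API: write to a catalog table.
df.writeTo("t2").using("delta").createOrReplace()

# SQL statement routed through the same consolidated write path.
spark.sql("INSERT INTO t2 VALUES (3, 'c')")
```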

Thanks to Our Contributors

We want to give a special thanks to everyone who contributed to this release.

Shoutout to @SparkApplicationMaster for contributions across bug fixes, features, and enhancements! Huge thanks to @rafafrdz, @davidlghellin, @anhvdq (first-time contributor), and @jamesfricker (first-time contributor) for helping to further extend our parity with Spark SQL functions!

We’re always open to new contributors—if you’re interested, check out the open issues, join our Slack Community, and consider giving Sail a star on GitHub if you enjoy what we’re building.

LakeSail’s Mission

If you’re new to LakeSail, our mission is to unify batch, streaming, and AI workloads within a single, high-performance framework. Sail, written entirely in Rust, eliminates the overhead of the JVM and enables seamless migration of existing Spark code without complex rewrites. We believe this evolution is both natural and necessary. Just as recent hardware advances have driven exponential gains in speed and efficiency, Sail pushes the boundaries in software, striving to match these innovations with a streamlined, high-performance solution. In benchmark testing, Sail runs ~4x faster than Spark at just 6% the cost on average (derived from the industry-standard TPC-H benchmark), setting a new standard for the next generation of data processing infrastructure.

Getting Started

Getting started with Sail is simple. Head over to our developer guide for step-by-step instructions on installing, configuring, and running Sail with PySpark.
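In outline, the flow looks like this (package and CLI names are taken from Sail's developer guide as we understand it; verify them against the current documentation):

```shell
# Install Sail with its PySpark extras.
pip install "pysail[spark]"

# Start the Sail server in Spark Connect mode (port is illustrative).
sail spark server --port 50051
```

You can then connect from any PySpark client with `SparkSession.builder.remote("sc://localhost:50051")` and run your existing Spark code against Sail.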

If you’re already using Sail or exploring adoption at scale, our enterprise support offering provides flexible support, custom integrations, and enables you to optimize workloads and scale with confidence.

Delta Lake™ is a trademark of LF Projects, LLC. LakeSail, Inc. is an independent company and is not affiliated with, endorsed by, or sponsored by the Delta Lake project or LF Projects.

Get in Touch to Learn More

Join the LakeSail Community

Get support, contribute code, and help shape the future of high-performance data and AI workloads.