Sail 0.3: Long Live Spark
The LakeSail Team
July 8, 2025
5 min read
LakeSail is proud to announce the release of Sail 0.3, bringing modern performance and long-term Spark compatibility through a Rust-native execution engine. This release advances our vision of a faster, unified, and composable data layer: one that keeps Spark’s existing interface while evolving to meet the growing demands of batch, streaming, and AI workloads.
Sail 0.3 adds support for Spark 4.0 while maintaining compatibility with Spark 3.5, and improves how Sail adapts to changes in Spark’s behavior across versions. It also brings faster object store access, with lower latency and higher throughput on cloud-native storage. A revamped documentation site makes it easier to get up and running with Sail and to understand how it fits into your stack.
With these improvements, you can run Sail with the latest Spark version or keep your current production environment with confidence, knowing that Sail is committed to long-term Spark compatibility.
Spark 4.0 and the Lightweight PySpark Client
Sail is a server implementation of the Spark Connect protocol in Rust. Every DataFrame or SQL operation from the PySpark client is translated into a DataFusion logical plan and then into an optimized physical plan. That plan runs either on a multi-threaded in-process runtime or on a cluster of fast, memory‑safe native Rust workers. Because the contract is defined at the protocol layer, existing PySpark code connects to Sail with no edits, while engineers gain Rust‑level performance and predictable memory behavior.
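To make the "no edits" point concrete, here is a minimal sketch. It assumes a Sail server is already listening at sc://localhost:50051; that address and port are assumptions, so use whatever your deployment exposes. The sketch relies on PySpark's SPARK_REMOTE environment variable, which makes SparkSession.builder.getOrCreate() return a Spark Connect session. As a result, the DataFrame code itself never mentions Sail. The helper names point_at_sail, revenue_by_region, and main are hypothetical.

```python
from __future__ import annotations

import os


def point_at_sail(url: str, env: dict | None = None) -> None:
    """Route new PySpark sessions to a Spark Connect endpoint via SPARK_REMOTE."""
    if not url.startswith("sc://"):
        raise ValueError(f"Spark Connect URLs use the sc:// scheme, got {url!r}")
    # Write to the real process environment unless a mapping is supplied (handy for tests).
    target = os.environ if env is None else env
    target["SPARK_REMOTE"] = url


def revenue_by_region(spark):
    """Ordinary PySpark DataFrame code; nothing here refers to Sail."""
    from pyspark.sql import functions as F

    df = spark.createDataFrame(
        [("east", 10.0), ("west", 5.0), ("east", 2.5)],
        ["region", "amount"],
    )
    return df.groupBy("region").agg(F.sum("amount").alias("revenue"))


def main() -> None:
    # Imported lazily so the helpers above load even without PySpark installed.
    from pyspark.sql import SparkSession

    point_at_sail("sc://localhost:50051")  # assumed address of a running Sail server
    spark = SparkSession.builder.getOrCreate()  # picks up SPARK_REMOTE -> Spark Connect
    revenue_by_region(spark).show()
```

In practice you would usually export SPARK_REMOTE in the shell or job configuration rather than set it from Python. The helper only keeps the sketch self-contained. Call main() once a PySpark client is installed and Sail is running.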
With Spark 4.0, there’s now a pyspark-client package that includes only the lightweight PySpark client with no JARs. Sail is compatible with both the pyspark-client and the full pyspark package, allowing teams to integrate quickly and take advantage of Sail’s performance and cost efficiency. In recognition of this expanded compatibility, we’ve bumped the minor version in this release.
Changes in the Installation Command
In previous versions (e.g., Sail 0.2.x), installing PySpark 3.5 alongside Sail was done in the following way:
pip install "pysail[spark]"
Starting in Sail 0.3, we no longer support installing extra dependencies via the [spark] extra. You must explicitly install the PySpark library of your choice in the Python environment.
For example, you can choose either the full PySpark 4.0 library (along with Spark Connect support), or the thin PySpark 4.0 client, using one of the following commands:
pip install "pysail==0.3.1" "pyspark[connect]==4.0.0"
pip install "pysail==0.3.1" "pyspark-client==4.0.0"
This approach offers more control and flexibility, especially as Spark Connect adoption grows and variants of the client emerge.
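Because Sail 0.3 no longer installs PySpark for you, it can help to verify the environment before starting a job. The following is a small sketch that uses only the standard library's importlib.metadata. The helper names and the check itself are our own illustration, not part of pysail. It treats Spark 3.5 and 4.0 as the supported release lines, as stated in these notes.

```python
from __future__ import annotations

from importlib import metadata

# Spark release lines supported by Sail 0.3, per these release notes.
SUPPORTED_SERIES = ("3.5", "4.0")
# Distributions that can provide the PySpark client: full package or thin client.
CLIENT_DISTRIBUTIONS = ("pyspark", "pyspark-client")


def installed_versions(names: tuple[str, ...] = ("pysail",) + CLIENT_DISTRIBUTIONS) -> dict[str, str]:
    """Return {distribution: version} for those of `names` that are installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed; simply omit it
    return found


def environment_problems(found: dict[str, str]) -> list[str]:
    """List reasons an environment is not ready for Sail 0.3 (empty list means OK)."""
    problems = []
    if "pysail" not in found:
        problems.append("pysail is not installed")
    clients = {n: v for n, v in found.items() if n in CLIENT_DISTRIBUTIONS}
    if not clients:
        problems.append("no PySpark distribution installed (pysail no longer pulls one in)")
    for name, version in clients.items():
        series = ".".join(version.split(".")[:2])  # "4.0.0" -> "4.0"
        if series not in SUPPORTED_SERIES:
            problems.append(f"{name} {version} is outside the supported series {SUPPORTED_SERIES}")
    return problems


if __name__ == "__main__":
    problems = environment_problems(installed_versions())
    print("\n".join(problems) if problems else "environment looks OK")
```

The check is deliberately separated from the lookup. As a result, environment_problems can be exercised with hand-written version maps, as in the usage above, without touching the real environment.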
Runtime Behavior with Spark Version Awareness
As you might expect, a major version upgrade of Spark tends to come with breaking changes to its internals. For example, we have found that the serialization protocol for PySpark UDFs and UDTFs differs between versions. Sail automatically detects the installed PySpark version in the Python environment and adjusts its runtime behavior accordingly, ensuring that a single Sail library remains compatible across both versions.
To track feature parity and avoid regressions, Python unit tests for both Spark 3.5 and Spark 4.0 now run automatically on every pull request.
New and Improved Documentation
Alongside the release, we’ve rolled out a new documentation site with updated getting-started guides, architecture diagrams, and notes on Spark compatibility to help you get up and running with Sail. More documentation is on the way, including advanced topics and usage patterns.
If you have questions concerning the docs, feel free to reach out at support@lakesail.com or join our Slack community.
Thanks to Our Community Contributors!
Sail is built for developers, and we’re always excited when the open source community jumps in to help push it forward. This release features contributions from several first-time contributors whom we would love to thank and highlight:
We welcome contributions! If you’re curious about getting involved, check out the open issues and drop into the discussions. And if Sail has been useful in your stack, we’d love your support with a GitHub star!
LakeSail is also a growing team. If you’re excited about our mission to modernize data processing and want to help shape the future of compute, we’d love to hear from you!
What’s on the Horizon
Sail is evolving quickly, not just to match Spark’s latest features but to push the boundaries of what a modern data engine can do. Support is planned for Spark 4.0 capabilities such as the Python data source API and the VARIANT data type, while broader efforts go well beyond Spark parity. Main themes in our roadmap include stream processing, lakehouse format support, observability improvements and a web interface, distributed execution improvements, and contributing Spark functions upstream to DataFusion.
LakeSail’s Mission
At LakeSail, our mission is to unify batch, streaming, and AI workloads within a single, high-performance framework. Sail, written entirely in Rust, eliminates the overhead of the JVM and enables seamless migration of existing Spark code without complex rewrites. We believe this evolution is both natural and necessary. Just as hardware advances have driven exponential gains in speed and efficiency, Sail pushes the boundaries in software, striving to match these innovations with a streamlined, high-performance solution. In benchmark testing derived from the industry-standard TPC-H benchmark, Sail runs about 4x faster than Spark on average at just 6% of the cost, setting a new standard for the next generation of data processing infrastructure.
Getting Started
Getting started with Sail is simple. Head over to our deployment guide for step-by-step instructions on installing, configuring, and running Sail with PySpark.
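For a quick local trial, the sketch below starts Sail from Python and runs a query through PySpark. It assumes that the sail command-line tool installed with pysail accepts "sail spark server --port 50051" and listens on localhost. Please confirm the exact invocation in the deployment guide. The helpers start_sail, wait_for_port, and main are hypothetical names, and port 50051 is an arbitrary choice.

```python
from __future__ import annotations

import socket
import subprocess
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # server not accepting yet; retry shortly
    return False


def start_sail(port: int = 50051) -> subprocess.Popen:
    """Launch a local Sail Spark Connect server (assumed CLI invocation)."""
    proc = subprocess.Popen(["sail", "spark", "server", "--port", str(port)])
    if not wait_for_port("localhost", port):
        proc.terminate()
        raise RuntimeError(f"Sail did not start listening on port {port}")
    return proc


def main(port: int = 50051) -> None:
    # Requires pysail plus either pyspark[connect] or pyspark-client.
    from pyspark.sql import SparkSession

    server = start_sail(port)
    try:
        spark = SparkSession.builder.remote(f"sc://localhost:{port}").getOrCreate()
        spark.sql("SELECT 1 + 1 AS two").show()
        spark.stop()
    finally:
        server.terminate()  # shut the local server down when the demo ends
        server.wait()
```

Polling the port before connecting avoids a race between the server starting up and the client's first request. In a real deployment, you would run the server separately rather than from the client process.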
If you’re already using Sail or exploring adoption at scale, our enterprise offering provides flexible support and custom integrations to help you optimize workloads and scale with confidence.