Blog
Engineering deep-dives and updates from the LakeSail team.
Agent Skills for Spark Workloads
Meet the new Sail CLI feature for one-shot execution of any PySpark script. Access instant Spark-compatible compute in a single command, and give your agents data and AI engineering capabilities on demand.
Sail 0.5: Resilient and Observable Distributed Execution
Sail 0.5 introduces a redesigned control plane for distributed execution with task region scheduling, unified shuffle, failure recovery, and a system catalog queryable via SQL.
Spark's Python Problem and How Sail Solves It
AI workflows rely on Python, but Spark isolates Python behind an inter-process boundary. Sail executes Python UDFs natively in-process for true high-performance distributed compute.
How Sail Utilizes and Extends Apache DataFusion
Sail adopts Apache DataFusion's trusted query engine and extends it as part of a larger, distributed compute framework with Spark-compatible semantics.
Sail, the Last Piece of the Composable Data Stack
The future of data is composable. Sail brings distributed computation into that vision: modular, Arrow-native, and Spark-compatible.
Sail 0.4: Native Apache Iceberg Support
Sail 0.4 introduces native Apache Iceberg support and major improvements to Delta Lake integration.
Sail Turns One
Sail turns one! Celebrate with us as we reflect on our journey and look ahead to the future of unified data and AI workloads.
Sail 0.3: Long Live Spark
Sail 0.3 adds support for Spark 4.0 while maintaining compatibility with Spark 3.5, along with faster object store performance and revamped documentation.
Sail 0.3.2: Start the Journey from Your Lakehouse
Sail 0.3.2 brings native Delta Lake read/write support and expanded object storage integration with Azure, GCS, Cloudflare R2, and AWS S3.
Announcing Sail 0.2.6
Sail 0.2.6 delivers enhancements across temporal data handling, SQL compatibility, Parquet integration, and the MCP server.
Sail MCP Server: Spark Analytics for LLM Agents
With the Sail MCP server, data analytics in Spark is possible for both LLM agents and humans.
Writing a Rust SQL Parser in One Week
A close look at Sail's new in-house SQL parser built using parser combinators and Rust procedural macros.
Beyond the JVM: How Rust is Redefining Big Data for the AI Era
Rust is redefining big data infrastructure by offering superior performance, memory safety, and scalability over traditional JVM-based systems.
Sail 0.2.1: Enhanced UDF Support
How enhanced UDF support in Sail opens up possibilities to bridge the gap between traditional ETL workloads and AI.
Why It's Possible Now: Sail 0.2 and the Evolution of Distributed Compute Frameworks
Announcing Sail 0.2, the latest milestone in the evolution of distributed compute frameworks. Explore how advancements in programming languages and data infrastructure make it possible to unify batch, stream, and AI workloads into a high-performance framework.
Sail 0.2 and the Future of Distributed Processing
We are thrilled to unveil the preview release of Sail 0.2, which introduces support for distributed processing on Kubernetes. This post offers a detailed architectural deep dive and an overview of our expanded support for Spark.
Introducing Sail Enterprise Support
Discover how Sail Enterprise Support empowers your team with dedicated, flexible, and customizable solutions to meet the needs of your organization.
A Sail Recipe: Tackling an Out-of-Control Redshift Bill
Deriving insights from your data shouldn't cost you an arm, a leg, and a kidney. Learn how you can work directly with your data in Amazon S3 using Sail, saving both time and money.
The First PySail Release
We are thrilled to announce the 0.1 release of Sail. Get started with the PySail package today, and check out the documentation site.
Supercharge Spark: Quadruple Speed, Cut Costs by 94%
The preview of Sail is here. In the derived TPC-H benchmark, Sail achieves nearly 4x speed-up and 94% hardware cost reduction, with the same PySpark code.