Your Spark code. On a blazing-fast Rust engine.
LakeSail speaks Spark Connect. Your existing Spark code runs as-is, no rewrite required.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("etl-pipeline") \
    .remote("sc://sail-server:50051") \
    .getOrCreate()
Why Sail is faster
Three architectural advantages that compound across every workload.
Native Columnar Execution
Sail uses the Apache Arrow in-memory format and DataFusion query engine. The columnar layout enables SIMD processing of multiple records per CPU cycle, yielding higher throughput per core than JVM-based, row-oriented solutions.
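The row-versus-column distinction can be sketched with nothing but the Python standard library. This is a toy illustration of the memory layout, not Sail internals: each column lives in one contiguous, typed buffer, which is the property that lets a native engine apply SIMD across many values at once.

```python
from array import array

# Row-oriented: each record is a tuple; the values of one field are
# scattered across many separate objects in memory.
rows = [(i, i * 1.5) for i in range(1_000)]
row_sum = sum(price for _, price in rows)

# Columnar: each field is a single contiguous, typed buffer
# (the layout Apache Arrow uses). Scanning one column touches one
# dense memory region instead of hopping between records.
ids = array("q", range(1_000))                         # 64-bit ints
prices = array("d", (i * 1.5 for i in range(1_000)))   # 64-bit floats
col_sum = sum(prices)

assert row_sum == col_sum
```

Both layouts hold the same data; the columnar one is what a vectorized engine can stream through the CPU a batch at a time.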
Predictable Runtime
Built in Rust, Sail uses deterministic memory management instead of garbage collection. Compute is never interleaved with GC pauses, resulting in consistent task completion times with far fewer tail latency spikes.
Zero-Copy Python UDFs
Sail embeds a Python interpreter directly in the engine process. Your UDFs share Apache Arrow memory buffers with the engine via array pointers. No Py4J bridge, no serialization between Python and the execution layer.
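The zero-copy idea can be illustrated with the standard library's `memoryview`, which exposes an existing buffer without duplicating it, much as an Arrow-backed UDF reads the engine's column buffers in place. This is an analogy, not Sail's API:

```python
# A memoryview is a window onto existing memory: no bytes are copied,
# so the view always reflects the underlying buffer.
data = bytearray(b"columnar-data")
view = memoryview(data)

data[0:8] = b"COLUMNAR"
assert view.tobytes().startswith(b"COLUMNAR")  # view sees the change

# A serializing bridge (as with Py4J) instead hands Python a copy,
# which is immediately independent of the source buffer.
copy = bytes(data)
data[0] = ord("x")
assert copy[0] == ord("C")   # the copy went stale
assert view[0] == ord("x")   # the shared view did not
```

Sharing buffers rather than copying them is what removes the serialization cost from the Python-UDF path.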
Spark Connect compatibility
Your PySpark session acts as a gRPC client that communicates with the Sail server via the Spark Connect protocol. Same SQL, same DataFrame APIs, same UDFs. No rewrite needed.
No rewrite required
Your existing PySpark scripts, SQL queries, and UDFs work as-is. Deploy Sail in shadow mode for production pipelines, or migrate workloads incrementally.
Lightweight workers
The Sail process starts within seconds and consumes only a few dozen megabytes when idle. No heavyweight JVM executors, no capacity planning.
Run side by side
Run LakeSail alongside Spark. Compare outputs, validate results, and cut over at your own pace. No big-bang migration required.
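A side-by-side validation can be as simple as running the same query against both engines and comparing the collected rows. The helper below is a hypothetical sketch; the endpoint names in the comments are placeholders for your own cluster addresses:

```python
def rows_match(spark_rows, sail_rows):
    """Compare two collected result sets, ignoring row order."""
    return sorted(map(tuple, spark_rows)) == sorted(map(tuple, sail_rows))

# In a real shadow run, each result set comes from its own session:
#   spark = SparkSession.builder.remote("sc://spark-cluster:15002").getOrCreate()
#   sail  = SparkSession.builder.remote("sc://sail-server:50051").getOrCreate()
#   assert rows_match(spark.sql(query).collect(), sail.sql(query).collect())

# Plain tuples standing in for collected Row objects:
spark_out = [(1, "a"), (2, "b")]
sail_out = [(2, "b"), (1, "a")]
assert rows_match(spark_out, sail_out)
```

Sorting before comparing makes the check order-insensitive, since distributed engines make no ordering guarantee without an explicit `ORDER BY`.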
Lower cloud costs
Sail finishes the same workloads on smaller instances. Lightweight containers and instant scaling reduce the need for manual tuning and over-provisioning.
Simple to get started
LakeSail runs in your AWS account, so there are a few setup steps. Here’s what to expect.
Create Account
Sign up with email, verify via code, and set up mandatory 2FA.
Connect AWS
Launch a CloudFormation template in your account. Requires admin access.
Create Cluster
Set up a VPC with your CIDR block, then create a cluster.
Run Your Spark Code
Point your existing PySpark or SQL workloads at LakeSail and go.