Spark Compatible

Your Spark code. On a blazing-fast Rust engine.

LakeSail speaks Spark Connect. Your existing Spark code runs as-is, no rewrite required.

PySpark: connect to Sail
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("etl-pipeline") \
    .remote("sc://sail-server:50051") \
    .getOrCreate()

Up to 8x
Faster than Spark on average
16x
More Data Processed
94%
Lower Cost
0
Code changes required when switching from Spark

Why Sail is faster

Three architectural advantages that compound across every workload.

Native Columnar Execution

Sail uses the Apache Arrow in-memory format and DataFusion query engine. The columnar layout enables SIMD processing of multiple records per CPU cycle, yielding higher throughput per core than JVM-based, row-oriented solutions.

Predictable Runtime

Built in Rust, Sail uses deterministic memory management instead of garbage collection. Compute is never interleaved with GC pauses, resulting in consistent task completion times with far fewer tail latency spikes.

Zero-Copy Python UDFs

Sail embeds a Python interpreter directly in the engine process. Your UDFs share Apache Arrow memory buffers with the engine via array pointers. No Py4J bridge, no serialization between Python and the execution layer.

Spark Connect compatibility.

Your PySpark session acts as a gRPC client that communicates with the Sail server via the Spark Connect protocol. Same SQL, same DataFrame APIs, same UDFs. No rewrite needed.
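The remote string follows the Spark Connect convention: an `sc://` scheme plus host and port, where 15002 is Spark Connect's default port. A hypothetical helper shows how such a string breaks down (PySpark parses this internally; real remote strings may also carry extra `;key=value` parameters this sketch ignores):

```python
from urllib.parse import urlparse

def parse_remote(remote: str) -> tuple[str, int]:
    """Split a Spark Connect remote string into (host, port).

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(remote)
    if parsed.scheme != "sc":
        raise ValueError("Spark Connect remotes use the sc:// scheme")
    # 15002 is the Spark Connect default port when none is given.
    return parsed.hostname, parsed.port or 15002

host, port = parse_remote("sc://sail-server:50051")
print(host, port)  # sail-server 50051
```

The same string works in `SparkSession.builder.remote(...)`, which is why pointing an existing session at Sail is a one-line change.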

No rewrite required

Your existing PySpark scripts, SQL queries, and UDFs work as-is. Deploy Sail in shadow mode for production pipelines, or migrate workloads incrementally.

Lightweight workers

The Sail process starts within seconds and consumes only a few dozen megabytes when idle. No heavyweight JVM executors, no capacity planning.

Run side by side

Run LakeSail alongside Spark. Compare outputs, validate results, and cut over at your own pace. No big-bang migration required.

Lower cloud costs

Sail finishes the same workloads on smaller instances. Lightweight containers and instant scaling reduce the need for manual tuning and over-provisioning.

Getting Started

Simple to get started

LakeSail runs in your AWS account, so there are a few setup steps. Here’s what to expect.

1

Create Account

Sign up with email, verify via code, and set up mandatory 2FA.

2

Connect AWS

Launch a CloudFormation template in your account. Requires admin access.

3

Create Cluster

Set up a VPC with your CIDR block, then create a cluster.

Run Your Spark Code

Point your existing PySpark or SQL workloads at LakeSail and go.

Ready to optimize your Spark workloads?