Plug in. Nothing else changes.
LakeSail implements the Spark Connect protocol. Point your existing Spark code at a new endpoint and that's it. Same DataFrame API, same libraries, same pipelines. The engine upgrades; your code doesn't.
Update one config line
Swap the .remote() endpoint string. That's the migration. No code changes, no schema conversions, no library updates.
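As a minimal sketch of that swap, the snippet below changes only the URL passed to .remote(). Both endpoint strings, the ports, and the check_endpoint and connect helpers are hypothetical placeholders for illustration, not values or functions shipped by LakeSail.

```python
from urllib.parse import urlparse

# Both endpoints are hypothetical placeholders; use the values from your own deployment.
OLD_ENDPOINT = "sc://spark-connect.internal.example:15002"
NEW_ENDPOINT = "sc://lakesail.internal.example:50051"


def check_endpoint(url: str) -> str:
    """Return url unchanged if it looks like a Spark Connect URL, else raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme != "sc" or not parsed.hostname:
        raise ValueError(f"not a Spark Connect URL: {url!r}")
    return url


def connect(url: str):
    """Open a Spark Connect session against the given endpoint."""
    # Imported lazily so this module still loads where pyspark is not installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(check_endpoint(url)).getOrCreate()


# Before the migration: spark = connect(OLD_ENDPOINT)
# After the migration:  spark = connect(NEW_ENDPOINT)
```

Everything downstream of the session object stays as it was; only the string handed to connect() differs.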
Run your existing jobs
Every PySpark DataFrame operation, Spark SQL query, and Python UDF you already have continues to work without modification. LakeSail's Sail engine executes them natively, with no JVM translation layer.
Watch the cost drop
Rust-native execution with no GC pauses, no JVM startup overhead, and scale-to-zero workers. On the derived TPC-H benchmark, that translates to roughly 94% lower compute cost vs JVM-based Spark. Your workload will vary.
Add new capabilities at your pace
Once you're running, unlock the agent layer, lakehouse branching, and native Python workloads, none of which require any changes to your existing pipelines.
No lock-in. Your formats, your cloud.
LakeSail is built on open standards end to end. Native Apache Iceberg and Delta Lake support means your tables stay exactly where they are, with no conversion and no copying. Your data stays in open formats in your own AWS account. Switching engines should never require migrating terabytes of data.
Apache Iceberg
Native read and write support. Time travel, schema evolution, and partition evolution are all supported. LakeSail does not require you to convert or copy Iceberg tables.
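To illustrate what a time-travel read can look like, here is a small sketch that builds a query using Spark SQL's VERSION AS OF and TIMESTAMP AS OF clauses. The time_travel_query helper, the table name, and the snapshot ID are hypothetical, and the sketch assumes LakeSail accepts the same clause syntax as Spark; check the product documentation before relying on it.

```python
from typing import Optional


def time_travel_query(
    table: str,
    version: Optional[int] = None,
    timestamp: Optional[str] = None,
) -> str:
    """Build a SELECT that reads a table as of one snapshot ID or one timestamp.

    Inputs are assumed to be trusted; this helper does no SQL escaping.
    """
    # Exactly one of the two selectors must be provided.
    if (version is None) == (timestamp is None):
        raise ValueError("pass exactly one of version or timestamp")
    if version is not None:
        return f"SELECT * FROM {table} VERSION AS OF {int(version)}"
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{timestamp}'"


# Hypothetical table name and snapshot ID, for illustration only.
query = time_travel_query("analytics.orders", version=4242)
# With a live session this would be run as: spark.sql(query)
```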
Delta Lake
Native read and write support for Delta tables. Keep Delta indefinitely or migrate to another format at your own pace. Your choice, not ours.
Spark Connect protocol
Full Apache Spark Connect compatibility. Any code that runs on Spark 3.5 or Spark 4.x against the Connect protocol runs unchanged on LakeSail. This is not a partial implementation.
Apache Arrow & DataFusion
The Sail engine is built on Apache Arrow and DataFusion, both Apache Software Foundation projects with large, independent ecosystems.
What changes when the runtime is Rust
Sail is the open-source Rust engine at LakeSail's core. Built on Apache Arrow and DataFusion. No JVM, no GC, no serialization overhead. One engine handles every workload type.
Rust-native runtime
No JVM startup. No garbage collection pauses. No JVM memory tuning. Sail boots instantly and scales to zero between jobs, so you pay only for the compute you use.
Vectorized query execution
Built on Apache Arrow's columnar format and DataFusion's vectorized query engine. Processes data with SIMD acceleration where available.
Unified batch + stream + AI
One engine, one API, one cost model. Run batch ETL, Python workloads, and interactive SQL queries without switching tools, re-learning APIs, or managing separate clusters.
Native Python at engine speed
Python UDFs and workloads execute natively in-process, with no inter-process serialization and no JVM-to-Python IPC overhead. AI/ML pipelines that previously paid a heavy serialization tax run at native speed.
Stateless, secure workers
Workers are fully stateless: no shuffle data left on disk between runs, no leftover JVM processes consuming memory. Each job gets a clean, isolated execution environment. Easier security audits, simpler ops.
Transparent cost model
You are charged only for the compute you actually use: fully transparent, predictable, with no opaque credits. Autoscales to zero between jobs. No minimum spend. No contract lock-in. You see exactly what you're paying for.
Native Python. No JVM tax.
Other engines run Python work in separate worker processes and move data across the JVM boundary. LakeSail runs Python natively at engine speed, with no inter-process overhead and no tuning required.
Runtime UDFs without serialization
Define Python UDFs inline in your PySpark code. Sail executes them in the Rust engine via PyO3, with no pickling, no JVM bridge, and no data copying between processes.
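A minimal sketch of an inline UDF, written with the standard PySpark udf API. The normalize_email function, the with_normalized_email helper, and the sample rows are hypothetical; the code is the same PySpark you would write for any Spark Connect endpoint, and how the engine executes it is not visible at this level.

```python
from typing import Optional


def normalize_email(raw: Optional[str]) -> Optional[str]:
    """Trim whitespace and lowercase an email address; pass None through."""
    if raw is None:
        return None
    return raw.strip().lower()


def with_normalized_email(spark):
    """Apply normalize_email as a UDF to a tiny illustrative DataFrame."""
    # Imported lazily so this module still loads where pyspark is not installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    normalize = udf(normalize_email, StringType())
    # Hypothetical sample rows with an explicit schema string.
    df = spark.createDataFrame([("  Ada@Example.COM ",), (None,)], "email string")
    return df.withColumn("email_norm", normalize(col("email")))
```

Keeping the logic in a plain Python function means it can be unit-tested without a session and then wrapped as a UDF where it is used.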
AI/ML pipelines as first-class workloads
LLM inference, embedding generation, and model scoring are native workload types, not workarounds. Feed your lakehouse data directly into ML pipelines without building data bridges.
Multimodal lakehouse
Process PDFs, images, and video as first-class lakehouse data types. Query structured and unstructured data together, with no ETL step to a separate vector store or object store pipeline.
Scale from laptop to cloud
Develop locally on the same Sail runtime, then point the same workload at production.
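One way to picture this: the job is written once against a session, and only the endpoint is chosen per environment. In this sketch the SAIL_TARGET variable, both endpoint URLs, the resolve_endpoint and daily_totals helpers, and the day and amount columns are all hypothetical names chosen for illustration.

```python
import os
from typing import Optional

# Hypothetical endpoints; replace with the addresses of your own local and production runtimes.
ENDPOINTS = {
    "local": "sc://localhost:50051",
    "prod": "sc://lakesail.internal.example:50051",
}


def resolve_endpoint(target: Optional[str] = None) -> str:
    """Pick an endpoint by name, falling back to the SAIL_TARGET variable, then to local."""
    name = target or os.environ.get("SAIL_TARGET", "local")
    if name not in ENDPOINTS:
        raise KeyError(f"unknown target: {name!r}")
    return ENDPOINTS[name]


def daily_totals(spark, path: str):
    """The same job body runs unchanged against whichever endpoint the session uses."""
    # Imported lazily so this module still loads where pyspark is not installed.
    from pyspark.sql import functions as F

    return spark.read.parquet(path).groupBy("day").agg(F.sum("amount").alias("total"))
```

With a session created from resolve_endpoint(), daily_totals is called the same way on a laptop and in production.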
Built for AI agents from day one.
Competitors retrofit agent support onto a JVM platform that was never designed for it. LakeSail ships an MCP server, lakehouse branching, and full audit trails as core engine features, not add-ons.
Native MCP server
LakeSail exposes a Model Context Protocol server out of the box. Connect any MCP-compatible AI agent (Claude, GPT, or a custom agent) directly to your lakehouse. Query, transform, and write data without building a custom tool layer.
Lakehouse branching
Agents can branch your lakehouse like a git repo. Create an isolated sandbox for a transformation or analysis, run it, review the diff, and commit or discard, all without touching production data.
Elastic agent compute
Compute provisions per agent workload, scales with execution, and releases when work is done. Sub-second cold starts on the Rust-native engine mean short-lived agent loops never pay JVM warm-up, and there are no idle clusters between calls.
Dynamic Python tooling
Agents can define Python tools and data sources at runtime: custom logic that runs at engine speed against lakehouse data. No pre-registration, no redeployment. The agent writes it, Sail executes it.
Agent receives task
LLM agent gets context via MCP server tools
Branch created
Isolated sandbox branched from production lakehouse
Actions executed
Queries, transforms, writes, all audited in real time
Human reviews diff
Exact changes surfaced for approval before any commit
Commit or discard
One-click merge to production, or clean rollback
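The five steps above can be pictured with a toy in-memory model. This is purely conceptual: the Lakehouse and Branch classes below are hypothetical stand-ins written for illustration and are not LakeSail's API. They show only the shape of the flow, in which writes land in a sandbox, every action is logged, a diff is produced for review, and production changes only on commit.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """Toy sandbox: holds a snapshot of production plus staged changes."""

    base: dict
    staged: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def write(self, table: str, rows: list) -> None:
        # Stage the write in the sandbox and record it for the audit trail.
        self.staged[table] = list(rows)
        self.audit_log.append(("write", table, len(rows)))

    def diff(self) -> dict:
        # Surface only the tables whose contents differ from the snapshot.
        return {
            table: {"before": self.base.get(table), "after": rows}
            for table, rows in self.staged.items()
            if self.base.get(table) != rows
        }


class Lakehouse:
    """Toy production store: a dict of table name to rows."""

    def __init__(self, tables: dict):
        self.tables = {name: list(rows) for name, rows in tables.items()}

    def branch(self) -> Branch:
        # Copy the current state so sandbox work cannot touch production.
        return Branch(base={name: list(rows) for name, rows in self.tables.items()})

    def commit(self, branch: Branch) -> None:
        # Apply staged changes after a human has approved the diff.
        for table, rows in branch.staged.items():
            self.tables[table] = list(rows)


production = Lakehouse({"orders": [1, 2, 3]})
sandbox = production.branch()          # branch created
sandbox.write("orders", [1, 2, 3, 4])  # action executed and audited
review = sandbox.diff()                # diff surfaced for human review
production.commit(sandbox)             # commit; dropping sandbox instead would discard
```

Discarding is simply never calling commit: the sandbox is dropped and production is left as it was.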
From open source to fully managed.
Three ways to run Sail. OSS if you want to self-host, LakeSail Platform if you want managed, and Enterprise for organizations with procurement requirements. All three use the same engine.
Sail OSS
The Rust engine, open source on GitHub under Apache 2.0. Self-host on your own infrastructure. No managed services, no support SLA, but full access to the engine.
- Apache 2.0 license
- Full Spark Connect compatibility
- Community support via GitHub & Slack
- No contracts, no commitments
LakeSail Platform
Fully managed Sail, deployed into your AWS account (BYOC). We handle the infrastructure; you keep data sovereignty and security. No cluster management, no ops overhead.
- Deploys into your AWS account (BYOC)
- Managed upgrades, patches, monitoring
- Autoscale to zero between jobs
- Per-second billing, no minimums
Enterprise
For organizations with procurement, security review, and custom SLA requirements. Includes dedicated support, private deployment options, and enterprise SSO.
- Everything in Platform, plus:
- Dedicated support with SLA
- Enterprise SSO (SAML, OIDC)
- Custom licensing models
Your Spark workloads.
A better engine.
Get a 30-minute demo and a benchmark of LakeSail against your existing Spark workloads.