Sail 0.5: Resilient and Observable Distributed Execution
By Shehab Amin, Heran Lin, and Everett Roeth
February 10, 2026
6 min read
Modern query engines have become remarkably fast. But at larger scale, speed alone isn’t what blocks teams: it’s failure recovery, operational predictability, and the ability to understand what the system is doing while a workload is running.
Sail 0.5 focuses on those constraints directly. This release introduces a redesigned control plane, a new execution model for scheduling and recovery, improved resource management, and SQL-native observability. Together, these changes make distributed execution more resilient, easier to operate, and easier to reason about in production.
Evolving the Control Plane
In cluster mode, each Sail session has a driver that coordinates execution within the cluster. The driver manages workers, which perform computation and exchange data using the Arrow Flight protocol. The distributed execution plan of a query is represented by a job graph, a directed acyclic graph (DAG) of stages, where each stage is partitioned into tasks that run in parallel, each processing a subset of the data.
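To make that structure concrete, here is a minimal Rust sketch of such a job graph. The type names and fields are illustrative assumptions for this post, not Sail’s actual definitions.

```rust
/// Illustrative sketch of a job graph: a DAG of stages, each partitioned
/// into tasks. These types are hypothetical, not Sail's actual definitions.
struct JobGraph {
    stages: Vec<Stage>,
}

struct Stage {
    /// Indices of upstream stages whose output this stage consumes.
    inputs: Vec<usize>,
    /// Number of partitions, i.e. the degree of parallelism of the stage.
    partitions: usize,
}

/// One task processes one partition of one stage.
#[derive(Clone, Copy, Debug)]
struct TaskId {
    stage: usize,
    partition: usize,
}

impl JobGraph {
    /// Enumerate the tasks of a stage, one per partition.
    fn tasks(&self, stage: usize) -> impl Iterator<Item = TaskId> {
        (0..self.stages[stage].partitions).map(move |partition| TaskId { stage, partition })
    }
}
```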
Sail has always had a clean split between the control plane and the data plane, but as workloads grew in scale and complexity, the old driver architecture became limiting for the next set of features, especially around recovery behavior, richer scheduling policies, and shuffle strategy evolution. These constraints motivated a redesign aimed at separating concerns inside the control plane itself.
To address these limitations, Sail 0.5 introduces a restructured driver architecture that decomposes the control plane into distinct components: a job scheduler that owns the job graph and job state, a task assigner that maintains a task queue and tracks slot state, and a worker pool that manages worker lifecycle and worker state. The following diagram highlights these components.
At runtime, these components are coordinated by the driver actor, which continues to run the same single-threaded event loop as before. On the worker side, each worker actor now runs a task runner, a peer tracker, and a stream manager. The driver itself can now also run tasks and manage task streams, a capability not yet in use but necessary for moving catalog operations—which require the driver session—into physical execution in the future.
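The sketch below shows one way the decomposed driver could be wired together around such a single-threaded event loop. All names here are hypothetical illustrations of the design, not Sail’s actual types or events.

```rust
use std::sync::mpsc::Receiver;

/// Hypothetical events delivered to the driver actor; not Sail's actual API.
enum DriverEvent {
    WorkerRegistered { worker_id: u64, task_slots: usize },
    TaskFinished { stage: usize, partition: usize, attempt: u64, success: bool },
}

#[derive(Default)]
struct JobScheduler { /* owns the job graph and job state */ }

#[derive(Default)]
struct TaskAssigner { /* owns the task queue and slot state */ }

#[derive(Default)]
struct WorkerPool { /* owns worker lifecycle and worker state */ }

#[derive(Default)]
struct DriverActor {
    job_scheduler: JobScheduler,
    task_assigner: TaskAssigner,
    worker_pool: WorkerPool,
}

impl DriverActor {
    /// Single-threaded event loop: events are handled one at a time, so the
    /// three components can hold mutable state without any locking.
    fn run(mut self, events: Receiver<DriverEvent>) {
        for event in events {
            self.handle(event);
        }
    }

    fn handle(&mut self, event: DriverEvent) {
        match event {
            DriverEvent::WorkerRegistered { .. } => {
                // Update the worker pool, then let the task assigner place
                // queued tasks onto the newly available slots.
            }
            DriverEvent::TaskFinished { .. } => {
                // Inform the job scheduler, which advances the job graph or
                // triggers recovery, and release the slot in the task assigner.
            }
        }
    }
}
```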
Further, task definition semantics are now decoupled from physical plan rewriting. The driver no longer rewrites the physical plan to inject shuffle read and write operators before running tasks. Instead, a TaskDefinition structure represents logical input and output information, and the TaskRunner is responsible for rewriting the plan to include the appropriate shuffle readers and writers. This keeps the driver focused on scheduling while leaving execution details to the workers.
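As a rough illustration, a task definition in this style might look like the following; the fields and variants are assumptions made for the example, not the actual TaskDefinition in Sail.

```rust
/// Hypothetical shape of a task definition that carries logical input and
/// output information; Sail's actual fields may differ.
struct TaskDefinition {
    stage: usize,
    partition: usize,
    attempt: u64,
    /// Where the task reads its input from.
    inputs: Vec<TaskInput>,
    /// How the task exposes or persists its output.
    output: TaskOutput,
}

enum TaskInput {
    /// Fetch a stream from a peer worker (pipelined exchange).
    PeerStream { worker_id: u64, stream_id: String },
    /// Read previously materialized shuffle data (blocking exchange).
    RemoteStorage { location: String },
}

enum TaskOutput {
    /// Serve the output as a stream for downstream tasks to fetch.
    Stream { stream_id: String },
    /// Write the output to remote storage at a task region boundary.
    RemoteStorage { location: String },
}
```

Given a definition like this, the task runner on the worker can splice the matching shuffle readers and writers into the physical plan before execution, which is what keeps plan rewriting out of the driver.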
Task Regions, Shuffle Semantics, and Failure Recovery
Sail 0.5 introduces the concept of task regions, which are sets of tasks that can exchange data using pipelined shuffles. Boundaries between task regions correspond to blocking shuffles, where shuffle outputs must be written to remote storage. A single task region may span multiple stages, allowing pipelined execution to extend across stage boundaries.
Within a task region, tasks exchange data directly without intermediate materialization and are all scheduled together and retried atomically. Across task regions, previously materialized data is reused even if execution of a subsequent task region fails. This hybrid model allows Sail to combine low-latency pipelined execution with robust recovery semantics where needed.
The same model also unifies how data moves between stages. Partitioned data can be shuffled (as previously supported), broadcast, merged, or forwarded, and the job scheduler derives task regions from these exchange modes accordingly. This provides a foundation for adding new shuffle algorithms without heavy modifications to the control plane.
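A simplified sketch of how region boundaries could fall out of the exchange modes is shown below, for a linear chain of stages. The enum and the derivation are illustrative assumptions; the real logic operates on the full job graph.

```rust
/// Hypothetical exchange modes between stages; not Sail's actual types.
#[derive(Clone, Copy)]
enum ExchangeMode {
    /// Repartition by key across all downstream tasks.
    Shuffle { pipelined: bool },
    /// Replicate the full output to every downstream task.
    Broadcast,
    /// Merge all upstream partitions into a single downstream task.
    Merge,
    /// Pass partitions through one-to-one without repartitioning.
    Forward,
}

impl ExchangeMode {
    /// A blocking shuffle materializes its output to remote storage and
    /// therefore ends the current task region.
    fn is_region_boundary(self) -> bool {
        matches!(self, ExchangeMode::Shuffle { pipelined: false })
    }
}

/// Group a linear chain of stages into task regions: exchanges[i] sits
/// between stage i and stage i + 1, and a region extends until it reaches
/// a blocking exchange. (The real derivation works over the DAG.)
fn derive_regions(exchanges: &[ExchangeMode]) -> Vec<Vec<usize>> {
    let mut regions = vec![vec![0]];
    for (i, mode) in exchanges.iter().enumerate() {
        if mode.is_region_boundary() {
            regions.push(Vec::new());
        }
        regions.last_mut().unwrap().push(i + 1);
    }
    regions
}
```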
Failure recovery builds on these task region semantics. Each task may have multiple attempts, and when a single task fails within a region, all tasks in that region are canceled in a cascading fashion and then rescheduled together. Because tasks within a region are scheduled together in nondeterministic order, the stream management logic was updated to support fetching streams before they are created. The worker’s stream manager incorporates the task attempt number into the stream identifier to isolate streams across retries. The job output was redesigned to detect and discard output streams from failed attempts, ensuring that only results from the current attempt are used.
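To illustrate the retry isolation, here is a minimal sketch of an attempt-aware stream identifier and a stream manager that tolerates fetch-before-create. The types are hypothetical and greatly simplified relative to the real implementation.

```rust
use std::collections::HashMap;

/// Hypothetical attempt-scoped stream identifier; not Sail's actual type.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct StreamKey {
    stage: usize,
    partition: usize,
    /// The attempt number isolates streams across retries, so a consumer of
    /// attempt N never reads data produced by a failed earlier attempt.
    attempt: u64,
}

struct RecordBatchStub; // stand-in for Arrow record batches

/// A stream slot is either already created by its producer, or has been
/// requested by a consumer before the producer registered it, because tasks
/// in a region start in nondeterministic order.
enum StreamSlot {
    Created(Vec<RecordBatchStub>),
    Requested,
}

#[derive(Default)]
struct StreamManager {
    slots: HashMap<StreamKey, StreamSlot>,
}

impl StreamManager {
    /// A consumer may ask for a stream before the producer has created it.
    fn fetch(&mut self, key: StreamKey) -> &StreamSlot {
        self.slots.entry(key).or_insert(StreamSlot::Requested)
    }

    /// The job output keeps only streams from the newest attempt, discarding
    /// output left behind by failed attempts.
    fn latest_attempt(&self, stage: usize, partition: usize) -> Option<u64> {
        self.slots
            .keys()
            .filter(|k| k.stage == stage && k.partition == partition)
            .map(|k| k.attempt)
            .max()
    }
}
```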
Task Slot Sharing and Resource Management
Each Sail worker exposes a fixed number of task slots, which are logical representations of how tasks occupy the worker’s resources. Previously, each task consumed an entire slot. In Sail 0.5, tasks from different stages within the same task region may share a slot if those stages belong to the same slot-sharing group. This improves resource utilization when CPU and memory usage patterns vary across stages and significantly reduces the number of workers required for jobs with many tasks.
Once all dependency regions are complete, the job scheduler enqueues the next task region. The task assigner derives the required number of task slots from this queue, enabling the worker pool to provision workers on demand.
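As a sketch of the accounting involved, the slots required for a region might be derived as follows. The sharing policy shown here (one task per stage per slot within a group) is an assumption for illustration, not necessarily Sail’s exact rule.

```rust
use std::collections::HashMap;

/// Hypothetical per-stage information used for slot accounting.
struct StageSlots {
    /// Number of parallel tasks in the stage.
    partitions: usize,
    /// Stages in the same group may co-locate one task each in a slot.
    slot_sharing_group: u32,
}

/// Slots required for a task region under a simple sharing policy: within a
/// group, one slot hosts at most one task from each stage, so a group needs
/// as many slots as its widest stage; different groups never share slots.
fn required_slots(region: &[StageSlots]) -> usize {
    let mut widest_per_group: HashMap<u32, usize> = HashMap::new();
    for stage in region {
        let widest = widest_per_group.entry(stage.slot_sharing_group).or_insert(0);
        *widest = (*widest).max(stage.partitions);
    }
    widest_per_group.values().sum()
}
```

Under such a policy, two stages of 8 and 4 tasks in the same group need 8 slots rather than 12, which is where the reduction in worker count comes from.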
System Catalog: Observability Through SQL
Sail 0.5 introduces a built-in system catalog containing in-memory tables that expose internal execution state, including sessions, jobs, stages, tasks, and workers. This information is directly queryable using SQL, reflecting a deliberate design choice: operational state should be inspectable through the same declarative language used for OLAP workloads.
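As a rough, hypothetical illustration of the pattern (not Sail’s actual catalog schema or implementation), the sketch below uses Apache DataFusion to register a snapshot of worker state as an in-memory Arrow table and query it with SQL; the table name and columns are invented for the example.

```rust
use std::sync::Arc;

use datafusion::arrow::array::{StringArray, UInt32Array};
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::datasource::MemTable;
use datafusion::prelude::SessionContext;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // Hypothetical snapshot of worker state as an Arrow record batch.
    let schema = Arc::new(Schema::new(vec![
        Field::new("worker_id", DataType::Utf8, false),
        Field::new("task_slots", DataType::UInt32, false),
        Field::new("status", DataType::Utf8, false),
    ]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![
            Arc::new(StringArray::from(vec!["worker-0", "worker-1"])),
            Arc::new(UInt32Array::from(vec![4u32, 4])),
            Arc::new(StringArray::from(vec!["running", "running"])),
        ],
    )?;

    // Register the snapshot as an in-memory table and inspect it with SQL,
    // the same way any other table would be queried.
    let ctx = SessionContext::new();
    let table = MemTable::try_new(schema, vec![vec![batch]])?;
    ctx.register_table("workers", Arc::new(table))?;
    ctx.sql("SELECT worker_id, task_slots FROM workers WHERE status = 'running'")
        .await?
        .show()
        .await?;
    Ok(())
}
```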
Exposing internal state through system catalogs has long been standard practice in OLTP databases. Computation engines, in contrast, typically rely on bespoke UIs and REST APIs for observability. While functional, that model fragments operational insight across multiple surfaces and limits composability. Sail instead adopts ideas from the OLTP world and treats SQL as a first-class interface—not only for data processing but also for system observability.
The system catalog also complements Sail’s OpenTelemetry integration. Metrics, traces, and logs emitted using OpenTelemetry standards can be correlated with catalog queries to provide a unified view of execution state and runtime performance.
Looking Ahead
We’re planning additional distributed execution capabilities, including data plane changes for blocking shuffles and support for configurable slot-sharing groups. These enhancements will further improve performance, flexibility, and resource efficiency while preserving the architectural principles introduced in Sail 0.5.
Join the Community
Sail 0.5.0 consists of around 70 commits over the past three weeks and reflects our design thinking from the past few months. Alongside this work, we are also seeing steady momentum from the community. Since the start of 2026, we have welcomed four new contributors, each of whom has delivered nontrivial code changes to recent releases.
Stay tuned for more feature announcements and benchmark results! You can also follow along on GitHub or by joining our Slack Community.
LakeSail Platform: Managed Sail in Your Cloud
Want to run Sail with managed infrastructure? LakeSail is launching a fully managed platform with built-in governance, observability, BYOC deployment, and enterprise controls. Request early access to see how Sail can improve performance and reduce costs for your team.