Your Python code. No JVM required.
Sail executes Python UDFs natively in-process inside the Rust engine via PyO3. Your UDFs run on shared Arrow memory with zero serialization or inter-process communication overhead.
In Spark, Python crosses a process boundary. In Sail, it doesn't.
Spark's JVM-Python boundary
Spark keeps the JVM and Python in separate processes. It runs Python UDFs in dedicated worker processes: on every batch, it serializes data out of the JVM, sends it over inter-process communication, and deserializes it into Python objects. Arrow-based pandas UDFs batch data column-wise, but they still cross the same process boundary.
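The round trip can be made concrete with a small, stdlib-only simulation. It is not Spark's implementation. `pickle` stands in for Spark's serializers, and a pipe to a child interpreter stands in for the worker channel. The worker program and the function names are hypothetical. Spark also reuses long-lived workers, whereas this sketch starts one process per call to keep it short.

```python
import pickle
import subprocess
import sys

# Hypothetical worker program: reads one pickled batch from stdin,
# applies the "UDF", and writes the pickled result to stdout.
WORKER = r"""
import pickle, sys
batch = pickle.load(sys.stdin.buffer)
result = [x * 2 for x in batch]
pickle.dump(result, sys.stdout.buffer)
"""

def udf_out_of_process(batch):
    """Serialize, cross a process boundary, deserialize (once per batch)."""
    payload = pickle.dumps(batch)                   # 1. serialize out of the host
    proc = subprocess.run(
        [sys.executable, "-c", WORKER],
        input=payload,                              # 2. transfer over a pipe (IPC)
        capture_output=True,
        check=True,
    )
    return pickle.loads(proc.stdout), len(payload)  # 3. deserialize the result

def udf_in_process(batch):
    """Same logic called directly: no serialization, no IPC."""
    return [x * 2 for x in batch]

batch = list(range(5))
remote, bytes_sent = udf_out_of_process(batch)
local = udf_in_process(batch)
print(remote == local, bytes_sent)
```

Both paths return the same values. The out-of-process path, however, pays for serialization, a pipe transfer, and deserialization before any useful work happens.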
Native in-process execution via PyO3
Sail uses PyO3 so that Rust and Python share one process and the same memory. When the physical plan references a Python UDF, Sail binds the function through PyO3 and invokes it inline, applying the same vectorized execution and operator-fusion techniques it uses for native SQL and Rust-based operators. Python executes at its native speed with no inter-process overhead.
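The toy operator below illustrates roughly what invoking a UDF inline means. It holds a Python callable and calls it directly on each batch, fused with a filter into a single pass. It is written in Python rather than Rust/PyO3, and `fused_filter_project` and `celsius_to_fahrenheit` are hypothetical names. It shows the shape of the idea, not Sail's operator code.

```python
from typing import Callable, Iterable, Iterator, List

Batch = List[float]

def fused_filter_project(
    batches: Iterable[Batch],
    udf: Callable[[Batch], Batch],
    predicate: Callable[[float], bool],
) -> Iterator[Batch]:
    # Hypothetical fused operator: filter each batch, then call the bound UDF
    # on the surviving values in the same pass. The UDF is just an object
    # reference held by the operator, so there is no worker lookup and no IPC.
    for batch in batches:
        kept = [v for v in batch if predicate(v)]
        if kept:
            yield udf(kept)

def celsius_to_fahrenheit(batch: Batch) -> Batch:
    # Batch-at-a-time UDF: takes a whole batch and returns a whole batch.
    return [c * 9 / 5 + 32 for c in batch]

batches = [[-40.0, 0.0, 100.0], [20.0, -300.0]]
out = list(fused_filter_project(batches, celsius_to_fahrenheit, lambda c: c >= -273.15))
print(out)  # [[-40.0, 32.0, 212.0], [68.0]]
```

Fusing the filter with the UDF call means each batch is traversed once, and the invalid reading (-300.0) never reaches the UDF.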
Built for AI-heavy workloads
AI pipelines combine SQL transformations with embedding generation, model inference, feature engineering, and deep integration with Python libraries like NumPy, PyArrow, and model runtimes. Sail embeds Python within the same runtime as native operators on shared Arrow buffers. No external Python process, no Py4J coordination, no repeated marshaling across runtimes.
Spark’s Python UDF problem, solved
Every Python UDF call in Spark pays the full round-trip across a process boundary. Sail eliminates that boundary entirely.
Every UDF call crosses a process boundary
Data must leave the JVM, cross to a separate Python worker process, and return the same way. CPU cycles go to serialization and inter-process communication instead of computation. Accelerators like Photon can speed up physical plan execution, but the JVM still owns the control plane, so Python UDFs cross the same boundary.
Python runs in-process on shared memory
Sail embeds a Python interpreter directly inside the Rust engine. UDFs share Apache Arrow memory buffers with the engine via array pointers. No serialization or copying between built-in operations and your Python code.
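CPython's buffer protocol can illustrate the zero-copy idea. This is only an analogy for Arrow buffers shared across the Rust/Python boundary: Sail's actual mechanism is Arrow, not the `array` module. The engine-owned column and `udf_sum` are hypothetical. The check confirms that the UDF's view starts at the same address as the engine's buffer, so nothing was copied.

```python
import array
import ctypes

# Hypothetical "engine-owned" column: a contiguous float64 buffer standing in
# for an Arrow data buffer allocated on the engine side.
engine_column = array.array("d", [1.5, 2.5, 3.5, 4.5])
address, length = engine_column.buffer_info()  # raw pointer and element count

# Give the UDF a view over the same memory instead of a serialized copy.
# ctypes' from_buffer wraps existing memory without copying it.
view = (ctypes.c_double * length).from_buffer(engine_column)

def udf_sum(values) -> float:
    # The UDF reads directly from the shared buffer.
    return sum(values)

same_memory = ctypes.addressof(view) == address
total = udf_sum(view)
print(same_memory, total)  # True 12.0
```

Only a pointer and a length cross from the "engine" to the UDF. Contrast this with the out-of-process path, which must serialize every value into a byte stream first.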