Performance

Beats Spark
on the public benchmarks.

Every query. Open methodology. Run it yourself.

ClickBench · c6a.4xlarge · lower is faster
Public
Sail: x1.00
Spark + Gluten/Velox: x4.03
Spark + Auron: x4.71
Spark + Comet: x5.49
Apache Spark: x5.90
Total best-of-3 runtime across all 43 queries, normalized to Sail = x1.00. 2026-05-11 results in ClickHouse/ClickBench.
8x
Faster than Spark
16x
More data, same cost
94%
Lower infra cost
0
JVM.
PERIOD.
Query-by-Query Breakdown

All 43 queries. Nothing hidden.

ClickBench is a public third-party benchmark for analytical query engines, maintained by the ClickHouse team and open to entries from any system. Forty-three queries against a real-world web analytics dataset. Lower time = faster. Hardware: AWS c6a.4xlarge, single node, Parquet.

8.4x
Median per-query speedup vs Spark
216x
Best single-query speedup (Q7)
2.6x
Worst single-query speedup (Q35)
43 / 43
Queries where Sail leads Spark
Q# | Query | Sail | Spark | Speedup
Q1 | SELECT COUNT(*) FROM hits | 0.014s | 2.95s | 210.7x
Q2 | SELECT COUNT(*) FROM hits WHERE AdvEngineID <> 0 | 0.05s | 3.22s | 64.4x
Q3 | SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits | 0.072s | 3.37s | 46.8x
Q4 | SELECT AVG(UserID) FROM hits | 0.075s | 3.32s | 44.3x
Q5 | SELECT COUNT(DISTINCT UserID) FROM hits | 0.77s | 6.64s | 8.6x
Q6 | SELECT COUNT(DISTINCT SearchPhrase) FROM hits | 0.88s | 6.61s | 7.5x
Q7 | SELECT MIN(EventDate), MAX(EventDate) FROM hits | 0.015s | 3.25s | 216.7x
Q8 | SELECT AdvEngineID, COUNT(*) FROM hits WHERE AdvEngineID <> 0 GROUP BY AdvEngineID ORDER BY COUNT(*) DESC | 0.053s | 3.72s | 70.2x
Q9 | SELECT RegionID, COUNT(DISTINCT UserID) AS u FROM hits GROUP BY RegionID ORDER BY u DESC LIMIT 10 | 0.94s | 7.43s | 7.9x
Q10 | SELECT RegionID, SUM(AdvEngineID), COUNT(*) AS c, AVG(ResolutionWidth), COUNT(DISTINCT UserID) FROM hits GROUP BY RegionID ORDER BY c DESC LIMIT 10 | 0.99s | 8.34s | 8.4x
Q11 | SELECT MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM hits WHERE MobilePhoneModel <> '' GROUP BY MobilePhoneModel ORDER BY u DESC LIMIT 10 | 0.25s | 5.57s | 22.3x
Q12 | SELECT MobilePhone, MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM hits WHERE MobilePhoneModel <> '' GROUP BY MobilePhone, MobilePhoneModel ORDER BY u DESC LIMIT 10 | 0.27s | 5.75s | 21.3x
Q13 | SELECT SearchPhrase, COUNT(*) AS c FROM hits WHERE SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10 | 0.92s | 6.54s | 7.1x
Q14 | SELECT SearchPhrase, COUNT(DISTINCT UserID) AS u FROM hits WHERE SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY u DESC LIMIT 10 | 1.3s | 9.63s | 7.4x
Q15 | SELECT SearchEngineID, SearchPhrase, COUNT(*) AS c FROM hits WHERE SearchPhrase <> '' GROUP BY SearchEngineID, SearchPhrase ORDER BY c DESC LIMIT 10 | 0.91s | 6.89s | 7.6x
Q16 | SELECT UserID, COUNT(*) FROM hits GROUP BY UserID ORDER BY COUNT(*) DESC LIMIT 10 | 0.88s | 6.85s | 7.8x
Q17 | SELECT UserID, SearchPhrase, COUNT(*) FROM hits GROUP BY UserID, SearchPhrase ORDER BY COUNT(*) DESC LIMIT 10 | 1.85s | 9.37s | 5.1x
Q18 | SELECT UserID, SearchPhrase, COUNT(*) FROM hits GROUP BY UserID, SearchPhrase LIMIT 10 | 1.84s | 8.02s | 4.4x
Q19 | SELECT UserID, extract(minute FROM CAST(EventTime AS TIMESTAMP)) AS m, SearchPhrase, COUNT(*) FROM hits GROUP BY UserID, m, SearchPhrase ORDER BY COUNT(*) DESC LIMIT 10 | 3.39s | 14.3s | 4.2x
Q20 | SELECT UserID FROM hits WHERE UserID = 435090932899640449 | 0.084s | 3.13s | 37.3x
Q21 | SELECT COUNT(*) FROM hits WHERE URL LIKE '%google%' | 1.34s | 5.95s | 4.4x
Q22 | SELECT SearchPhrase, MIN(URL), COUNT(*) AS c FROM hits WHERE URL LIKE '%google%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10 | 1.63s | 7.26s | 4.5x
Q23 | SELECT SearchPhrase, MIN(URL), MIN(Title), COUNT(*) AS c, COUNT(DISTINCT UserID) FROM hits WHERE Title LIKE '%Google%' AND URL NOT LIKE '%.google.%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10 | 3.59s | 9.86s | 2.7x
Q24 | SELECT * FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime LIMIT 10 | 10.2s | 41.41s | 4.1x
Q25 | SELECT SearchPhrase FROM hits WHERE SearchPhrase <> '' ORDER BY EventTime LIMIT 10 | 0.47s | 4.39s | 9.3x
Q26 | SELECT SearchPhrase FROM hits WHERE SearchPhrase <> '' ORDER BY SearchPhrase LIMIT 10 | 0.4s | 4.11s | 10.3x
Q27 | SELECT SearchPhrase FROM hits WHERE SearchPhrase <> '' ORDER BY EventTime, SearchPhrase LIMIT 10 | 0.47s | 4.48s | 9.5x
Q28 | SELECT CounterID, AVG(length(URL)) AS l, COUNT(*) AS c FROM hits WHERE URL <> '' GROUP BY CounterID HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25 | 1.64s | 7.84s | 4.8x
Q29 | SELECT REGEXP_REPLACE(Referer, '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length(Referer)) AS l, COUNT(*) AS c, MIN(Referer) FROM hits WHERE Referer <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25 | 3.29s | 23.59s | 7.2x
Q30 | SELECT SUM(ResolutionWidth), SUM(ResolutionWidth + 1), … (90 sums) … FROM hits | 0.66s | 6.72s | 10.2x
Q31 | SELECT SearchEngineID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM hits WHERE SearchPhrase <> '' GROUP BY SearchEngineID, ClientIP ORDER BY c DESC LIMIT 10 | 0.82s | 6.52s | 8.0x
Q32 | SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM hits WHERE SearchPhrase <> '' GROUP BY WatchID, ClientIP ORDER BY c DESC LIMIT 10 | 0.97s | 7.3s | 7.5x
Q33 | SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM hits GROUP BY WatchID, ClientIP ORDER BY c DESC LIMIT 10 | 3.52s | 16.7s | 4.7x
Q34 | SELECT URL, COUNT(*) AS c FROM hits GROUP BY URL ORDER BY c DESC LIMIT 10 | 4.91s | 12.93s | 2.6x
Q35 | SELECT 1, URL, COUNT(*) AS c FROM hits GROUP BY 1, URL ORDER BY c DESC LIMIT 10 | 4.96s | 12.79s | 2.6x
Q36 | SELECT ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3, COUNT(*) AS c FROM hits GROUP BY ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3 ORDER BY c DESC LIMIT 10 | 1.01s | 6.97s | 6.9x
Q37 | SELECT URL, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND DontCountHits = 0 AND IsRefresh = 0 AND URL <> '' GROUP BY URL ORDER BY PageViews DESC LIMIT 10 | 0.14s | 3.88s | 27.7x
Q38 | SELECT Title, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND DontCountHits = 0 AND IsRefresh = 0 AND Title <> '' GROUP BY Title ORDER BY PageViews DESC LIMIT 10 | 0.12s | 3.66s | 30.5x
Q39 | SELECT URL, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 AND IsLink <> 0 AND IsDownload = 0 GROUP BY URL ORDER BY PageViews DESC LIMIT 10 OFFSET 1000 | 0.15s | 3.7s | 24.7x
Q40 | SELECT TraficSourceID, SearchEngineID, AdvEngineID, CASE WHEN (SearchEngineID = 0 AND AdvEngineID = 0) THEN Referer ELSE '' END AS Src, URL AS Dst, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 GROUP BY TraficSourceID, SearchEngineID, AdvEngineID, Src, Dst ORDER BY PageViews DESC LIMIT 10 OFFSET 1000 | 0.23s | 5.47s | 23.8x
Q41 | SELECT URLHash, EventDate, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 AND TraficSourceID IN (-1, 6) AND RefererHash = 3594120000172545465 GROUP BY URLHash, EventDate ORDER BY PageViews DESC LIMIT 10 OFFSET 100 | 0.067s | 4.01s | 59.9x
Q42 | SELECT WindowClientWidth, WindowClientHeight, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 AND DontCountHits = 0 AND URLHash = 2868770270353813622 GROUP BY WindowClientWidth, WindowClientHeight ORDER BY PageViews DESC LIMIT 10 OFFSET 10000 | 0.063s | 3.87s | 61.4x
Q43 | SELECT DATE_TRUNC('minute', CAST(EventTime AS TIMESTAMP)) AS M, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-14' AND EventDate <= '2013-07-15' AND IsRefresh = 0 AND DontCountHits = 0 GROUP BY DATE_TRUNC('minute', CAST(EventTime AS TIMESTAMP)) ORDER BY DATE_TRUNC('minute', CAST(EventTime AS TIMESTAMP)) LIMIT 10 OFFSET 1000 | 0.058s | 3.92s | 67.6x
single-node · c6a.4xlarge · 14 GB Parquet · run dated 2026-05-11
Sail JSON ↗ Spark JSON ↗
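
The summary figures for this breakdown can be recomputed from the table itself. The sketch below is a minimal Python example and is not part of the ClickBench tooling: it hard-codes the rounded Sail and Spark times from the 43 rows, so its results can differ slightly from figures computed on the raw JSON. The names `SAIL`, `SPARK`, and `summarize` are illustrative.

```python
from statistics import median

# Best-of-3 times in seconds for Q1..Q43, copied from the table (rounded as displayed).
SAIL = [0.014, 0.05, 0.072, 0.075, 0.77, 0.88, 0.015, 0.053, 0.94, 0.99,
        0.25, 0.27, 0.92, 1.3, 0.91, 0.88, 1.85, 1.84, 3.39, 0.084,
        1.34, 1.63, 3.59, 10.2, 0.47, 0.4, 0.47, 1.64, 3.29, 0.66,
        0.82, 0.97, 3.52, 4.91, 4.96, 1.01, 0.14, 0.12, 0.15, 0.23,
        0.067, 0.063, 0.058]
SPARK = [2.95, 3.22, 3.37, 3.32, 6.64, 6.61, 3.25, 3.72, 7.43, 8.34,
         5.57, 5.75, 6.54, 9.63, 6.89, 6.85, 9.37, 8.02, 14.3, 3.13,
         5.95, 7.26, 9.86, 41.41, 4.39, 4.11, 4.48, 7.84, 23.59, 6.72,
         6.52, 7.3, 16.7, 12.93, 12.79, 6.97, 3.88, 3.66, 3.7, 5.47,
         4.01, 3.87, 3.92]


def summarize(sail, spark):
    """Return per-query and total speedup statistics for two equal-length runs."""
    speedups = [b / a for a, b in zip(sail, spark)]
    best = max(range(len(speedups)), key=speedups.__getitem__)
    worst = min(range(len(speedups)), key=speedups.__getitem__)
    return {
        "median": median(speedups),                  # median per-query speedup
        "best": (f"Q{best + 1}", speedups[best]),    # largest single-query speedup
        "worst": (f"Q{worst + 1}", speedups[worst]), # smallest single-query speedup
        "sail_leads": sum(s > 1 for s in speedups),  # queries where Sail is faster
        "total_ratio": sum(spark) / sum(sail),       # total runtime, Sail = 1.00
    }


if __name__ == "__main__":
    for key, value in summarize(SAIL, SPARK).items():
        print(key, value)
```

On these rounded inputs the median comes out near 8.4x, Q7 is the best query and Q35 the worst, and the total-runtime ratio lands close to the x5.90 headline figure.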
Technical Credibility

Verified by people who built the underlying stack.

LakeSail is built on Apache Arrow and Apache DataFusion, the same open-source engine infrastructure that powers a generation of fast analytical systems. The people who built those foundations have reviewed our work.

"LakeSail embodies the best next generation lakehouse architecture, combining native performance with managed ease of use. A compelling platform for data intensive applications."

Andrew Lamb, InfluxData Staff Engineer & Apache DataFusion PMC
Why this matters
He helps maintain the engine LakeSail builds on
Andrew Lamb is a core contributor to Apache DataFusion, the vectorized query engine that powers LakeSail's Rust runtime.
Arrow + DataFusion are the foundation
Sail, the open-source engine that powers LakeSail, is built directly on Apache Arrow's columnar memory format and DataFusion's vectorized execution engine. Apache DataFusion tests each of its releases on Sail. Arrow and DataFusion are both ASF projects with thousands of contributors.
Open Methodology

Run it yourself.
Every step is open.

The Sail entry in ClickBench is open and runnable end-to-end. Install script, data loader, queries, and per-run results all live in one directory on GitHub. If you can reproduce a result, you can trust it.

The commands below reproduce the 2026-05-11 c6a.4xlarge run on a fresh Ubuntu host. See benchmark.sh for the canonical flow.

reproduce_clickbench.sh
# Reproduce on AWS c6a.4xlarge, Ubuntu
git clone https://github.com/ClickHouse/ClickBench
cd ClickBench/sail
./install       # Sail + deps (Rust, Python 3.11, pysail)
./load          # no-op: Sail reads hits.parquet directly
./benchmark.sh  # ~14 GB; 43 queries x 3, best of 3 -> result.csv
# Published run: results/20260511/c6a.4xlarge.json
Benchmark environment
Instance: c6a.4xlarge
vCPUs: 16
RAM: 32 GiB
Dataset: hits.parquet (~14 GB, 99.99M rows)
Queries: 43 (all)
Runs: 3 (best reported)
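
The "best of 3" reduction can be sketched in a few lines of Python. This assumes the published JSON stores timings under a `result` key as one list of three run times per query. That layout is an assumption and is not confirmed by this page. `best_of_runs` and the sample values are hypothetical.

```python
import json


def best_of_runs(result):
    """Take the fastest non-null time from each query's list of runs."""
    best = []
    for runs in result:
        valid = [t for t in runs if t is not None]  # a failed run may be recorded as null
        best.append(min(valid) if valid else None)
    return best


# Hypothetical two-query sample in the assumed layout; not real benchmark data.
sample = json.loads('{"result": [[0.30, 0.02, 0.01], [0.40, 0.06, 0.05]]}')

best = best_of_runs(sample["result"])
total = sum(t for t in best if t is not None)  # total best-of-3 runtime
print(best, round(total, 3))
```

If the layout assumption holds, loading results/20260511/c6a.4xlarge.json in place of the sample should yield the 43 per-query times behind the published totals.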
What the Numbers Mean

From benchmark results
to your actual bill.

Beyond raw speed, the same workload costs dramatically less to run. The numbers below come from a separate derived TPC-H run on a larger dataset, where Sail ran about 4x faster on average (up to 8x at peak) on roughly 1/4 the instance size.

Step 1. The benchmark result
Up to 8x faster
On a dataset larger than ClickBench's, LakeSail's engine completes the derived TPC-H queries on average 4x faster than Apache Spark. Peak per-query speedup: 8x.
Step 2. Speed becomes time
¼
Faster execution means your cluster runs for a fraction of the wall-clock time. Combined with LakeSail's engine (which doesn't idle between jobs the way Spark clusters do), active compute time drops sharply.
Step 3. Time becomes cost
94% lower
Cloud compute is billed by the second. From the derived TPC-H benchmark: Sail runs on average 4x faster on 1/4 the instance size, which works out to roughly 1/16 of the compute cost, or ~94% lower.
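
The arithmetic behind the 94% figure is short enough to check directly. The sketch below assumes instance price scales linearly with instance size and that billing is proportional to runtime. Both are simplifications, and the function name `relative_cost` is illustrative.

```python
from fractions import Fraction


def relative_cost(speedup, size_fraction):
    """Cost relative to baseline: runtime shrinks by `speedup`, price by `size_fraction`."""
    return Fraction(1, 1) / Fraction(speedup) * Fraction(size_fraction)


# Inputs from the derived TPC-H run: 4x faster on average, 1/4 the instance size.
cost = relative_cost(speedup=4, size_fraction=Fraction(1, 4))
savings_pct = float((1 - cost) * 100)
same_budget_multiple = 1 / cost  # how much more work fits in the baseline budget

print(cost, savings_pct, same_budget_multiple)
```

With these inputs the relative cost is 1/16 of the baseline, a 93.75% reduction that rounds to the ~94% quoted. The same 1/16 ratio is one way to read the "16x more data, same cost" figure, assuming work scales linearly with data volume.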
Step 4. Your workload will vary
Your mileage may differ
Standard benchmarks are not your production environment. Workload mix, query complexity, data volume, and cluster idle patterns all affect your actual savings. The methodology below shows exactly how we derived these numbers.
  • The 94% cost reduction comes from the derived TPC-H benchmark. It is a benchmark-derived figure, not a guarantee for all workloads.
  • Cost is proportional to compute time. LakeSail's stateless workers scale to zero between jobs; Apache Spark clusters typically remain running between jobs, adding idle cost not captured in raw query runtime.
  • Engineering hours (typically 20–40% of total Spark cost) are not included in the compute estimate. Any savings there come on top of the benchmark number.
  • Raw data and runnable scripts live in the ClickHouse/ClickBench Sail entry on GitHub.

Ready to see your numbers?

Get a 30-minute demo and see LakeSail running against your Spark workloads.