Track active spillers in SedonaFairSpillPool by camden-lowrance · Pull Request #919 · apache/sedona-db

camden-lowrance · 2026-06-07T03:39:18Z

Closes #918.

This changes SedonaFairSpillPool fairness from registered spill-capable consumers to active spillers.

Right now, a spill-capable consumer counts against the fair-share calculation as soon as it is registered, even if it has reserved 0 bytes. In sort-heavy plans, that can make an active sorter fail allocation while the pool still has spillable memory available.

After this change:

- idle spill-capable consumers do not reduce the active spill budget
- consumers are counted once they reserve spillable memory
- consumers stop being counted when their spillable reservation returns to zero
- fairness still applies among active spillers
- unspillable reserve behavior is unchanged
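
To make the difference between the two counting rules concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not the SedonaFairSpillPool code: the class `ToyFairPool`, its methods, and the `count_idle` flag are hypothetical names, the fair share is assumed to be the pool size divided by the number of counted consumers, and the unspillable reserve is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFairPool:
    """Toy model of a fair spill pool (hypothetical, for illustration only)."""

    pool_size: int
    count_idle: bool  # True: divide by registered consumers; False: by active spillers
    reserved: dict = field(default_factory=dict)  # consumer name -> spillable bytes held

    def register(self, name: str) -> None:
        # A newly registered consumer holds zero bytes, i.e. it is idle.
        self.reserved.setdefault(name, 0)

    def _divisor(self, name: str) -> int:
        if self.count_idle:
            return max(len(self.reserved), 1)
        # Assumption: the requester counts as active for its own request.
        active = {n for n, size in self.reserved.items() if size > 0}
        active.add(name)
        return len(active)

    def try_grow(self, name: str, additional: int) -> bool:
        share = self.pool_size // self._divisor(name)
        total = sum(self.reserved.values())
        # Reject if the consumer would exceed its share or the pool would overflow.
        if self.reserved[name] + additional > share or total + additional > self.pool_size:
            return False
        self.reserved[name] += additional
        return True

    def shrink(self, name: str, amount: int) -> None:
        self.reserved[name] = max(self.reserved[name] - amount, 0)


names = ["sort_0", "sort_1", "sort_2", "sort_3"]
registered = ToyFairPool(pool_size=100, count_idle=True)
active = ToyFairPool(pool_size=100, count_idle=False)
for pool in (registered, active):
    for n in names:
        pool.register(n)

# Only sort_0 asks for memory; the other three consumers stay idle.
print(registered.try_grow("sort_0", 40))  # share is 100 // 4 = 25
print(active.try_grow("sort_0", 40))      # share is 100 // 1 = 100
```

In this toy setup the first call is rejected and the second succeeds, which mirrors the failure mode described above. Once a second consumer reserves memory, the share in the active-spiller variant drops to half the pool, so fairness among active spillers is preserved.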

Verification:

cargo test -p sedona memory_pool --lib

Draft for discussion: this changes fairness from registered spill-capable consumers to active spillers.

2010YOUY01 · 2026-06-08T01:11:19Z

> Right now, a spill-capable consumer counts against the fair-share calculation as soon as it is registered, even if it has reserved 0 bytes. In sort-heavy plans, that can make an active sorter fail allocation while the pool still has spillable memory available.

I think this is an intentional design decision; the assumption is that idle consumers will become active shortly after.

> idle spill-capable consumers do not reduce the active spill budget

This is an interesting idea, but I don't fully understand it yet.

I suggest demonstrating some end-to-end queries that used to fail but that work under this new policy.

paleolimbot · 2026-06-08T14:48:57Z

Agreed that managing spilling is not very flexible in DataFusion at the moment! The most comprehensive discussion I've seen so far is apache/datafusion#17334 but there may be other more recent ones that I've missed. We're definitely open to solutions that can mitigate some of that but I am with Yongting that we would need to see some concrete examples (we have a benchmarks/ directory you could add them into for the purposes of this PR).

Something you could also try is using multiple contexts with independent memory limits, squeezing intermediary results through an ArrowArrayStream. The Python equivalent might be:

import sedona.db

sd1 = sedona.db.connect()
# configure memory limit

sd2 = sedona.db.connect()
# configure memory limit

df1 = sd1.sql(...)
df2 = sd2.create_data_frame(df1.arrow()) # just df1 may also work, but would go through an ffi table provider today
df2.sort(...)

Of course, that's not an ideal or generic workaround, but it might do for a specific workflow.

camden-lowrance · 2026-06-09T02:21:54Z

Thanks, I appreciate the discussion. Looking through our error traces, this has only occurred twice, on separate datasets. Both processed successfully on retry, so it seems to be a transient error. I will try to pin this down with a reproducible example.

For some context, our production environment is memory constrained, with only 4 GB of memory per Kubernetes pod, but we routinely process GeoParquet data in the tens of millions of rows.

paleolimbot · 2026-06-09T14:08:00Z

> I will try to pin this down with a reproducible example.

You're kind to offer... this kind of thing can be very difficult to reproduce. In the meantime, if you can find a way to dump some metrics about spillable consumers when the failure occurs, it could give some insight the next time this happens.
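
As a rough illustration of what such a dump could look like, here is a small Python sketch. Everything in it is hypothetical: `ConsumerSnapshot`, `format_spill_report`, and the field names are made up for this example, and how the per-consumer numbers would actually be collected from the memory pool is not covered.

```python
from typing import NamedTuple


class ConsumerSnapshot(NamedTuple):
    """Hypothetical record of one memory consumer at the time of a failure."""

    name: str
    can_spill: bool
    reserved_bytes: int


def format_spill_report(pool_size: int, consumers: list[ConsumerSnapshot]) -> str:
    """Render a summary line plus one line per consumer, largest reservation first."""
    spillable = [c for c in consumers if c.can_spill]
    active = [c for c in spillable if c.reserved_bytes > 0]
    total = sum(c.reserved_bytes for c in consumers)
    lines = [
        f"pool_size={pool_size} reserved={total} "
        f"spillable_registered={len(spillable)} spillable_active={len(active)}"
    ]
    for c in sorted(consumers, key=lambda c: c.reserved_bytes, reverse=True):
        lines.append(f"  {c.name}: can_spill={c.can_spill} reserved={c.reserved_bytes}")
    return "\n".join(lines)


# Example values are invented for illustration.
snapshot = [
    ConsumerSnapshot("sort_0", True, 40),
    ConsumerSnapshot("sort_1", True, 0),
    ConsumerSnapshot("scan", False, 10),
]
print(format_spill_report(100, snapshot))
```

Logging the registered and active spiller counts side by side would show directly whether idle consumers were shrinking the fair share when an allocation failed.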

> For some context, our production environment is memory constrained, with only 4 GB of memory per Kubernetes pod, but we routinely process GeoParquet data in the tens of millions of rows.

That is so cool! This is exactly what we built the spilling feature to be able to do!

github-actions Bot requested a review from prantogg June 7, 2026 03:39

vvvanguards force-pushed the bug/active-spillers-memory-pool branch from c701ba8 to 4e64673 Compare June 7, 2026 03:44

Track active spillers in memory pool

c3716c1

camden-lowrance force-pushed the bug/active-spillers-memory-pool branch from 4e64673 to c3716c1 Compare June 7, 2026 03:48

camden-lowrance wants to merge 1 commit into apache:main from camden-lowrance:bug/active-spillers-memory-pool