
Track active spillers in SedonaFairSpillPool #919

Draft
camden-lowrance wants to merge 1 commit into apache:main from camden-lowrance:bug/active-spillers-memory-pool

@camden-lowrance

Contributor

Closes #918.

This changes SedonaFairSpillPool fairness from registered spill-capable consumers to active spillers.

Right now, a spill-capable consumer counts against the fair-share calculation as soon as it is registered, even if it has reserved 0 bytes. In sort-heavy plans, that can make an active sorter fail allocation while the pool still has spillable memory available.

After this change:

  • idle spill-capable consumers do not reduce the active spill budget
  • consumers are counted once they reserve spillable memory
  • consumers stop being counted when their spillable reservation returns to zero
  • fairness still applies among active spillers
  • unspillable reserve behavior is unchanged
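
As a rough illustration of the difference described above, here is a toy Python model of the two counting policies. It is only a sketch and not the `SedonaFairSpillPool` implementation: the class `ToyFairSpillPool`, its methods, and the consumer names are hypothetical, unspillable reservations are ignored, and the per-consumer budget is assumed to be the pool size divided by the number of counted consumers.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFairSpillPool:
    """Toy model only; names and arithmetic are assumptions for illustration."""

    pool_size: int
    count_active_only: bool
    # consumer name -> spillable bytes currently reserved
    reserved: dict = field(default_factory=dict)

    def register(self, name):
        # Registering a consumer does not reserve any memory.
        self.reserved.setdefault(name, 0)

    def _num_counted(self, requester):
        if not self.count_active_only:
            # Registered policy: every registered consumer takes a share.
            return len(self.reserved)
        # Active policy: only consumers holding a reservation, plus the requester.
        active = {n for n, size in self.reserved.items() if size > 0}
        active.add(requester)
        return len(active)

    def per_consumer_budget(self, requester):
        return self.pool_size // self._num_counted(requester)

    def try_grow(self, name, additional):
        # Reject if the consumer would exceed its fair share ...
        if self.reserved[name] + additional > self.per_consumer_budget(name):
            return False
        # ... or if the pool as a whole would be oversubscribed.
        if sum(self.reserved.values()) + additional > self.pool_size:
            return False
        self.reserved[name] += additional
        return True

    def shrink(self, name, amount):
        self.reserved[name] = max(0, self.reserved[name] - amount)


def demo(count_active_only):
    """One sorter asks for 40 of 100 bytes while three others sit idle."""
    pool = ToyFairSpillPool(pool_size=100, count_active_only=count_active_only)
    for name in ("sort_0", "sort_1", "sort_2", "sort_3"):
        pool.register(name)
    return pool.try_grow("sort_0", 40)
```

In this toy model, `demo(False)` rejects the request because the budget is 100 // 4 = 25 bytes per registered consumer, while `demo(True)` accepts it because `sort_0` is the only consumer counted.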

Verification:

  • cargo test -p sedona memory_pool --lib

Draft for discussion: this changes fairness from registered spill-capable consumers to active spillers.

@github-actions github-actions Bot requested a review from prantogg June 7, 2026 03:39
@vvvanguards vvvanguards force-pushed the bug/active-spillers-memory-pool branch from c701ba8 to 4e64673 Compare June 7, 2026 03:44
@camden-lowrance camden-lowrance force-pushed the bug/active-spillers-memory-pool branch from 4e64673 to c3716c1 Compare June 7, 2026 03:48
@2010YOUY01

Contributor

> Right now, a spill-capable consumer counts against the fair-share calculation as soon as it is registered, even if it has reserved 0 bytes. In sort-heavy plans, that can make an active sorter fail allocation while the pool still has spillable memory available.

I think this is an intentional design decision: the assumption is that idle consumers will become active shortly afterwards.

> • idle spill-capable consumers do not reduce the active spill budget

This is an interesting idea, but I don't fully understand it yet.

I suggest demonstrating some end-to-end queries that used to fail but succeed under the new policy.

@paleolimbot

Member

Agreed that managing spilling is not very flexible in DataFusion at the moment! The most comprehensive discussion I've seen so far is apache/datafusion#17334 but there may be other more recent ones that I've missed. We're definitely open to solutions that can mitigate some of that but I am with Yongting that we would need to see some concrete examples (we have a benchmarks/ directory you could add them into for the purposes of this PR).

Something you could also try is using multiple contexts with independent memory limits, squeezing intermediary results through an ArrowArrayStream. The Python equivalent might be:

import sedona.db

sd1 = sedona.db.connect()
# configure memory limit

sd2 = sedona.db.connect()
# configure memory limit

df1 = sd1.sql(...)
df2 = sd2.create_data_frame(df1.arrow()) # just df1 may also work, but would go through an ffi table provider today
df2.sort(...)

Of course, that's not an ideal or generic workaround, but it might do for a specific workflow.

@camden-lowrance

camden-lowrance commented Jun 9, 2026

Contributor Author

Thanks, I appreciate the discussion. Looking through our error traces, this has only occurred twice, on separate datasets. Both processed successfully on retry, so it seems to be a transient error. I will try to pin this down with a reproducible example.

For some context, our production environment is memory constrained, with only 4 GB of memory per Kubernetes pod, but we routinely process GeoParquet data with tens of millions of rows.

@paleolimbot

Member

> I will try to pin this down with a reproducible example.

You're kind to offer... this kind of thing can be very difficult to reproduce. In the meantime, if you can find a way to dump some metrics about spillable consumers when the failure occurs, it could give some insight the next time this happens.

> For some context, our production environment is memory constrained, with only 4 GB of memory per Kubernetes pod, but we routinely process GeoParquet data with tens of millions of rows.

That is so cool! This is exactly what we built the spilling feature to be able to do!

Development

Successfully merging this pull request may close these issues.

SedonaFairSpillPool can reject sort memory even when spill memory is available