Track active spillers in SedonaFairSpillPool #919
I think this is an intentional design decision; the assumption is that idle consumers will become active shortly afterwards.
This is an interesting idea, but I don't fully understand it yet. I suggest demonstrating some end-to-end queries that used to fail but work under the new policy.
Agreed that managing spilling is not very flexible in DataFusion at the moment! The most comprehensive discussion I've seen so far is apache/datafusion#17334, but there may be other more recent ones that I've missed. We're definitely open to solutions that can mitigate some of that, but I am with Yongting that we would need to see some concrete examples (we have a benchmarks/ directory you could add them into for the purposes of this PR).

Something you could also try is using multiple contexts with independent memory limits, squeezing intermediary results through an ArrowArrayStream. The Python equivalent might be:

```python
import sedona.db

sd1 = sedona.db.connect()
# configure memory limit
sd2 = sedona.db.connect()
# configure memory limit
df1 = sd1.sql(...)
df2 = sd2.create_data_frame(df1.arrow())  # just df1 may also work, but would go through an ffi table provider today
df2.sort(...)
```

Of course, that's not an ideal/generic workaround, but it might do for a specific workflow.
Thanks, I appreciate the discussion. Looking through our error traces, this has only occurred twice, on separate datasets. Both processed successfully on retry, so it seems to be a transient error. I will try to pin this down with a reproducible example. For some context, our production environment is memory constrained, with only 4 GB of memory per Kubernetes pod, but we routinely process GeoParquet data in the tens of millions of rows.
You're kind to offer...this kind of thing can be very difficult to reproduce. In the meantime if you can find a way to dump some metrics about spillable consumers on the failure you're seeing it could give some insights the next time this happens.
That is so cool! This is exactly what we built the spilling feature to be able to do!
Closes #918.
This changes `SedonaFairSpillPool` fairness from registered spill-capable consumers to active spillers.

Right now, a spill-capable consumer counts against the fair-share calculation as soon as it is registered, even if it has reserved 0 bytes. In sort-heavy plans, that can make an active sorter fail allocation while the pool still has spillable memory available.
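
To make the arithmetic concrete, here is a toy model of the two policies. It is only a sketch, not the `SedonaFairSpillPool` implementation: it assumes the per-consumer cap is the spillable pool divided evenly among the counted consumers (modeled loosely on how DataFusion's `FairSpillPool` is documented to behave), and that "active" means holding a non-zero reservation. The pool size, consumer count, and byte figures are all hypothetical.

```python
MIB = 1024**2


def fair_share(pool_size: int, unspillable: int, num_counted: int) -> int:
    """Per-consumer cap in this toy model: the spillable part of the pool
    divided evenly among the consumers being counted (at least one)."""
    return (pool_size - unspillable) // max(num_counted, 1)


def can_grow(current: int, additional: int, share: int) -> bool:
    """A spillable consumer may grow only while it stays within its share."""
    return current + additional <= share


POOL = 4096 * MIB  # hypothetical 4 GiB pool, no unspillable reservations

# Hypothetical plan: 8 registered spill-capable consumers, 1 holding memory.
reserved = [400 * MIB] + [0] * 7
request = 200 * MIB  # the active sorter asks for 200 MiB more

# Policy before the change: every registered consumer is counted.
registered_share = fair_share(POOL, 0, len(reserved))
# Policy after the change (assumed): only consumers holding memory are counted.
active_share = fair_share(POOL, 0, sum(1 for r in reserved if r > 0))

denied_before = not can_grow(reserved[0], request, registered_share)
allowed_after = can_grow(reserved[0], request, active_share)
```

In this model the registered-consumer policy caps the sorter at 512 MiB, so the request to go from 400 MiB to 600 MiB is refused even though about 3.6 GiB of the pool is unused. Counting only active spillers lets the same request through. The trade-off raised in review still applies: if the idle consumers become active shortly afterwards, they find less headroom left.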
After this change:
Verification:
`cargo test -p sedona memory_pool --lib`

Draft for discussion: this changes fairness from registered spill-capable consumers to active spillers.