Implement limit push down for IcebergTableProvider #1673
krinart wants to merge 14 commits into apache:main
    record_batch_stream_builder.with_row_groups(selected_row_group_indices);
}

if let Some(limit) = task.limit {
Should we enable with_page_index, as suggested by the docs for with_limit? https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.ArrowReaderBuilder.html#method.with_limit
Thanks! I extended the should_load_page_index logic, and ArrowReaderOptions is initialized with with_page_index(should_load_page_index).
Happy to help @ZENOTME and thanks for the feedback!
Original PR: #19 Upstream PR: apache#1673
Hi. Anything else I can do on my side to get this merged?
Xuanwo left a comment
Thank you for working on this!
Hi @krinart, please allow me to edit this PR so we can resolve the conflicts and merge it.
Hey @Xuanwo, apologies for the late reply. I just resolved all the conflicts. Let me know if there's anything else I can do. Thanks!
@Xuanwo anything left to get this merged?
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
Which issue does this PR close?
N/A
What changes are included in this PR?
Previously, _limit was ignored in IcebergTableProvider::scan:
iceberg-rust/crates/integrations/datafusion/src/table/mod.rs
Lines 149 to 163 in aad9e2e
This PR propagates limit all the way to the ArrowReaderBuilder.
Note: limit push down is only applied to each batch, which means that IcebergTableProvider::scan may potentially return more records than specified by limit. This is OK according to the TableProvider::scan documentation.
Are these changes tested?
Unit tests
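
The note above about scan possibly returning more records than limit can be illustrated with a small standalone sketch. It uses only the Rust standard library and does not touch the real iceberg-rust, DataFusion, or parquet APIs: FileTask, read_task, scan, and enforce_limit are hypothetical names invented for this illustration, and it assumes the limit is applied independently to each file scan task (the diff above reads it from task.limit).

```rust
/// Hypothetical stand-in for one file scan task; `rows` plays the role of a data file.
struct FileTask {
    rows: Vec<i64>,
}

/// Reads a single task, stopping after `limit` rows when a limit was pushed down.
fn read_task(task: &FileTask, limit: Option<usize>) -> Vec<i64> {
    match limit {
        Some(n) => task.rows.iter().copied().take(n).collect(),
        None => task.rows.clone(),
    }
}

/// Concatenates the output of every task. The limit is applied per task,
/// not globally, so the result can hold up to `limit * tasks.len()` rows.
fn scan(tasks: &[FileTask], limit: Option<usize>) -> Vec<i64> {
    tasks.iter().flat_map(|t| read_task(t, limit)).collect()
}

/// What the caller (for example the query engine) still has to do
/// to enforce the exact limit on top of the scan output.
fn enforce_limit(rows: Vec<i64>, limit: usize) -> Vec<i64> {
    rows.into_iter().take(limit).collect()
}
```

With two tasks holding four and three rows and a limit of 2, scan in this sketch returns four rows (two per task) instead of all seven, and enforce_limit trims that to exactly two. The pushed-down limit reduces the work done per file, while the final truncation stays with the consumer.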