Support temporal bucketing in Union and Reduce operators#36648
Draft
antiguru wants to merge 7 commits into
Draft
Support temporal bucketing in Union and Reduce operators#36648antiguru wants to merge 7 commits into
antiguru wants to merge 7 commits into
Conversation
The `Union { consolidate_output: true }` arm previously fed the
concatenated stream directly into `consolidate_named::<KeyBatcher>`.
Future-dated updates therefore accumulated in the consolidate operator
until the input frontier caught up — exactly the situation
`BucketChain` was introduced to avoid in the `ArrangeBy` lowering.
Apply `MaybeBucketByTime::maybe_apply_temporal_bucketing` to the
concatenated stream before the consolidate, gated on
`ENABLE_COMPUTE_TEMPORAL_BUCKETING`. The trait is a no-op for partially
ordered timestamps (e.g. inside iterative scopes), so this only does
real work in non-iterative scopes where `BucketChain` is meaningful.
Fixes CLU-86.
The `Union { consolidate_output: true }` arm previously fed the
concatenated stream directly into `consolidate_named::<KeyBatcher>`.
Future-dated updates therefore accumulated in the consolidate operator
until the input frontier caught up — the situation `BucketChain` was
introduced to avoid in the `ArrangeBy` lowering.
Track `has_future_updates` per Union input through lowering and surface
it as `input_has_future_updates: Vec<bool>` on `PlanNode::Union` (and
the corresponding `RenderPlan` variant). The renderer applies
`MaybeBucketByTime::maybe_apply_temporal_bucketing` only to those
specific inputs that may carry future updates, and only when
`consolidate_output` is set and `ENABLE_COMPUTE_TEMPORAL_BUCKETING` is
on. Inputs that the lowering knows cannot carry future-stamped updates
pay no bucketing cost.
Fixes CLU-86.
The previous commit attached a `Vec<bool>` "has future updates" flag to `PlanNode::Union`. That conflates an input property with a rendering decision and forces the renderer to translate "is future" into "should bucket" — the same translation the lowering already does for `ArrangeBy`. Reuse `ArrangementStrategy` per Union input. The lowering runs each input's `has_future_updates` through `strategy_from_future`, so the plan carries `Direct` / `TemporalBucketing` — what the renderer should do, not what is true of the input. The renderer simply matches on the strategy, mirroring `ArrangeBy`. `ArrangementStrategy`'s docstring is broadened to cover both consumers.
`build_monotonic` (`render/reduce.rs:1164`) feeds its `consolidate_named_if::<KeyBatcher>` without temporal bucketing. Future-stamped updates (e.g., from a temporal MFP feeding into a monotonic hierarchical reduction with `must_consolidate=true`) therefore accumulate in the batcher until the input frontier catches up — the same gap CLU-86 fixes for Union. Carry the rendering decision on the LIR `Reduce` node as `input_strategy: ArrangementStrategy`, mirroring `ArrangeBy::strategy` and the new Union `input_strategies`. The lowering sets it via `strategy_from_future(input_future)`. The renderer threads it through `render_reduce` → `render_reduce_plan` → `render_reduce_plan_inner` and `build_monotonic` applies `MaybeBucketByTime` ahead of the consolidate when the strategy is `TemporalBucketing`, `must_consolidate` is set, and `ENABLE_COMPUTE_TEMPORAL_BUCKETING` is on. Generalise `MaybeBucketByTime::maybe_apply_temporal_bucketing` over the data type so the monotonic reduce can bucket its `(Row, Vec<Row>)` stream; existing `Row` callers (Union, ArrangeBy) keep working via type inference.
`MzData + Data` alone is insufficient — the inner `bucket` impl also requires `timely::ExchangeData` (for the `Exchange` PACT) and `Hashable` (for `d.hashed()`). Use `MzData + ExchangeData + Hashable`, which folds `Ord + Clone + Debug + 'static` into `differential_dataflow::ExchangeData`.
`build_monotonic` in reduce and the two `consolidate_named_if` sites in
top-k all sit in front of `KeyBatcher` consolidates that fire on the
single-time refinement path (`refine_single_time_operator_selection`).
That path upgrades any `Basic`/`Bucketed` to a monotonic variant with
`must_consolidate=true`, including plans whose MIR Filters carry
temporal predicates. Future-stamped updates therefore pile up in the
batcher until the input frontier catches up — the same gap as Union
and the previous Reduce fix.
Add `input_strategy: ArrangementStrategy` to `PlanNode::TopK`,
populated by the lowering via `strategy_from_future`. Thread it into
`render_topk` and `render_top1_monotonic`.
Replace the `consolidate_named_if(must_consolidate, name)` calls in
`build_monotonic`, the `MonotonicTopK` arm of `render_topk`, and
`render_top1_monotonic` with explicit `if must_consolidate { ... }`.
That removes the bool-passing API at the call site and gives the
bucketing a natural place to live inside the same branch.
Share the bucketing logic via a new
`Context::bucket_for_consolidate` helper, used now by Union, Reduce,
and both TopK paths.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
This change extends temporal bucketing support to
UnionandReduceoperators, allowing future-stamped updates (e.g., frommz_now()MFPs) to be delayed until their bucket boundary releases them. This reduces memory pressure by preventing such updates from accumulating in consolidation batchers until the input frontier catches up.Description
The changes introduce an
ArrangementStrategyfield to bothUnionandReduceplan nodes, which the renderer consults to decide whether to apply temporal bucketing:For
Unionoperators:input_strategies: Vec<ArrangementStrategy>field, aligned withinputsconsolidate_outputis true and temporal bucketing is enabled, the renderer applies bucketing to inputs marked withTemporalBucketingstrategy before concatenationconsolidate_outputFor
Reduceoperators:input_strategy: ArrangementStrategyfieldmust_consolidateset), the renderer applies bucketing before the consolidate if the strategy indicates itKeyBatcherSupporting changes:
MaybeBucketByTime::maybe_apply_temporal_bucketinggeneric over the data typeD(previously hardcoded toRow), allowing it to work with different collection typesinput_strategiesbased on whether inputs have future updates viastrategy_from_future()The implementation respects the
ENABLE_COMPUTE_TEMPORAL_BUCKETINGdynamic config and usesTEMPORAL_BUCKETING_SUMMARYto determine bucket boundaries.Verification
The changes are primarily structural additions to the plan representation and rendering logic. Existing tests should continue to pass as the new fields are properly threaded through all plan traversal code. The temporal bucketing behavior itself is gated behind the
ENABLE_COMPUTE_TEMPORAL_BUCKETINGconfig, so it won't affect default behavior until explicitly enabled.https://claude.ai/code/session_01XsGDMKZricZbyiB67npsNG