Support temporal bucketing in Union and Reduce operators#36648

Draft
antiguru wants to merge 7 commits into main from claude/fix-clu-86-SWx5I

@antiguru
Member

Motivation

This change extends temporal bucketing support to `Union` and `Reduce` operators, allowing future-stamped updates (e.g., from `mz_now()` MFPs) to be delayed until their bucket boundary releases them. This reduces memory pressure by preventing such updates from accumulating in consolidation batchers until the input frontier catches up.

Description

The changes add an `ArrangementStrategy` field to both `Union` and `Reduce` plan nodes, which the renderer consults to decide whether to apply temporal bucketing:

For Union operators:

  • Added an `input_strategies: Vec<ArrangementStrategy>` field, with one entry per element of `inputs`
  • When `consolidate_output` is true and temporal bucketing is enabled, the renderer buckets each input marked `TemporalBucketing` before concatenation
  • Bucketing only helps when a consolidation follows, so it is gated on `consolidate_output`
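The gating described in the bullets above can be sketched in a few lines. This is a simplified model rather than the renderer's code: `ArrangementStrategy` is reduced to the two variants named in this PR, `bucketing_enabled` stands in for the `ENABLE_COMPUTE_TEMPORAL_BUCKETING` dynamic config, and `inputs_to_bucket` is a hypothetical helper introduced only for illustration.

```rust
/// Simplified stand-in for the plan-level strategy (assumed two variants).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ArrangementStrategy {
    Direct,
    TemporalBucketing,
}

/// Hypothetical helper: for each Union input, report whether the renderer
/// would bucket it before concatenation.
pub fn inputs_to_bucket(
    input_strategies: &[ArrangementStrategy],
    consolidate_output: bool,
    bucketing_enabled: bool, // stands in for ENABLE_COMPUTE_TEMPORAL_BUCKETING
) -> Vec<bool> {
    input_strategies
        .iter()
        .map(|strategy| {
            // All three conditions must hold for an input to be bucketed.
            consolidate_output
                && bucketing_enabled
                && *strategy == ArrangementStrategy::TemporalBucketing
        })
        .collect()
}
```

With strategies `[Direct, TemporalBucketing]`, only the second input is bucketed, and only when both `consolidate_output` and the config flag are set.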

For Reduce operators:

  • Added an `input_strategy: ArrangementStrategy` field
  • When the reduce performs pre-aggregation consolidation (monotonic hierarchical reductions with `must_consolidate` set), the renderer applies bucketing ahead of the consolidate if the strategy is `TemporalBucketing`
  • This prevents future-stamped updates from piling up in the `KeyBatcher`
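For readers less familiar with the consolidation step that bucketing sits in front of, here is a toy model of what a consolidate does to a batch of `(data, time, diff)` updates. It is not the `KeyBatcher` implementation; the `consolidate` function and the `u64` timestamp are illustrative assumptions.

```rust
/// Toy consolidation: sums the diffs of updates that share the same
/// (data, time) pair and drops entries whose diffs cancel to zero.
pub fn consolidate<D: Ord>(mut updates: Vec<(D, u64, i64)>) -> Vec<(D, u64, i64)> {
    // Sort by data, then time, so equal (data, time) pairs become adjacent.
    updates.sort_by(|a, b| a.0.cmp(&b.0).then(a.1.cmp(&b.1)));

    let mut out: Vec<(D, u64, i64)> = Vec::new();
    for (data, time, diff) in updates {
        if let Some(last) = out.last_mut() {
            if last.0 == data && last.1 == time {
                // Same (data, time): accumulate the diff in place.
                last.2 += diff;
                continue;
            }
        }
        out.push((data, time, diff));
    }

    // Remove updates that cancelled out entirely.
    out.retain(|update| update.2 != 0);
    out
}
```

A real batcher can only emit a time once the input frontier has passed it, which is why updates stamped far in the future stay in memory until then.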

Supporting changes:

  • Made `MaybeBucketByTime::maybe_apply_temporal_bucketing` generic over the data type `D` (previously hardcoded to `Row`), so it works with other collection types
  • Updated the lowering logic to populate `input_strategies` via `strategy_from_future()`, based on whether each input may carry future updates
  • Updated all plan traversal code (`explain`, `interpret`, `render_plan`) to handle the new fields
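The lowering step in the second bullet might look roughly like the sketch below. The signature of `strategy_from_future` is an assumption (a plain `bool` in, a strategy out), and `lower_union_strategies` is a hypothetical name; the actual lowering code may carry more context.

```rust
/// Simplified stand-in for the plan-level strategy (assumed two variants).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ArrangementStrategy {
    Direct,
    TemporalBucketing,
}

/// Assumed shape: translate "may carry future-stamped updates" into the
/// rendering decision stored on the plan.
pub fn strategy_from_future(has_future_updates: bool) -> ArrangementStrategy {
    if has_future_updates {
        ArrangementStrategy::TemporalBucketing
    } else {
        ArrangementStrategy::Direct
    }
}

/// Hypothetical lowering step for a Union: one strategy per input, in
/// the same order as the inputs themselves.
pub fn lower_union_strategies(inputs_have_future_updates: &[bool]) -> Vec<ArrangementStrategy> {
    inputs_have_future_updates
        .iter()
        .copied()
        .map(strategy_from_future)
        .collect()
}
```

Storing the decision rather than the raw flag keeps the renderer to a single match on the strategy.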

The implementation respects the `ENABLE_COMPUTE_TEMPORAL_BUCKETING` dynamic config and uses `TEMPORAL_BUCKETING_SUMMARY` to determine bucket boundaries.
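To give a feel for how a summary can decide which updates are held back, the following toy model releases an update once its timestamp falls within `summary` of the frontier. This is only one plausible reading and an assumption on my part: the real `BucketChain` logic may align and release buckets differently, and `split_by_threshold` is a hypothetical function.

```rust
/// Toy model (assumption): an update is released downstream once its
/// timestamp is less than `frontier + summary`; later updates are held.
/// Returns `(released, held)`.
pub fn split_by_threshold<D>(
    updates: Vec<(D, u64, i64)>,
    frontier: u64,
    summary: u64, // stands in for TEMPORAL_BUCKETING_SUMMARY
) -> (Vec<(D, u64, i64)>, Vec<(D, u64, i64)>) {
    // Saturate so a frontier near u64::MAX cannot overflow.
    let threshold = frontier.saturating_add(summary);
    updates
        .into_iter()
        .partition(|(_, time, _)| *time < threshold)
}
```

Under this model, with a frontier of 100 and a summary of 10, updates at times 100 and 105 pass through while an update at time 5000 stays in the bucketing operator instead of the consolidation batcher.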

Verification

The changes are primarily structural additions to the plan representation and rendering logic. Existing tests should continue to pass as the new fields are properly threaded through all plan traversal code. The temporal bucketing behavior itself is gated behind the `ENABLE_COMPUTE_TEMPORAL_BUCKETING` config, so it won't affect default behavior until explicitly enabled.

https://claude.ai/code/session_01XsGDMKZricZbyiB67npsNG

claude added 7 commits May 18, 2026 11:23
The `Union { consolidate_output: true }` arm previously fed the
concatenated stream directly into `consolidate_named::<KeyBatcher>`.
Future-dated updates therefore accumulated in the consolidate operator
until the input frontier caught up — exactly the situation
`BucketChain` was introduced to avoid in the `ArrangeBy` lowering.

Apply `MaybeBucketByTime::maybe_apply_temporal_bucketing` to the
concatenated stream before the consolidate, gated on
`ENABLE_COMPUTE_TEMPORAL_BUCKETING`. The trait is a no-op for partially
ordered timestamps (e.g. inside iterative scopes), so this only does
real work in non-iterative scopes where `BucketChain` is meaningful.

Fixes CLU-86.

The `Union { consolidate_output: true }` arm previously fed the
concatenated stream directly into `consolidate_named::<KeyBatcher>`.
Future-dated updates therefore accumulated in the consolidate operator
until the input frontier caught up — the situation `BucketChain` was
introduced to avoid in the `ArrangeBy` lowering.

Track `has_future_updates` per Union input through lowering and surface
it as `input_has_future_updates: Vec<bool>` on `PlanNode::Union` (and
the corresponding `RenderPlan` variant). The renderer applies
`MaybeBucketByTime::maybe_apply_temporal_bucketing` only to those
specific inputs that may carry future updates, and only when
`consolidate_output` is set and `ENABLE_COMPUTE_TEMPORAL_BUCKETING` is
on. Inputs that the lowering knows cannot carry future-stamped updates
pay no bucketing cost.

Fixes CLU-86.

The previous commit attached a `Vec<bool>` "has future updates" flag to
`PlanNode::Union`. That conflates an input property with a rendering
decision and forces the renderer to translate "is future" into "should
bucket" — the same translation the lowering already does for
`ArrangeBy`.

Reuse `ArrangementStrategy` per Union input. The lowering runs each
input's `has_future_updates` through `strategy_from_future`, so the
plan carries `Direct` / `TemporalBucketing` — what the renderer should
do, not what is true of the input. The renderer simply matches on the
strategy, mirroring `ArrangeBy`. `ArrangementStrategy`'s docstring is
broadened to cover both consumers.

`build_monotonic` (`render/reduce.rs:1164`) feeds its
`consolidate_named_if::<KeyBatcher>` without temporal bucketing.
Future-stamped updates (e.g., from a temporal MFP feeding into a
monotonic hierarchical reduction with `must_consolidate=true`)
therefore accumulate in the batcher until the input frontier catches
up — the same gap CLU-86 fixes for Union.

Carry the rendering decision on the LIR `Reduce` node as
`input_strategy: ArrangementStrategy`, mirroring `ArrangeBy::strategy`
and the new Union `input_strategies`. The lowering sets it via
`strategy_from_future(input_future)`. The renderer threads it through
`render_reduce` → `render_reduce_plan` → `render_reduce_plan_inner`
and `build_monotonic` applies `MaybeBucketByTime` ahead of the
consolidate when the strategy is `TemporalBucketing`,
`must_consolidate` is set, and `ENABLE_COMPUTE_TEMPORAL_BUCKETING` is
on.

Generalise `MaybeBucketByTime::maybe_apply_temporal_bucketing` over
the data type so the monotonic reduce can bucket its
`(Row, Vec<Row>)` stream; existing `Row` callers (Union, ArrangeBy)
keep working via type inference.

`MzData + Data` alone is insufficient: the inner `bucket` impl also
requires `timely::ExchangeData` (for the `Exchange` PACT) and
`Hashable` (for `d.hashed()`). Use `MzData + ExchangeData + Hashable`,
which folds `Ord + Clone + Debug + 'static` into
`differential_dataflow::ExchangeData`.

`build_monotonic` in reduce and the two `consolidate_named_if` sites in
top-k all sit in front of `KeyBatcher` consolidates that fire on the
single-time refinement path (`refine_single_time_operator_selection`).
That path upgrades any `Basic`/`Bucketed` to a monotonic variant with
`must_consolidate=true`, including plans whose MIR Filters carry
temporal predicates. Future-stamped updates therefore pile up in the
batcher until the input frontier catches up — the same gap as Union
and the previous Reduce fix.

Add `input_strategy: ArrangementStrategy` to `PlanNode::TopK`,
populated by the lowering via `strategy_from_future`. Thread it into
`render_topk` and `render_top1_monotonic`.

Replace the `consolidate_named_if(must_consolidate, name)` calls in
`build_monotonic`, the `MonotonicTopK` arm of `render_topk`, and
`render_top1_monotonic` with explicit `if must_consolidate { ... }`.
That removes the bool-passing API at the call site and gives the
bucketing a natural place to live inside the same branch.

Share the bucketing logic via a new
`Context::bucket_for_consolidate` helper, used now by Union, Reduce,
and both TopK paths.
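The control flow this commit describes, an explicit `if must_consolidate { ... }` branch with bucketing inside it, can be modelled as below. The sketch only records which operators would be placed in front of the aggregation; `should_bucket` and `pre_aggregation_steps` are hypothetical names, and the real `Context::bucket_for_consolidate` operates on dataflow collections with a signature not shown here.

```rust
/// Simplified stand-in for the plan-level strategy (assumed two variants).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ArrangementStrategy {
    Direct,
    TemporalBucketing,
}

/// Hypothetical stand-in for the shared helper's decision: bucket only when
/// the config flag is on and the plan asks for it.
pub fn should_bucket(strategy: ArrangementStrategy, bucketing_enabled: bool) -> bool {
    bucketing_enabled && strategy == ArrangementStrategy::TemporalBucketing
}

/// Lists the operators that would precede a monotonic reduce or top-k.
pub fn pre_aggregation_steps(
    must_consolidate: bool,
    strategy: ArrangementStrategy,
    bucketing_enabled: bool, // stands in for ENABLE_COMPUTE_TEMPORAL_BUCKETING
) -> Vec<&'static str> {
    let mut steps = Vec::new();
    if must_consolidate {
        // Bucketing lives inside the same branch as the consolidate,
        // because it is only useful when a consolidate follows.
        if should_bucket(strategy, bucketing_enabled) {
            steps.push("temporal_bucket");
        }
        steps.push("consolidate");
    }
    steps
}
```

When `must_consolidate` is false, neither operator is added, which matches the idea that bucketing has no purpose without a consolidate behind it.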