Skip to content

Multi-stage leaf routing: optional segments, unavailable metadata, and empty partitions#18390

Open
rsrkpatwari1234 wants to merge 9 commits intoapache:masterfrom
rsrkpatwari1234:rsrkpatwari1234-18223
Open

Multi-stage leaf routing: optional segments, unavailable metadata, and empty partitions#18390
rsrkpatwari1234 wants to merge 9 commits intoapache:masterfrom
rsrkpatwari1234:rsrkpatwari1234-18223

Conversation

@rsrkpatwari1234
Copy link
Copy Markdown
Contributor

Fixes #18223

Summary

Multi-stage query planning used to ignore or mishandle a few real-world routing situations: optional segments (segments the broker may still send so servers can decide what to do), unavailable segments (known missing pieces of the table), partitions with no data, and workers that end up with no segments at all.

What changed

The planner now carries optional segments per worker, records unavailable segments in metadata when routing provides them, and handles empty partitions and empty routing by assigning a worker with empty segment lists instead of failing. Dispatch sends optional segments on leaf requests, and the leaf operator forwards empty results that still have a schema so later stages stay consistent.

Test Plan

Added unit tests

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 1, 2026

Codecov Report

❌ Patch coverage is 38.86010% with 118 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.43%. Comparing base (a4bef7b) to head (b46a4ed).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
.../org/apache/pinot/query/routing/WorkerManager.java 33.56% 86 Missing and 9 partials ⚠️
...ry/runtime/plan/server/ServerPlanRequestUtils.java 36.36% 12 Missing and 2 partials ⚠️
...uery/planner/physical/DispatchablePlanContext.java 58.33% 4 Missing and 1 partial ⚠️
...ery/planner/physical/DispatchablePlanMetadata.java 66.66% 2 Missing ⚠️
...org/apache/pinot/query/routing/WorkerMetadata.java 71.42% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18390      +/-   ##
============================================
- Coverage     63.48%   63.43%   -0.05%     
  Complexity     1701     1701              
============================================
  Files          3254     3254              
  Lines        199114   199281     +167     
  Branches      30833    30875      +42     
============================================
+ Hits         126399   126418      +19     
- Misses        62643    62770     +127     
- Partials      10072    10093      +21     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 63.43% <38.86%> (-0.05%) ⬇️
temurin 63.43% <38.86%> (-0.05%) ⬇️
unittests 63.43% <38.86%> (-0.05%) ⬇️
unittests1 55.39% <38.86%> (-0.05%) ⬇️
unittests2 34.96% <0.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@yashmayya yashmayya added multi-stage Related to the multi-stage query engine enhancement Improvement to existing functionality labels May 1, 2026
Copy link
Copy Markdown
Contributor

@xiangfu0 xiangfu0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found one high-signal issue; see inline comment.

private void assignNonPartitionedLeafWorkersWhenNoServersHaveSegments(DispatchablePlanMetadata metadata,
String tableName, Map<String, RoutingTable> routingTableMap, DispatchablePlanContext context,
@Nullable TimeBoundaryInfo timeBoundaryInfo) {
Map<String, ServerInstance> routableServers = _routingManager.getRoutableServerInstanceMap();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This empty-routing fallback is pulling from the global routable-server pool instead of a table-scoped candidate set. On a multi-tenant cluster that can place an empty-table leaf on a server that does not host this table, and the leaf then fails in constructServerQueryRequests() when instanceDataManager.getTableDataManager(table) returns null. The fallback needs to stay within servers that actually serve the table; the same global-pool issue also shows up again in the empty-partition fallbacks below.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Improvement to existing functionality multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Close WorkerManager routing gaps for optional/unavailable segments and empty partitions

4 participants