Default Stats Send Mode SAFE -> ALWAYS #18367

Open
satwik-pachigolla wants to merge 3 commits into apache:master from satwik-pachigolla:patch-1

@satwik-pachigolla
Contributor

@satwik-pachigolla satwik-pachigolla commented Apr 29, 2026

Summary

  • Pinot 1.5 is out
  • Pinot clusters should not be upgraded by more than one minor version at a time: a cluster below 1.4 should not jump straight past 1.4 and must upgrade through 1.4 first
  • ALWAYS is strictly better than SAFE unless servers older than 1.4 are involved

Improvements

This further mitigates #15890, which was only partially mitigated in #15895 (comment). Big clusters whose operators haven't come across this setting still face a risk of ZK instability.

Detailed Explanation

AI powered summary here.

In short,

ALWAYS with no break: All queries optimized > SAFE (no stats)
ALWAYS with break: Average(optimized, slow) > SAFE (no stats)
So yes, ALWAYS is strictly better than SAFE in 1.4.0+ even during a rolling upgrade!

Compatibility

  • Unsafe to upgrade from Pinot servers with versions < 1.4 (this commit is on 1.5); doing so may lead to query errors during the upgrade
  • Safe to upgrade from Pinot servers with versions >= 1.4
  • This only changes the default; explicit configurations are not affected.
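
The last point can be illustrated with a minimal sketch of how a default usually applies only when a key is absent. The key name and the two mode strings come from this PR's diff; the class `StatsModeDefaults` and the method `resolveMode` are hypothetical and are not how Pinot actually reads its configuration.

```java
import java.util.Map;

/** Hypothetical sketch: a default applies only when the key is not set explicitly. */
final class StatsModeDefaults {
  // Key name as it appears in this PR's diff.
  static final String KEY_OF_SEND_STATS_MODE = "pinot.query.mse.stats.mode";
  // Default proposed by this PR (previously "SAFE").
  static final String DEFAULT_SEND_STATS_MODE = "ALWAYS";

  /** Returns the explicitly configured mode, or the default when the key is absent. */
  static String resolveMode(Map<String, String> config) {
    return config.getOrDefault(KEY_OF_SEND_STATS_MODE, DEFAULT_SEND_STATS_MODE);
  }

  public static void main(String[] args) {
    // No explicit setting: the new default is used.
    System.out.println(resolveMode(Map.of())); // prints ALWAYS
    // Explicit setting: the default change has no effect.
    System.out.println(resolveMode(Map.of(KEY_OF_SEND_STATS_MODE, "SAFE"))); // prints SAFE
  }
}
```

Under this model, a cluster that still has servers below 1.4 could keep the old behavior by setting `pinot.query.mse.stats.mode` to `SAFE` explicitly before upgrading.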

@satwik-pachigolla satwik-pachigolla marked this pull request as ready for review April 29, 2026 03:43
@satwik-pachigolla
Contributor Author

cc @dang-stripe

@satwik-pachigolla
Contributor Author

@gortiz @suvodeep-pyne please add the upgrade-incompat label as a signal, in case someone does intend to upgrade more than one minor version at once

@codecov-commenter

codecov-commenter commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 63.42%. Comparing base (5b0b38c) to head (082bc3b).
⚠️ Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...apache/pinot/query/runtime/SendStatsPredicate.java | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18367      +/-   ##
============================================
- Coverage     63.43%   63.42%   -0.01%     
  Complexity     1683     1683              
============================================
  Files          3253     3253              
  Lines        198841   198842       +1     
  Branches      30795    30795              
============================================
- Hits         126136   126124      -12     
- Misses        62625    62643      +18     
+ Partials      10080    10075       -5     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-21 | 63.42% <0.00%> (-0.01%) ⬇️ |
| temurin | 63.42% <0.00%> (-0.01%) ⬇️ |
| unittests | 63.42% <0.00%> (-0.01%) ⬇️ |
| unittests1 | 55.36% <0.00%> (-0.01%) ⬇️ |
| unittests2 | 35.00% <0.00%> (-0.01%) ⬇️ |


Contributor

@xiangfu0 xiangfu0 left a comment


Found one high-signal compatibility issue; see inline comment.

  /// running 1.3.0 may fail, which breaks backward compatibility.
  public static final String KEY_OF_SEND_STATS_MODE = "pinot.query.mse.stats.mode";
- public static final String DEFAULT_SEND_STATS_MODE = "SAFE";
+ public static final String DEFAULT_SEND_STATS_MODE = "ALWAYS";
Contributor


Changing the default here bypasses the SAFE compatibility check for every cluster that never set pinot.query.mse.stats.mode. SendStatsPredicate still documents that 1.3.x and lower can return incorrect stats or fail when unexpected upstream stats arrive, so this turns mixed-version rollouts into a behavior-breaking default change. This needs an explicit migration boundary or rollout plan instead of flipping the default constant.
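
A rough model of this concern, assuming SAFE sends stats only when every known server reports version 1.4 or later while ALWAYS skips that check entirely. All names here (`StatsModeSketch`, `StatsMode`, `shouldSendStats`, `isAtLeast14`) are hypothetical; this is not the real SendStatsPredicate code, and the real check may differ.

```java
import java.util.List;

/** Hypothetical model of the SAFE and ALWAYS modes on a mixed-version cluster. */
final class StatsModeSketch {
  enum StatsMode { SAFE, ALWAYS }

  /** Parses "major.minor[.patch]" and reports whether it is at least 1.4. */
  static boolean isAtLeast14(String version) {
    String[] parts = version.split("\\.");
    int major = Integer.parseInt(parts[0]);
    int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
    return major > 1 || (major == 1 && minor >= 4);
  }

  /** ASSUMPTION: SAFE withholds stats if any known server is older than 1.4. */
  static boolean shouldSendStats(StatsMode mode, List<String> serverVersions) {
    if (mode == StatsMode.ALWAYS) {
      return true; // no version check at all
    }
    return serverVersions.stream().allMatch(StatsModeSketch::isAtLeast14);
  }

  public static void main(String[] args) {
    List<String> mixed = List.of("1.3.0", "1.5.0");
    // SAFE protects the 1.3.0 server by not sending stats.
    System.out.println(shouldSendStats(StatsMode.SAFE, mixed)); // prints false
    // ALWAYS sends stats even though a 1.3.0 server is present.
    System.out.println(shouldSendStats(StatsMode.ALWAYS, mixed)); // prints true
  }
}
```

In this model the two modes only disagree while a pre-1.4 server is present, which is the mixed-version rollout case the comment describes.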

Contributor Author

@satwik-pachigolla satwik-pachigolla Apr 30, 2026


I don't think there's a safe way to do this as an explicit migration boundary. This requires coordination between nodes on old versions (which we can't change) and new versions. Otherwise we'd need to rely on more stable mechanisms, such as ZK metadata that already existed as of <= 1.3, none of which I think are suitable here.

#15890 is a documented case of how using ZK watchers led to more instability and the partial fix PR comments also mention that we should go to default ALWAYS eventually.

I think the existing ZK risk >> the risk of an MSE user upgrading from <= 1.3 to >= 1.5 without seeing this PR, if we label it.

I updated the PR description to make this more clear.

cc @Jackie-Jiang

