Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,8 @@ public static SendStatsPredicate create(PinotConfiguration serverConf, HelixMana

public enum Mode {
/// Sends stats only if all the cluster participants use the same known version.
// ALWAYS is strictly better than SAFE if all servers are already on versions >= 1.4
@Deprecated
SAFE {
@Override
public SendStatsPredicate create(HelixManager helixManager) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2142,16 +2142,8 @@ public static class MultiStageQueryRunner {
/// - "SAFE": MSE will only send stats if all instances in the cluster are running 1.4.0 or later.
/// - "ALWAYS": MSE will always send stats, regardless of the version of the instances in the cluster.
/// - "NEVER": MSE will never send stats.
///
/// The reason for this flag that versions 1.3.0 and lower have two undesired behaviors:
/// 1. Some queries using intersection generate incorrect stats
/// 2. When stats from other nodes are sent but are different from expected, the query fails.
///
/// In 1.4.0 the first issue is solved and instead of failing when unexpected stats are received, the query
/// continues without children stats. But if a query involves servers in versions 1.3.0 and 1.4.0, the one
/// running 1.3.0 may fail, which breaks backward compatibility.
public static final String KEY_OF_SEND_STATS_MODE = "pinot.query.mse.stats.mode";
public static final String DEFAULT_SEND_STATS_MODE = "SAFE";
public static final String DEFAULT_SEND_STATS_MODE = "ALWAYS";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the default here bypasses the SAFE compatibility check for every cluster that never set pinot.query.mse.stats.mode. SendStatsPredicate still documents that 1.3.x and lower can return incorrect stats or fail when unexpected upstream stats arrive, so this turns mixed-version rollouts into a behavior-breaking default change. This needs an explicit migration boundary or rollout plan instead of flipping the default constant.

Copy link
Copy Markdown
Contributor Author

@satwik-pachigolla satwik-pachigolla Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a safe way do this as an explicit migration boundary. This requires coordination between nodes on old versions (which we can't change) and new versions. Otherwise we'd need to use more stable mechanisms of relying on ZK metadata that would have existed as of <=1.3, none of which I think are suitable here.

#15890 is a documented case of how using ZK watchers led to more instability and the partial fix PR comments also mention that we should go to default ALWAYS eventually.

I think the existing ZK risk >> the risk of an MSE user upgrading from <= 1.3 to >= 1.5 without seeing this PR if we label it

I updated the PR description to make this more clear.

cc @Jackie-Jiang


/// Used to indicate whether MSE pipeline breaker stats should be included in the queryStats field.
/// This flag was introduced in 1.5.0. Before 1.5.0, MSE pipeline breaker stats were not kept. Starting from 1.5.0,
Expand Down
Loading