Skip to content

ZK multi path transaction API support#18380

Open
krishan1390 wants to merge 3 commits intoapache:masterfrom
krishan1390:zk_multi_path_api
Open

ZK multi path transaction API support#18380
krishan1390 wants to merge 3 commits intoapache:masterfrom
krishan1390:zk_multi_path_api

Conversation

@krishan1390
Copy link
Copy Markdown
Contributor

@krishan1390 krishan1390 commented Apr 30, 2026

Summary

Adds a fluent ZkMultiWriteBuilder API for issuing atomic ZooKeeper multi() transactions over Helix property-store paths, exposed via PinotHelixResourceManager.multiWriteZK(). This lets callers mutate multiple property-store znodes (e.g. several SegmentZKMetadata entries ) as a single all-or-nothing batch instead of issuing per-path writes that can leave the property store in a half-applied state on a crash or version-mismatch retry.

helixResourceManager.multiWriteZK()
    .set(segmentPath1, seg1.toZNRecord(), expectedVersion1)
    .set(segmentPath2, seg2.toZNRecord(), expectedVersion2)
    .check(guardPath, guardVersion)
    .execute();   // throws KeeperException on atomic rollback

What's in the API

  • set(path, record[, expectedVersion]) — overwrite a znode with optional CAS.
  • create(path, record) — create a persistent znode.
  • delete(path[, expectedVersion]) — delete with optional CAS.
  • check(path, expectedVersion) — version assertion that gates the rest of the batch.
  • execute() — submits all accumulated ops as a single ZK multi().

Op paths are property-store-relative (the same paths callers pass to ZkHelixPropertyStore). The builder prepends /{cluster}/PROPERTYSTORE before submitting. Multi-path writes outside the property store are intentionally unsupported.

Error model

On atomic rollback, execute() throws the underlying KeeperException subtype so callers can branch cleanly:

  • BadVersionException / NoNodeException / NodeExistsException → retryable concurrent-state changes.
  • Other subtypes → hard errors.

Per-op offender info is reachable via KeeperException#getResults(). Connectivity / session failures (timeout, interrupt, session expiry) propagate as the original ZkException since they are not atomic outcomes. The builder is single-use and not thread-safe — obtain a fresh one per transaction.

Why a dedicated ZkClient

Helix 1.3.2 does not expose multi() on BaseDataAccessor, and the ZkClient inside ZKHelixManager is not publicly reachable. Reusing it would require reflection, which breaks across Helix point-releases. PinotHelixResourceManager lazily builds one dedicated ZkClient on first multiWriteZK() call (derived from the started Helix manager's ZK address) and closes it asynchronously on stop(). This is consistent with the controller's existing footprint — _propertyStore and _leadControllerManager already hold their own ZK sessions.

Test plan

  • Added ZkMultiWriteBuilderTest covering ops, fluent chaining, single-use guard, path-prefix validation, and KeeperException unwrapping from ZkException.
  • Added PinotHelixResourceManagerStatelessTest#testMultiWriteZkSegmentMetadataUpdates — pre-creates two SegmentZKMetadata znodes, updates both atomically via multiWriteZK(), and verifies round-trip read through the property store.
  • mvn spotless:apply && mvn checkstyle:check

krishan1390 and others added 2 commits April 23, 2026 17:18
Replace PinotZkOp + PinotZkMultiResult + ZkMultiWriter with a single
ZkMultiWriteBuilder; PinotHelixResourceManager.multiWriteZK() returns a
fresh builder. execute() returns void and throws KeeperException on
atomic rollback, so callers branch on BadVersionException /
NoNodeException / NodeExistsException for the retry-vs-hard-error
trichotomy without a custom result type.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 30, 2026

Codecov Report

❌ Patch coverage is 87.50000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.46%. Comparing base (91ed730) to head (80f1d9a).
⚠️ Report is 90 commits behind head on master.

Files with missing lines Patch % Lines
...ntroller/helix/core/PinotHelixResourceManager.java 79.31% 3 Missing and 3 partials ⚠️
.../pinot/common/utils/helix/ZkMultiWriteBuilder.java 93.02% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18380      +/-   ##
============================================
- Coverage     63.53%   63.46%   -0.07%     
- Complexity     1658     1698      +40     
============================================
  Files          3244     3254      +10     
  Lines        197440   199042    +1602     
  Branches      30557    30821     +264     
============================================
+ Hits         125437   126319     +882     
- Misses        61959    62650     +691     
- Partials      10044    10073      +29     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 ?
java-21 63.46% <87.50%> (-0.04%) ⬇️
temurin 63.46% <87.50%> (-0.07%) ⬇️
unittests 63.46% <87.50%> (-0.07%) ⬇️
unittests1 55.44% <93.02%> (-0.11%) ⬇️
unittests2 35.02% <62.50%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

- Switch ZkMultiWriteBuilder ctor to concrete ZkClient (matches the
  rest of Pinot, which uses ZkClient over RealmAwareZkClient).
- Document the builder is single-use and not thread-safe.
- Restrict multiWriteZK to Helix property-store paths: builder takes a
  propertyStoreRoot prefix and op paths are property-store-relative
  (the same paths callers pass to ZkHelixPropertyStore). Validates
  prefix shape in the constructor.
- Simplify PinotHelixResourceManager multi-write client setup: derive
  zkAddress from the started Helix manager instead of caching it +
  drop unused _stopped flag.
- Add PinotHelixResourceManagerStatelessTest#testMultiWriteZkSegment
  MetadataUpdates: pre-creates two SegmentZKMetadata znodes, updates
  both atomically via multiWriteZK(), and verifies round-trip read
  through the property store.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@krishan1390 krishan1390 marked this pull request as ready for review April 30, 2026 08:38
Copy link
Copy Markdown
Contributor

@xiangfu0 xiangfu0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flagging one backward-compatibility risk in multiWriteZK() root selection.

*/
public ZkMultiWriteBuilder multiWriteZK() {
return new ZkMultiWriteBuilder(getOrBuildMultiWriteZkClient(),
PropertyPathBuilder.propertyStore(_helixClusterName));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard-coding PropertyPathBuilder.propertyStore(_helixClusterName) here drops the same fallback behavior that the rest of Pinot gets from Helix's property store for legacy /<cluster>/HELIX_PROPERTYSTORE roots. On an upgraded cluster that still stores metadata under the legacy subtree, multiWriteZK().set(...) will start failing with NoNodeException, and create(...) can silently write shadow znodes under /<cluster>/PROPERTYSTORE that the normal controller reads never see. This public API needs to derive the actual active property-store root or preserve the same fallback semantics before it is safe to expose.

Copy link
Copy Markdown
Contributor Author

@krishan1390 krishan1390 May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a closer look.

The concerns are true, but it's not a regression: existing _propertyStore.set(...) and PinotHelixResourceManager.setZKData(...) already bypass the auto-promote that only AutoFallbackPropertyStore.update triggers, so they have the same hazard on a legacy-root cluster. The only Pinot reference to HELIX_PROPERTYSTORE anywhere in the codebase is the read-side PredownloadZKClient; no modern Pinot cluster runs against the legacy root.

Given that this API doesn't introduce a failure mode beyond what existing direct-set paths already have on the same hypothetical legacy cluster, I'd rather not expand the surface area with fallback-aware logic for a path with no real callers.

Let me know if you think its still important to add

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants