ZK multi path transaction API support#18380
ZK multi path transaction API support#18380krishan1390 wants to merge 3 commits intoapache:masterfrom
Conversation
Replace PinotZkOp + PinotZkMultiResult + ZkMultiWriter with a single ZkMultiWriteBuilder; PinotHelixResourceManager.multiWriteZK() returns a fresh builder. execute() returns void and throws KeeperException on atomic rollback, so callers branch on BadVersionException / NoNodeException / NodeExistsException for the retry-vs-hard-error trichotomy without a custom result type. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #18380 +/- ##
============================================
- Coverage 63.53% 63.46% -0.07%
- Complexity 1658 1698 +40
============================================
Files 3244 3254 +10
Lines 197440 199042 +1602
Branches 30557 30821 +264
============================================
+ Hits 125437 126319 +882
- Misses 61959 62650 +691
- Partials 10044 10073 +29
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
- Switch ZkMultiWriteBuilder ctor to concrete ZkClient (matches the rest of Pinot, which uses ZkClient over RealmAwareZkClient). - Document the builder is single-use and not thread-safe. - Restrict multiWriteZK to Helix property-store paths: builder takes a propertyStoreRoot prefix and op paths are property-store-relative (the same paths callers pass to ZkHelixPropertyStore). Validates prefix shape in the constructor. - Simplify PinotHelixResourceManager multi-write client setup: derive zkAddress from the started Helix manager instead of caching it + drop unused _stopped flag. - Add PinotHelixResourceManagerStatelessTest#testMultiWriteZkSegment MetadataUpdates: pre-creates two SegmentZKMetadata znodes, updates both atomically via multiWriteZK(), and verifies round-trip read through the property store. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
xiangfu0
left a comment
There was a problem hiding this comment.
Flagging one backward-compatibility risk in multiWriteZK() root selection.
| */ | ||
| public ZkMultiWriteBuilder multiWriteZK() { | ||
| return new ZkMultiWriteBuilder(getOrBuildMultiWriteZkClient(), | ||
| PropertyPathBuilder.propertyStore(_helixClusterName)); |
There was a problem hiding this comment.
Hard-coding PropertyPathBuilder.propertyStore(_helixClusterName) here drops the same fallback behavior that the rest of Pinot gets from Helix's property store for legacy /<cluster>/HELIX_PROPERTYSTORE roots. On an upgraded cluster that still stores metadata under the legacy subtree, multiWriteZK().set(...) will start failing with NoNodeException, and create(...) can silently write shadow znodes under /<cluster>/PROPERTYSTORE that the normal controller reads never see. This public API needs to derive the actual active property-store root or preserve the same fallback semantics before it is safe to expose.
There was a problem hiding this comment.
Took a closer look.
The concerns are true, but it's not a regression: existing _propertyStore.set(...) and PinotHelixResourceManager.setZKData(...) already bypass the auto-promote that only AutoFallbackPropertyStore.update triggers, so they have the same hazard on a legacy-root cluster. The only Pinot reference to HELIX_PROPERTYSTORE anywhere in the codebase is the read-side PredownloadZKClient; no modern Pinot cluster runs against the legacy root.
Given that this API doesn't introduce a failure mode beyond what existing direct-set paths already have on the same hypothetical legacy cluster, I'd rather not expand the surface area with fallback-aware logic for a path with no real callers.
Let me know if you think its still important to add
Summary
Adds a fluent
ZkMultiWriteBuilderAPI for issuing atomic ZooKeepermulti()transactions over Helix property-store paths, exposed viaPinotHelixResourceManager.multiWriteZK(). This lets callers mutate multiple property-store znodes (e.g. severalSegmentZKMetadataentries ) as a single all-or-nothing batch instead of issuing per-path writes that can leave the property store in a half-applied state on a crash or version-mismatch retry.What's in the API
set(path, record[, expectedVersion])— overwrite a znode with optional CAS.create(path, record)— create a persistent znode.delete(path[, expectedVersion])— delete with optional CAS.check(path, expectedVersion)— version assertion that gates the rest of the batch.execute()— submits all accumulated ops as a single ZKmulti().Op paths are property-store-relative (the same paths callers pass to
ZkHelixPropertyStore). The builder prepends/{cluster}/PROPERTYSTOREbefore submitting. Multi-path writes outside the property store are intentionally unsupported.Error model
On atomic rollback,
execute()throws the underlyingKeeperExceptionsubtype so callers can branch cleanly:BadVersionException/NoNodeException/NodeExistsException→ retryable concurrent-state changes.Per-op offender info is reachable via
KeeperException#getResults(). Connectivity / session failures (timeout, interrupt, session expiry) propagate as the originalZkExceptionsince they are not atomic outcomes. The builder is single-use and not thread-safe — obtain a fresh one per transaction.Why a dedicated
ZkClientHelix 1.3.2 does not expose
multi()onBaseDataAccessor, and theZkClientinsideZKHelixManageris not publicly reachable. Reusing it would require reflection, which breaks across Helix point-releases.PinotHelixResourceManagerlazily builds one dedicatedZkClienton firstmultiWriteZK()call (derived from the started Helix manager's ZK address) and closes it asynchronously onstop(). This is consistent with the controller's existing footprint —_propertyStoreand_leadControllerManageralready hold their own ZK sessions.Test plan
ZkMultiWriteBuilderTestcovering ops, fluent chaining, single-use guard, path-prefix validation, andKeeperExceptionunwrapping fromZkException.PinotHelixResourceManagerStatelessTest#testMultiWriteZkSegmentMetadataUpdates— pre-creates twoSegmentZKMetadataznodes, updates both atomically viamultiWriteZK(), and verifies round-trip read through the property store.mvn spotless:apply && mvn checkstyle:check