Skip to content

Refactor BaseTableDataManager #18381

Draft
krishan1390 wants to merge 6 commits intoapache:masterfrom
krishan1390:basetdm_refactor
Draft

Refactor BaseTableDataManager #18381
krishan1390 wants to merge 6 commits intoapache:masterfrom
krishan1390:basetdm_refactor

Conversation

@krishan1390
Copy link
Copy Markdown
Contributor

@krishan1390 krishan1390 commented Apr 30, 2026

Summary

Small refactors to BaseTableDataManager to enable cleaner extensions, abstractions and ownership for future work

  1. Push _segmentReloadSemaphore acquire/release into reloadSegment(SegmentDataManager, ILC, boolean). Removes the duplicated outer acquire in reloadSegment(String) and the parallel reloadSegments(List<SDM>) paths.

  2. Centralize segment delete in BaseTableDataManager. Add TableDataManager#deleteSegment(String) to the SPI and a static deleteSegmentFilesFromDisk helper. HelixInstanceDataManager.deleteSegment now delegates to the TDM (addressing the long-standing TODO) and only falls back to path-only cleanup when the TDM is null. The lookup + delete is held under the per-segment lock so it is atomic vs. concurrent table teardown.

SPI changes

  • New: deleteSegment(String).

Test plan

  • Existing BaseTableDataManager and subclass tests pass.
  • Reload / segment-load / delete paths exercised in integration tests.

krishan1390 and others added 2 commits April 30, 2026 12:23
…ts building blocks

Three small refactors that keep single-segment behavior identical and let a
future multi-segment SegmentDataManager (e.g. a wrapper around N constituent
segments) reuse the same load and reload primitives without forking them:

1. Extract `protected ImmutableSegment loadSegment(zkMetadata, ilc)` from
   `downloadAndLoadSegment`. The new helper performs only the download +
   `ImmutableSegmentLoader.load`, returning the segment without registering it
   in `_segmentDataManagerMap` or invoking upsert hooks. Single-segment callers
   continue to use `downloadAndLoadSegment`, which now composes the helper +
   `addSegment(...)`. This lets a multi-segment manager load all of its members
   first and register a single wrapper entry under one name.

2. Push `_segmentReloadSemaphore` acquire/release down into
   `reloadSegment(SegmentDataManager, IndexLoadingConfig, boolean)`. The public
   `reloadSegment(String)` and the private parallel `reloadSegments(List<SDM>)`
   both used to wrap the inner call with the semaphore; that acquire is now
   inside the per-physical-segment body and the outer wrappers are removed
   (which would otherwise double-acquire on a non-reentrant semaphore). For
   non-group tables this is structurally identical (one segment -> one acquire
   -> one release; same concurrency bound). For multi-segment managers that
   fan out N reloads, each member contends for a slot independently.

3. Drop `@VisibleForTesting` on `isSegmentStale(IndexLoadingConfig,
   SegmentDataManager)` and widen to plain `protected` so subclasses can call
   it from group-aware overrides of `getStaleSegments` / `needReloadSegments`.

The semaphore stays at the orchestration boundary in `doReplaceSegment` (not
relocated into `replaceSegmentIfCrcMismatch`), because subclasses commonly
override `replaceSegmentIfCrcMismatch` and a relocation there would leak the
acquire across paths that bypass the override; subclasses needing per-member
acquire on a multi-segment replace can wrap the call themselves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 30, 2026

Codecov Report

❌ Patch coverage is 57.14286% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.58%. Comparing base (4e40672) to head (953952b).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
.../pinot/core/data/manager/BaseTableDataManager.java 62.06% 19 Missing and 3 partials ⚠️
...server/starter/helix/HelixInstanceDataManager.java 0.00% 5 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #18381       +/-   ##
=============================================
+ Coverage     34.91%   63.58%   +28.66%     
- Complexity      857     1717      +860     
=============================================
  Files          3252     3252               
  Lines        199132   199153       +21     
  Branches      30875    30877        +2     
=============================================
+ Hits          69528   126624    +57096     
+ Misses       123518    62454    -61064     
- Partials       6086    10075     +3989     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 63.58% <57.14%> (+28.66%) ⬆️
temurin 63.58% <57.14%> (+28.66%) ⬆️
unittests 63.57% <57.14%> (+28.66%) ⬆️
unittests1 55.65% <24.13%> (?)
unittests2 34.91% <39.68%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@krishan1390 krishan1390 marked this pull request as draft May 4, 2026 12:23
@krishan1390 krishan1390 changed the title Refactor BaseTableDataManager so multi-segment managers can compose its building blocks Refactor BaseTableDataManager to enable cleaner extensions, abstractions and ownership for future work May 6, 2026
@krishan1390 krishan1390 changed the title Refactor BaseTableDataManager to enable cleaner extensions, abstractions and ownership for future work Refactor BaseTableDataManager May 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants