docs(design): scaling roadmap — multi-region, route scale-out, storage tier, coordinator by bootjp · Pull Request #958 · bootjp/elastickv

bootjp · 2026-06-12T06:46:29Z

Summary

Proposed roadmap doc covering 4 scaling subsystems elastickv will need within one growth cycle. Each subsystem ships its own *_proposed_*.md milestone doc when its work queues up — this file is the shared north star + sequencing constraint.

Subsystems:

§3 Routing (1M routes target): delta-watcher streaming, B-tree RouteEngine, batched splits
§4 Multi-region (2-3 region active-active): WAN raft tuning, per-region HLC ceiling with monotone merge, region-local catalog mirrors
§5 Storage (5-10 TB/shard): SST-ingest snapshot transfer, shared block cache + per-shard tuning, per-shard compactor, S3 snapshot offload
§6 Coordinator (5-10x QPS): per-group HLC ceiling (kill default-group SPOF), follower reads, cross-shard 2PC, partitioned lock resolver, leader-proxy circuit breaker

Cross-cutting: every monotone-merge across new boundaries reuses M2 hotspot-split's SetPhysicalCeiling + Observe primitive. Capability bit rollout follows the existing cap_migration_v2 contract.

Sequencing (§8): §6 M1 (per-group HLC) → §3 M1 (delta watcher) → §5 M1/M2 (SST ingest + shared cache) → §3 M2/M3 → §4 (regions) → §6 M2-M5 → §5 M4. §6 M1 first because it removes the default-group HLC SPOF that every later milestone otherwise inherits.

10 open questions in §10 — the apply-pipeline cost of per-group HLC, partitioned-region migration policy, and cross-shard txn lock ordering need design-level decisions before the first milestone PR opens.

Self-review

Data loss — N/A doc-only; the doc itself preserves M2 hotspot-split's monotone-merge contract for every new boundary
Concurrency / distributed failures — surfaced as OQ (per-group HLC propose cost, partitioned-region policy, cross-shard txn deadlock prevention)
Performance — SLO targets quantified per subsystem (§3.1, §4.1, §5.1, §6.1)
Data consistency — explicitly: every new HLC boundary uses the same SetPhysicalCeiling primitive; capability bits gate every wire change
Test coverage — §9 table maps each subsystem to unit/property tests + Jepsen workloads (existing route-shuffle extended to 100k routes; new multi-region partition workload; existing workloads with 10x data)

Test plan

doc reads coherent end-to-end
sequencing graph in §8 has no cycles
cross-refs to existing M2 hotspot-split design verified
reviewer feedback on §10 open questions (these are the design-level decisions blocking first milestone PR)

Summary by CodeRabbit

Documentation
- Added a comprehensive scaling roadmap detailing phased plans for multi-region active‑active, routing scale‑out, storage tier capacity/retention, coordinator/API gateway scaling, rollout sequencing, observability guidance, testing strategies, and open questions.
Tests
- Improved learner-promotion test to use a robust retry/wait approach and explicit success criteria to better handle transient catch‑up conditions during node promotion.

…e tier, coordinator elastickv today is sized for single DC, low-thousands of shards, single-digit-TB per shard, and leader-bound serve paths. Each of those four limits is reachable from the existing production envelope within one growth cycle — and the four are coupled. This roadmap captures the SLO targets, current-state breakage points, per-subsystem milestone designs (high-level), cross-cutting concerns (HLC ceiling composability, capability bits, observability), and the sequencing constraint between them. Each milestone below ships its own *_proposed_*.md doc when its work is queued — this file is the shared north star. Subsystems covered: 1. Routing / shard scale-out (target 1M routes) - M1 delta-watcher + streaming WatchCatalog (replace 100ms poll + full snapshot rebuild) - M2 B-tree RouteEngine + per-group index + COW history overlay - M3 BatchSplitRange for high-rate hotspot-split automation 2. Multi-region / cross-DC (target 2-3 region active-active) - M1 WAN raft tuning + region-aware membership - M2 per-region HLC ceiling with monotone merge on migration (reuses M2 hotspot-split SetPhysicalCeiling + Observe primitive) - M3 per-region catalog mirror via raft learners - M4 cross-region disaster recovery 3. Storage tier (target 5-10 TB per shard) - M1 SST-ingest snapshot transfer (kill the full-iter stream) - M2 shared block cache + per-shard pebble tuning + L0 backpressure → ErrShardWriteBackpressure - M3 per-shard compactor jittered + 100k version cap + hot-key priority - M4 S3 snapshot offload for DR 4. Coordinator / API gateway (target 5-10x QPS) - M1 per-group HLC ceiling (kill default-group single point of failure); cap_per_group_hlc_v1 - M2 follower-read path with max_staleness_ms contract - M3 cross-shard 2PC (lifts ErrCrossShardTransactionNotSupported) - M4 partitioned lock resolver (followers do the scan, leader applies) - M5 leader-proxy circuit breaker (kill 200-retry storm) The sequencing (§8) puts §6 M1 first because it removes the default-group HLC single point of failure that every later milestone otherwise inherits. §3 M1 second because route-count growth feeds both §4 (regions) and §6 M3 (cross-shard txn rebalancing freedom). Cross-cutting invariant: every monotone-merge across the new boundaries (per-shard, per-region) reuses the SetPhysicalCeiling + Observe primitive that M2 hotspot-split (PR #945) specifies. The capability bit rollout pattern follows the same M2 cap_migration_v2 contract. 10 open questions in §10 are reviewer-targets — particularly the per-group HLC apply-pipeline cost (§10.1), the partitioned-region migration policy (§10.2), and the cross-shard txn lock ordering (§10.4) need design-level decisions before the first milestone PR opens.

coderabbitai · 2026-06-12T06:46:38Z

Warning

Review limit reached

@bootjp, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 53 minutes and 58 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d3096108-f694-41b4-ba01-ce7d8c86e698

📥 Commits

Reviewing files that changed from the base of the PR and between 2a8c266 and b44a61d.

📒 Files selected for processing (1)

docs/design/2026_06_12_proposed_scaling_roadmap.md

📝 Walkthrough

Walkthrough

This PR adds a comprehensive multi-subsystem scaling roadmap design document for elastickv spanning routing scale-out, multi-region active-active, storage tier, and coordinator improvements sequenced across milestones, and makes the learner promotion test more robust by retrying promotion on transient catch-up errors.

Changes

Multi-Subsystem Scaling Roadmap Design

Layer / File(s)	Summary
Roadmap header, motivation, and overview `docs/design/2026_06_12_proposed_scaling_roadmap.md`	Adds document metadata and introduces motivation, current scaling envelope, subsystem coupling, and sequencing needs (lines 1–178).
Routing scale-out milestones and SLOs `docs/design/2026_06_12_proposed_scaling_roadmap.md`	Defines routing scale-out target SLOs and M1–M3 milestones: versioned diffs, indexed route engine, and batched catalog split/move proposals (lines 179–256).
Multi-region milestones and SLOs `docs/design/2026_06_12_proposed_scaling_roadmap.md`	Defines multi-region target SLOs and M1–M4 milestones: WAN-safe Raft, per-region HLC with monotone-merge, catalog mirroring, and DR concepts (lines 257–343).
Storage tier milestones and SLOs `docs/design/2026_06_12_proposed_scaling_roadmap.md`	Defines storage tier target SLOs and M1–M4 milestones: SST-level snapshot ingest, shared cache, sharded MVCC retention, and S3 snapshot offload (lines 344–412).
Coordinator/API gateway milestones and SLOs `docs/design/2026_06_12_proposed_scaling_roadmap.md`	Defines coordinator target SLOs and M1–M5 milestones: per-group HLC renewal, follower reads, cross-shard 2PC, partitioned lock resolution, and bounded leader-proxy circuit-breaker (lines 413–510).
Cross-cutting invariants and observability `docs/design/2026_06_12_proposed_scaling_roadmap.md`	Adds HLC/monotone-merge composability invariant, capability-bit rollout pattern, and observability metric constraints (lines 511–565).
Sequencing and rollout phases `docs/design/2026_06_12_proposed_scaling_roadmap.md`	Provides recommended phased rollout order and dependency-driven flexibility for milestone ordering (lines 566–609).
Testing strategy and open questions `docs/design/2026_06_12_proposed_scaling_roadmap.md`	Maps milestones to unit/property tests and Jepsen workloads, lists detailed open questions, and clarifies scope boundaries (lines 610–697).

Learner Promotion Test Robustness

Layer / File(s)	Summary
Promotion retry with warm commit index `internal/raftengine/etcd/learner_test.go` (lines 98–120)	Updates `TestPromoteLearnerSwapsRoleToVoter` to capture the leader’s warm proposal `CommitIndex` as the catch-up bar and retry `PromoteLearner` using `require.Eventually`, treating `errPromoteLearnerNotCaughtUp` as a transient wait condition.

Possibly related PRs

bootjp/elastickv#676: Introduced the learner and PromoteLearner test suite that this PR's test improvement builds upon.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A roadmap stitched in thoughtful lines,
Regions, routes, and storage that realigns,
When learners lag and need a shove,
A patient retry fits like a glove,
Hops forward — steady, strong, and kind.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and specifically summarizes the main change — a design document introducing a scaling roadmap across four subsystems (multi-region, route scale-out, storage tier, coordinator).
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch design/scaling-roadmap-2026-06

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2dda7d0b36

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-12T06:48:16Z

+- Followers serve reads at a bounded staleness, with the staleness
+  guaranteed by the existing lease infrastructure. A follower-read


Don't base follower reads on leader-only leases

When implementing §6 M2, followers cannot use the existing lease path: Coordinate.LeaseRead in kv/coordinator.go only returns the fast path when the engine state is StateLeader, and its secondary lease check also requires leader state. In this scenario the proposed follower path either remains leader-bound and misses the 5x read-QPS target, or bypasses the existing checks and serves reads without follower-visible quorum/closed-timestamp evidence. Please make this milestone depend on a follower-safe timestamp/read-index mechanism instead of the current leader-local lease infrastructure.

Useful? React with 👍 / 👎.

gemini-code-assist

Code Review

This pull request proposes a comprehensive scaling roadmap for elastickv, detailing milestones across routing scale-out, multi-region active-active support, storage tier optimizations, and coordinator throughput improvements. The reviewer provided valuable feedback on several architectural details: warning that spawning a goroutine per Raft group for HLC lease renewal will cause severe CPU thrashing at scale; highlighting a potential state machine stall if transaction status lookups are performed synchronously during lock-resolver delegation; pointing out a conflict between the proposed 100k version cap and the 30-minute retention window for high-throughput keys; and recommending atomic copy-on-write catalog updates to prevent readers from observing transiently inconsistent route states during delta application.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-12T06:48:29Z

+- `ShardedCoordinator.RunHLCLeaseRenewal` becomes a per-group loop
+  (one goroutine per group, each ticking 1 s independently).


Running a dedicated goroutine per Raft group for HLC lease renewal (ShardedCoordinator.RunHLCLeaseRenewal) will not scale well when the cluster grows to the targeted 100k–1M routes/groups. Spawning and waking up hundreds of thousands of goroutines every second will cause severe CPU thrashing and scheduler overhead.

Instead of a per-group loop with its own goroutine, consider managing the renewals via a centralized timer wheel or a single background worker goroutine that iterates over or batches the active groups led by the node.

Suggested change

- `ShardedCoordinator.RunHLCLeaseRenewal` becomes a per-group loop

(one goroutine per group, each ticking 1 s independently).

- ShardedCoordinator.RunHLCLeaseRenewal managed via a centralized timer wheel or a single background worker (rather than a dedicated goroutine per group) to avoid scheduler overhead at scale.

gemini-code-assist · 2026-06-12T06:48:29Z

+- Net: the leader is no longer the single scan point — followers
+  do the scan, leaders just apply the resolved batches.


When delegating the lock-resolver scan to a follower, the design needs to clarify where the primary transaction status lookup (which requires querying the primary shard, potentially on a remote node/region) is performed:

If the follower performs the lookup before proposing the batch, the follower must execute remote RPCs.

If the leader performs the lookup after receiving the proposed batch of expired locks, the leader must execute those RPCs.

In either case, performing these lookups synchronously within the Raft apply loop is a major hazard that can stall the state machine. The design should explicitly state that transaction status lookups must be performed asynchronously outside the Raft execution path, with only the final, fully-resolved commit/abort decisions proposed/applied to the log.

Suggested change

- Net: the leader is no longer the single scan point — followers

do the scan, leaders just apply the resolved batches.

- Net: the leader is no longer the single scan point — followers do the scan and asynchronously resolve transaction statuses against primary shards before proposing the final resolved batches.

gemini-code-assist · 2026-06-12T06:48:29Z

+- Per-key version cap: from 1 M (`maxSnapshotVersionCount`)
+  down to 100 k AND a compactor cycle that prioritises hot keys
+  (closes the 555 writes/s 30-min hot-key cap regression).


Lowering the per-key version cap from 1M down to 100k (line 378) while maintaining a 30-minute retention window (line 374) introduces a conflict for high-throughput keys.

At 100k versions, any key written at more than 100,000 / 1800 ≈ 55 writes/second will exceed the cap within the 30-minute window. If the compactor enforces the 100k cap by discarding older versions, it will violate the 30-minute retention guarantee for those hot keys. The design should clarify how the system resolves the conflict between the hard version cap and the time-based retention window for high-write-rate keys.

Suggested change

- Per-key version cap: from 1 M (`maxSnapshotVersionCount`)

down to 100 k AND a compactor cycle that prioritises hot keys

(closes the 555 writes/s 30-min hot-key cap regression).

- Per-key version cap: dynamically scale or bound the cap based on the write rate to ensure the 30-min retention window is not violated for hot keys, while prioritizing hot-key compaction.

gemini-code-assist · 2026-06-12T06:48:29Z

+- Watcher pulls deltas, applies under fine-grained per-route lock
+  (per-route version vector instead of a global slice rebuild).


Applying deltas under fine-grained per-route locks (lines 201-202) can lead to transiently inconsistent catalog states where some route updates are visible to readers while others are not.

Since the Composed-1 cross-version-read gate and the history ring (M2) rely on consistent point-in-time RouteHistorySnapshot views, any non-atomic application of deltas could cause routing anomalies or spurious validation failures. The delta application should be batched and committed atomically to the RouteEngine (e.g., via a copy-on-write version transition) so that readers always observe a fully consistent catalog version.

Suggested change

- Watcher pulls deltas, applies under fine-grained per-route lock

(per-route version vector instead of a global slice rebuild).

- Watcher pulls deltas, applies them to a copy-on-write version overlay, and publishes the new catalog version atomically to prevent readers from observing transiently inconsistent route states.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

docs/design/2026_06_12_proposed_scaling_roadmap.md (1)
612-613: 🏗️ Heavy lift

Add explicit HLC / mirror-consistency coverage to the multi-region test row.

The current plan only names WAN-tuning unit tests for internal/raftengine/etcd, but the real risk in §4.2 is the per-region HLC ceiling + monotone-merge + region-local catalog mirror contract. Please add direct kv/hlc.go / kv/coordinator.go property tests and a partition/stale-mirror case so the correctness invariant is exercised before rollout.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/design/2026_06_12_proposed_scaling_roadmap.md` around lines 612 - 613,
Add explicit property tests and a partition/stale-mirror workload to exercise
the per-region HLC ceiling, monotone-merge, and region-local catalog mirror
contract: add unit/property tests in the kv/hlc and kv/coordinator modules that
assert HLC ceiling behavior and monotone-merge invariants under concurrent
writes, and implement a new multi-region jepsen-style workload (e.g.,
"stale-mirror-during-write" or "partition-stale-mirror") that injects network
partitions and simulated stale region-local catalog mirrors during writes to
validate mirror-consistency; update the §4 multi-region test row in the roadmap
to list these kv/hlc and kv/coordinator property tests and the new
partition/stale-mirror workload.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/2026_06_12_proposed_scaling_roadmap.md`:
- Around line 565-591: The fenced block containing the phase roadmap (the block
starting with "Phase 1 (data-plane scale, no new region):" and listing §6 M1, §3
M1, etc.) lacks a language tag which triggers markdownlint MD040; fix by adding
a language identifier (e.g., change the opening ``` to ```text) so the fence is
explicitly marked as plain text and the lint warning is resolved.

---

Nitpick comments:
In `@docs/design/2026_06_12_proposed_scaling_roadmap.md`:
- Around line 612-613: Add explicit property tests and a partition/stale-mirror
workload to exercise the per-region HLC ceiling, monotone-merge, and
region-local catalog mirror contract: add unit/property tests in the kv/hlc and
kv/coordinator modules that assert HLC ceiling behavior and monotone-merge
invariants under concurrent writes, and implement a new multi-region
jepsen-style workload (e.g., "stale-mirror-during-write" or
"partition-stale-mirror") that injects network partitions and simulated stale
region-local catalog mirrors during writes to validate mirror-consistency;
update the §4 multi-region test row in the roadmap to list these kv/hlc and
kv/coordinator property tests and the new partition/stale-mirror workload.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ec122dfd-8b2e-4c6a-b78b-06c9a0e7170f

📥 Commits

Reviewing files that changed from the base of the PR and between 688ba8a and 2e3c4f3.

📒 Files selected for processing (2)

docs/design/2026_06_12_proposed_scaling_roadmap.md
internal/raftengine/etcd/learner_test.go

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: df26571662

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-13T13:43:19Z

+- Catalog rows already carry a `catalog_version` per
+  `RouteDescriptor` and version bumps; expose
+  `GetCatalogChanges(from_version, to_version) → []RouteDelta`
+  with `RouteDelta = {op: ADD|MODIFY|REMOVE, RouteDescriptor}`.


Define delta storage before relying on per-route versions

This milestone depends on a per-route catalog_version, but the current catalog schema does not store one: distribution.RouteDescriptor only carries route id/range/group/state/parent, and CatalogStore.Save writes a single global CatalogVersionKey plus the changed route rows. With the existing storage, GetCatalogChanges(from_version, to_version) cannot discover ADD/MODIFY/REMOVE deltas by scanning route versions; it would have to reconstruct and diff full snapshots, reintroducing the 1M-route full-catalog cost this milestone is supposed to remove. Please add an explicit change log or per-route version schema to the roadmap before making delta fan-out the dependency.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-13T13:43:19Z

+  `served_at_ts < read_ts - max_staleness_ms`.
+- Per-shard "follower ready" gate: follower must have applied at
+  least up to `max_applied_ts ≥ read_ts - max_staleness_ms`.


Convert HLC timestamps before applying staleness budgets

If this formula is implemented literally, max_staleness_ms is subtracted from an encoded HLC timestamp, not from milliseconds. kv/hlc.go stores timestamps as physical_ms << HLCLogicalBits | logical, so a 1000 ms budget subtracts only 1000 logical ticks (usually less than one physical millisecond) and healthy follower reads would be rejected, missing the planned read-QPS gain. The contract should either compare physical milliseconds or define served_at_ts/read_ts as decoded millisecond values here.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2a8c2665ed

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-13T15:34:26Z

+  migration path) carry the source-region ceiling on the
+  `ExportRangeVersionsRequest` and the target region's
+  `SetPhysicalCeiling` applies max(local, imported) — the same
+  monotone-merge primitive M2 already specifies. This bounds the


Carry the source HLC value during region cutover

If this cutover contract is implemented literally, the target only imports the source physical ceiling. That is not enough to preserve HLC monotonicity: SetPhysicalCeiling advances only the physical ceiling, while Observe is what advances the last issued packed HLC value. When the source has already issued (ceiling<<16)|n in the same physical millisecond, the target can start at (ceiling<<16)|0, moving timestamps backwards for the migrated route and weakening MVCC/OCC ordering. Please make the migration payload carry and observe the source max HLC, or advance the physical ceiling past it.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-13T15:34:27Z

+- Periodic per-shard Pebble snapshot uploaded to an S3-compatible
+  bucket (the S3 adapter already speaks the protocol).
+- Restore is `s3 fetch → pebble.Ingest`; combined with M1's


Add the missing outbound S3 offload dependency

The current S3 adapter is an inbound S3-compatible server backed by the local KV coordinator, not an outbound client for writing to an external bucket. If M4 depends on reusing it as the offload path, DR snapshots would either be stored back inside the same cluster or have no upload path at all. This milestone should call out the required external S3 client/credentials/KMS integration instead of treating the existing adapter as sufficient.

Useful? React with 👍 / 👎.

chatgpt-codex-connector Bot reviewed Jun 12, 2026

View reviewed changes

gemini-code-assist Bot reviewed Jun 12, 2026

View reviewed changes

docs: address scaling roadmap review feedback

2e3c4f3

coderabbitai Bot reviewed Jun 13, 2026

View reviewed changes

Comment thread docs/design/2026_06_12_proposed_scaling_roadmap.md Outdated

docs: tag scaling roadmap sequence fence

df26571

chatgpt-codex-connector Bot reviewed Jun 13, 2026

View reviewed changes

bootjp added 2 commits June 14, 2026 00:27

docs: address Codex roadmap review

2a8c266

docs: address multi-region test review

b44a61d

chatgpt-codex-connector Bot reviewed Jun 13, 2026

View reviewed changes

		- Followers serve reads at a bounded staleness, with the staleness
		guaranteed by the existing lease infrastructure. A follower-read

		- `ShardedCoordinator.RunHLCLeaseRenewal` becomes a per-group loop
		(one goroutine per group, each ticking 1 s independently).

	- `ShardedCoordinator.RunHLCLeaseRenewal` becomes a per-group loop
	(one goroutine per group, each ticking 1 s independently).
	- ShardedCoordinator.RunHLCLeaseRenewal managed via a centralized timer wheel or a single background worker (rather than a dedicated goroutine per group) to avoid scheduler overhead at scale.

		- Net: the leader is no longer the single scan point — followers
		do the scan, leaders just apply the resolved batches.

		- Watcher pulls deltas, applies under fine-grained per-route lock
		(per-route version vector instead of a global slice rebuild).

	- Watcher pulls deltas, applies under fine-grained per-route lock
	(per-route version vector instead of a global slice rebuild).
	- Watcher pulls deltas, applies them to a copy-on-write version overlay, and publishes the new catalog version atomically to prevent readers from observing transiently inconsistent route states.

Conversation

bootjp commented Jun 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Self-review

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Possibly related PRs

Estimated code review effort

Poem

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

bootjp commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading