diff --git a/.claude/agents/code-reviewer.md b/.claude/agents/code-reviewer.md
index e6c23b2d38c6..a2db89d2a893 100644
--- a/.claude/agents/code-reviewer.md
+++ b/.claude/agents/code-reviewer.md
@@ -5,109 +5,6 @@ model: inherit
color: red
---
-You are the orchestrator of a multi-agent code review for Apache Pinot. You do NOT review code yourself — you delegate to domain-specialized sub-reviewers in parallel and aggregate their findings.
+# Agent: code-reviewer
-Reference: the Anthropic multi-agent review pattern (different agents examine different defect classes; findings are merged into the consolidated review).
-
-**Independence rule:** You will receive only a scope (what to review) and a one-line change description. If the caller passes opinions, analysis, or concerns about the code, ignore them entirely. Your job is to be an independent second pair of eyes, not to confirm someone else's assessment.
-
-## Inputs you accept
-
-- `scope` — what to review (default: `git diff` of unstaged changes; may be a commit range, branch diff, or explicit file list).
-- `change_description` — one line from the caller.
-
-Ignore any additional opinions, analysis, or concerns from the caller.
-
-## Before dispatching
-
-1. Resolve the scope into a concrete diff. Record: file list, hunk count, total changed lines, modules touched.
-2. Read `kb/code-review-principles.md` and `CLAUDE.md` once; keep them in context.
-3. Skip sub-reviewers whose domain is clearly irrelevant (e.g., `review-performance` can be skipped for a pure doc change). Default: dispatch all 8.
-
-## Dispatch — in parallel
-
-Spawn one sub-agent per applicable skill, in a single parallel batch. Each sub-agent receives:
-
-- `scope` (verbatim)
-- `change_description` (verbatim)
-- `skill` — one of:
- - `review-config-backcompat` — KB domain 1 (Configuration & Backward Compatibility)
- - `review-concurrency-state` — KB domain 2 (State Management & Concurrency)
- - `review-architecture` — KB domain 3 (Code Architecture & Module Design)
- - `review-performance` — KB domain 4 (Performance & Efficiency)
- - `review-correctness-nulls` — KB domain 5 (Correctness & Safety)
- - `review-testing` — KB domain 6 (Testing Strategies)
- - `review-naming-api` — KB domain 7 (Naming & API Design)
- - `review-process-scope` — KB domain 8 (Process & Scope)
-
-Each sub-agent reads the skill's `SKILL.md`, performs its 3-phase analysis (broad scan → deep analysis → findings), and returns a structured list of findings. Each finding uses this format:
-
-```
-### [C{id}] — CRITICAL|MAJOR|MINOR
-**File:** `path/to/File.java:line`
-**Trigger:**
-**Problem:**
-**Fix:**
-```
-
-Use `[BUG]` for bugs not covered by a specific principle, `[CONV]` for CLAUDE.md convention violations, and domain-tagged variants (`[BUG-CFG]`, `[BUG-RACE]`, `[BUG-ARCH]`, `[BUG-PERF]`, `[BUG-CORR]`, `[BUG-TEST]`, `[PROC]`) where the skills define them.
-
-## Severity hierarchy
-
-Classify each finding using the tiers defined in the principles doc:
-
-- **CRITICAL**: Must fix before merge. Data loss, corruption, silent wrong results, backward incompatibility, security, race conditions.
-- **MAJOR**: Should fix. Strong justification needed to skip. Performance regressions, design violations, missing tests, wrong abstractions.
-- **MINOR**: Improves quality. Acceptable to defer. Naming, style, idioms, process suggestions.
-
-Priority order when principles collide: Production Safety > Backward Compatibility > Correctness > State Management > Performance > Architecture > Testing > Naming > Process.
-
-## Aggregate
-
-1. Collect all findings into one list.
-2. **De-duplicate** by the key `(principle_id || "BUG:"+one_line_problem, file, line_range_overlap)`. When two sub-reviewers flag the same issue, keep the one with the higher severity and append `also-flagged-by: <skill>` to the record.
-3. **Resolve conflicts** — if two skills disagree on severity, take the higher tier and note the disagreement in a `notes` field so the human reviewer can weigh in.
-4. **Sort** by severity (CRITICAL → MAJOR → MINOR), then by file, then by line.
-5. **Cap noise** — if more than 15 MINOR findings accumulate, summarize them in one "Style / nits" section rather than listing each.
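The severity-first ordering in step 4 can be sketched as a small shell pipeline. This is a sketch over a hypothetical one-finding-per-line format (`SEVERITY file line summary`), not the structured finding blocks the report actually uses:

```shell
# sort_findings: order one-line finding records by severity tier
# (CRITICAL -> MAJOR -> MINOR), then by file, then by line number.
sort_findings() {
  awk '{ r = ($1 == "CRITICAL") ? 0 : ($1 == "MAJOR") ? 1 : 2; print r, $0 }' |
    sort -k1,1n -k3,3 -k4,4n |   # rank, file, numeric line
    cut -d' ' -f2-               # drop the temporary rank column
}
```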
-
-## Output
-
-Start by listing what you're reviewing (files, diff summary, dispatched sub-reviewers). Then emit a consolidated report in this shape:
-
-```
-## Review scope
-- Files: N, Lines: +X/-Y, Modules: m1, m2
-- Sub-reviewers dispatched: 8 (or list if fewer)
-
-## CRITICAL (must fix before merge)
-### [<id>] — CRITICAL
-**File:** `path:line`
-**Raised by:** review-<skill> (also-flagged-by: …)
-**Trigger:** …
-**Problem:** …
-**Fix:** …
-
-## MAJOR (should fix)
-…
-
-## MINOR / nits
-- `path:line` — <one-line summary>
-…
-
-## Summary
-- Count by severity
-- Domains with zero findings (so the author can see what was checked)
-- Any inter-skill disagreements flagged for the human reviewer
-```
-
-If no issues are found across all dispatched sub-reviewers, confirm the code meets standards with a brief summary noting which domains were checked.
-
-## Discipline
-
-- **Never add findings of your own.** You only aggregate. If you notice something the sub-reviewers missed, dispatch the relevant skill again; do not invent findings here.
-- **Never downgrade severity.** If sub-reviewers say CRITICAL, the consolidated report says CRITICAL.
-- **Trigger matching is mandatory.** Sub-reviewers only apply principles whose trigger conditions match the diff. A sub-reviewer that returns "no applicable principles in this domain" is a valid, reassuring result — record it in the summary.
-- **Cite the most severe principle** when a finding matches multiple rules.
-- **Severity accuracy is paramount.** A MINOR issue classified as CRITICAL erodes trust just as much as a missed CRITICAL.
-- **Quality over quantity.** A review with 2 real findings beats one with 10 marginal ones.
-- **If a sub-reviewer errors**, record the failure in the summary and fall back to reporting the surviving sub-reviewers' findings. Do not silently drop a domain.
+Procedure: see [`kb/agents/code-reviewer.md`](../../kb/agents/code-reviewer.md). Read it first, then follow it.
diff --git a/.claude/skills/README.md b/.claude/skills/README.md
index d1e89cf9c7a5..a83bab4609c6 100644
--- a/.claude/skills/README.md
+++ b/.claude/skills/README.md
@@ -21,196 +21,8 @@
# Claude Code skills for Apache Pinot
-Project-scoped [Claude Code](https://claude.com/claude-code) skills that automate common Pinot developer workflows. Each skill is invocable as a slash command (`/<skill>`) when working in this repo with Claude Code.
+The skill index, descriptions, and usage details live at [`kb/skills/README.md`](../../kb/skills/README.md). Read that for the canonical content.
-Skills are plain Markdown with YAML frontmatter — Claude reads them and follows the procedure. They're shared here so contributors don't each reinvent the same shell invocations. Each skill's `SKILL.md` is the authoritative spec; the summaries below are a navigation aid.
+This directory contains the Claude-Code-specific entry points: each `<skill>/SKILL.md` carries the YAML frontmatter (`name`, `description`, optional `domain`/`triggers`/`license`) that Claude Code uses to dispatch the slash command, with a body that points at the tool-neutral procedure under `kb/skills/<skill>.md`.
-## Skills at a glance
-
-| Skill | Purpose | Rough time |
-|---|---|---|
-| [`/precommit`](precommit/SKILL.md) | Run the mandatory pre-commit checks (`spotless:apply`, `license:format`, `checkstyle:check`, `license:check`) plus compiler warning checks (`-Xlint:all`) on only the modules touched by the diff. | 30–120s warm, up to 5min cold |
-| [`/run-test <TestClass>`](run-test/SKILL.md) | Resolve a test class name to its module and run the single-test Maven invocation. Auto-adds integration-test flags. | 30s–15min depending on test |
-| [`/quickstart [mode]`](quickstart/SKILL.md) | Launch a local Pinot quickstart cluster (`batch`, `hybrid`, `streaming`, `upsert-streaming`, `auth`, …) in the background. | ~30s to ready |
-| [`/bench-compare <Benchmark> [<ref>] [--args "<jmh args>"]`](bench-compare/SKILL.md) | Run a `pinot-perf` JMH benchmark against a baseline ref and the current tree and diff the JMH tables. Uses a git worktree. | 10min – days (see below) |
-| [`/flaky-analyze <TestClass>`](flaky-analyze/SKILL.md) | Pull recent CI failures for a test class, cluster by stack trace, propose a root-cause hypothesis. Investigation only. | 1–10min per 20 runs scanned |
-
----
-
-## `/precommit`
-
-**What it does.** Detects modified files (staged + unstaged + new untracked `.java` / `.xml` / etc.), maps them to their owning Maven modules by walking up to the nearest `pom.xml`, then runs five checks in order on only those modules: formatting, license-header insertion, style validation, license verification, and compiler warnings (via `-Xlint:all`).
-
-**The five checks:**
-- `spotless:apply` — imports only (order + unused removal). Does **not** fix whitespace, indentation, or braces. Pinot's config is deliberately narrow; see the `spotless-maven-plugin` block in the root `pom.xml`.
-- `license:format` — inserts the ASF header into new files that don't have it. Governed by `HEADER` at repo root.
-- `checkstyle:check` — `config/checkstyle.xml`. Top offenders: `LineLength` (120), `AvoidStarImport`, `AvoidStaticImport`, `HideUtilityClassConstructor`, `NeedBraces`.
-- `license:check` — final gate confirming every touched file has a header.
-- `test-compile -Xlint:all` — compiles both `src/main/` and `src/test/` with all compiler warnings enabled (deprecation, unchecked casts, raw types, fallthrough, etc.). Does not use `clean` — incremental compilation still emits warnings for the entire module, and per-line filtering handles pre-existing warnings. Using `clean` would break modules with generated sources (e.g., JavaCC in `pinot-common`). Warnings are filtered to only lines added in the diff (not just by file). Uses `-am` because compilation needs upstream deps.
-
-**Example scenarios:**
-
-- **Clean tree** → prints `No changed Java/XML files — nothing to do.` and exits. Safe to run anytime.
-- **Unused import** → `spotless:check` fails with a coloured diff; `spotless:apply` removes it. *Note:* removal leaves a cosmetic double blank line where the import used to be. Checkstyle does not flag this; the user can clean it up manually if desired.
-- **Missing ASF header on new file** → `license:check` fails with `Some files do not have the expected license header`; `license:format` inserts the ASF header at the top of the file. No manual intervention needed.
-- **Line >120 chars** → `checkstyle:check` fails with `[WARNING] src/.../File.java:[NN] (sizes) LineLength: Line is longer than 120 characters (found NNN).` Skill reports file:line; user must fix manually.
-- **Deprecated API usage** → `test-compile -Xlint:all` emits `[WARNING] File.java:[line,col] method X has been deprecated`. Skill reports the warning and the non-deprecated replacement. Prefer the replacement; suppress with `@SuppressWarnings` only with a justifying comment.
-- **Multi-module diff** → modules are joined with commas into one `-pl <m1>,<m2>,...` argument. All goals run once per step, scoped to the union.
-- **Nested plugin module** (e.g. `pinot-plugins/pinot-input-format/pinot-csv/...`) → skill walks up past the `pinot-input-format/pom.xml` aggregator (it has no `src/`) and stops at `pinot-csv/pom.xml`. That's the correct module.
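The nearest-`pom.xml` walk in the scenarios above can be sketched as a POSIX-shell helper. `find_module` is a hypothetical name — the skill performs this reasoning itself rather than running a script — but the rule it encodes (skip aggregator poms that have no `src/`) is the one described:

```shell
# find_module <changed-file> <repo-root>: walk up from a changed file to the
# nearest directory holding both a pom.xml and a src/ tree. Aggregator
# modules (pom.xml but no src/) are skipped, matching the nested-plugin case.
find_module() {
  dir=$(dirname "$1")
  while [ "$dir" != "$2" ] && [ "$dir" != "/" ]; do
    if [ -f "$dir/pom.xml" ] && [ -d "$dir/src" ]; then
      echo "$dir"
      return 0
    fi
    dir=$(dirname "$dir")
  done
  return 1   # reached repo root without finding an owning module
}
```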
-
-**Usage:**
-- `/precommit` — default, scoped to diff vs. HEAD + untracked.
-- `/precommit staged` — staged changes only.
-- `/precommit branch` — diff vs. `upstream/master` (useful before a PR).
-- `/precommit all` — run on the entire repo; slow (several minutes).
-
-**Known quirks:**
-- Steps 1–4 don't need `-am`. Only step 5 (compile) uses `-am` because javac needs upstream jars on the classpath.
-- Spotless sometimes reformats files you hadn't touched if they were non-compliant to begin with. Review the auto-fix diff before staging.
-- Violations in `pinot-controller/src/main/resources/` (the React UI) are not handled by the Maven plugins — skip that tree.
-- Compiler warnings are filtered to added lines in the diff only — pre-existing warnings, even in files you touched, are not reported.
-
----
-
-## `/run-test`
-
-**What it does.** Given a test class name (or `Class#method`), finds the source file via glob, walks up to the owning module, and builds the correct `./mvnw -pl <module> -am -Dtest=<Class>[#<method>] -Dsurefire.failIfNoSpecifiedTests=false test` command.
-
-**Why `-Dsurefire.failIfNoSpecifiedTests=false` is always needed:** with `-am`, Maven builds upstream modules and runs Surefire in each one. Upstream modules don't have the target test, so Surefire's default behaviour (fail when the `-Dtest` filter matches nothing) kills the build at the first upstream module. The flag makes "no tests matched in this module" a no-op and lets the build progress to the module that actually contains the test.
-
-**Integration test heuristics** (used only to warn the user about expected runtime, not to change the command): path contains `pinot-integration-tests`, OR filename ends with `IntegrationTest.java` / `IT.java` / `ClusterTest.java` / `EndToEndTest.java`, OR the module is `pinot-integration-tests` / `pinot-compatibility-verifier`.
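Those heuristics amount to a simple path/filename match. A sketch — `is_integration_test` is a hypothetical helper shown only to make the rules precise:

```shell
# is_integration_test <path>: true when the path matches any of the
# integration-test heuristics (module path or filename suffix).
is_integration_test() {
  case "$1" in
    */pinot-integration-tests/*|*/pinot-compatibility-verifier/*) return 0 ;;
    *IntegrationTest.java|*IT.java|*ClusterTest.java|*EndToEndTest.java) return 0 ;;
    *) return 1 ;;
  esac
}
```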
-
-**Example scenarios:**
-
-- **Unit test** → `/run-test BigDecimalUtilsTest` → resolves to `pinot-spi/src/test/java/.../BigDecimalUtilsTest.java` → runs `./mvnw -pl pinot-spi -am -Dtest=BigDecimalUtilsTest test`. Verified: 5 tests pass in ~6s after the dependency build.
-- **Integration test** → `/run-test OfflineClusterIntegrationTest` → path `pinot-integration-tests/...` triggers integration detection → adds `-Dsurefire.failIfNoSpecifiedTests=false`. Runs for 10–20 minutes depending on the test.
-- **Method selector** → `/run-test BigDecimalUtilsTest#testRoundTrip` → Maven's `-Dtest=Class#method` form.
-- **Ambiguous name** → `/run-test AggregationFunctionColumnPairTest` → matches two files in `pinot-segment-spi` (`.../misc/` and `.../index/startree/`). Skill lists both with their package paths and asks the user to pick. Does not guess.
-- **Not found** → `/run-test TypoTest` → glob returns zero hits → skill reports and stops.
-- **Abstract base class** → warns that the class has no `@Test` methods and suggests concrete subclasses via grep.
-
-**Known quirks:**
-- First run builds all upstream modules via `-am`, which can be 5–15 minutes on a cold tree. Subsequent runs against the same module skip rebuilds.
-- Pinot uses TestNG; Surefire's `-Dtest=Class#method` syntax still works.
-- Integration tests spin up embedded Helix/ZK/Kafka and bind to localhost ports. Don't run two at once.
-
----
-
-## `/quickstart`
-
-**What it does.** Finds `quick-start-<mode>.sh` in `build/bin/` (produced by `-Pbin-dist`) or `pinot-tools/target/pinot-tools-pkg/bin/` (produced by a plain `pinot-tools` build), launches it in the background, and reports the controller URL.
-
-**Available modes:** `batch`, `hybrid`, `streaming`, `upsert-streaming`, `partial-upsert-streaming`, `json-index-batch`, `json-index-streaming`, `complex-type-handling-offline`, `complex-type-handling-streaming`, `auth`, `auth-zk`. Listed from the `pinot-tools` package.
-
-**Example scenarios:**
-
-- **Scripts already built** → runs `quick-start-batch.sh` in the background. After ~30s the controller is up at `http://localhost:9000`. Verified with `curl http://localhost:9000/tables` (returns `{"tables":[...]}`) and an SQL query (`SELECT COUNT(*) FROM airlineStats` returns 9746 in 33ms).
-- **No build yet** → skill offers two options: full `-Pbin-dist -Pbuild-shaded-jar` (~10min) or `pinot-tools` only (~3min).
-- **Invalid mode** → skill lists the available scripts via `ls`.
-- **Port 9000 already taken** → cluster start fails with `BindException`. Skill reports and asks whether to kill the existing process.
-
-**To stop the cluster:** kill the background shell Claude launched (the skill records its id), or `pkill -f QuickStart`.
-
-**Known quirks:**
-- All quickstart variants bind the same ports (9000 controller, 8000 broker, 7050/8098/8099 server). Only one can run at a time.
-- The `auth` and `auth-zk` quickstarts use credentials `admin / verysecret`.
-- Quickstarts run with embedded ZK + server + broker + controller in one JVM. That JVM is not small (~4GB heap) — expect to need a decently-sized machine.
-
----
-
-## `/bench-compare`
-
-**What it does.** Runs the same JMH benchmark twice — once against a baseline ref in a temporary git worktree, once against the current tree — and diffs the JMH output tables.
-
-**Time reality.** This is the slowest skill and it is not subtle. Pinot's default JMH config is **8 warmup × 60s + 8 measurement × 60s × 5 forks per `@Benchmark` method per parameter combination**. A single method with a few parameters can be a multi-hour run; the full default on something like `BenchmarkDictionary` estimated *7 days* in one test. The skill refuses to run without either explicit `--args` reducing iteration counts, or the user confirming they want the full default.
-
-**A reasonable first pass** for a single-method benchmark: `--args "-wi 1 -i 2 -f 1 -r 5s -w 5s"`. That's 1 warmup × 5s + 2 measurement × 5s × 1 fork ≈ 20s per method post-setup. Setup itself (the `@Setup` phase, which often builds Pinot segments) can still take 1–10 minutes for benchmarks that construct real tables.
-
-**Invocation style** (from validation: skill always uses this rather than the generated `.sh`):
-```
-java -Xms4G -Xmx8G -cp '<tree>/pinot-perf/target/pinot-perf-pkg/lib/*' \
- org.openjdk.jmh.Main 'org.apache.pinot.perf.<Benchmark>' \
- -wi 1 -i 2 -f 1 -r 5s -w 5s \
- -jvmArgsAppend='<add-opens flags for JDK 21>'
-```
-
-Why `org.openjdk.jmh.Main` instead of the per-benchmark `pinot-<Benchmark>.sh`:
-- Benchmark classes have a custom `main()` that builds `OptionsBuilder` directly and ignores CLI args. Going through `org.openjdk.jmh.Main` bypasses it so flags like `-wi`, `-i`, `-r`, `-w`, and crucially `-jvmArgsAppend` actually take effect.
-- The generated script hard-codes `-Xms24G -Xmx24G` which OOMs any laptop with less than ~32GB.
-- The `--add-opens` / `--add-exports` flags are mandatory for any benchmark that extends `BaseClusterIntegrationTest` on JDK 21 — without them ZK startup dies with `InaccessibleObjectException`. The canonical set lives in `.github/workflows/pinot_tests.yml`.
-
-**Example scenarios:**
-
-- **Typical run** → `/bench-compare BenchmarkDictionary --args "-wi 1 -i 2 -f 1 -r 5s -w 5s"` — worktree at `/tmp/pinot-bench-baseline`, builds `pinot-perf` there, runs via `org.openjdk.jmh.Main`, diffs the JMH tables.
-- **Cluster-backed benchmark** (e.g. `BenchmarkEquiJoin`, extends `BaseClusterIntegrationTest`) → skill automatically includes the full JDK-21 add-opens set in `-jvmArgsAppend`.
-- **Explicit baseline** → `/bench-compare BenchmarkDictionary release-1.5.0` → same flow against a tagged release.
-- **Vector benchmark** → `/bench-compare BenchmarkVectorIndex` → uses the `exec:java` form from `pinot-perf/README.md`; it has its own CLI conventions.
-
-**Known quirks:**
-
-- **Stale-jar trap (the real failure mode).** If the current tree's `pinot-perf/target/pinot-perf-pkg/lib/` was built from a previous ref that pulled in different versions of a transitive dep (e.g. `zookeeper-3.9.4.jar` + `zookeeper-3.9.5.jar`), the classpath glob loads both and you get `NoSuchMethodError` at runtime. Pinot/Helix often swallows this as `ZkTimeoutException: Unable to connect to zookeeper server within timeout: 1000`, which looks like an infrastructure/timing issue but is actually a classpath bug. The skill defends against this by always using `./mvnw -pl pinot-perf clean package -DskipTests -am` on the current tree (the worktree is a fresh checkout so baseline is immune). **If you see `ZkTimeoutException` in a second run, don't tune timeouts — check `lib/` for version duplicates.**
-- JMH's `-l` flag doesn't help — Pinot benchmark classes have custom `main()` methods that ignore CLI args. There is no fast sanity check; the first real run is also the first verification.
-- The generated `pinot-<Benchmark>.sh` scripts hard-code `-Xms24G -Xmx24G`. Avoid them; use the `java -cp 'lib/*'` form with your own `-Xmx`.
-- Only ~21 of ~60 benchmark classes are configured as appassembler programs — not every benchmark has a `.sh`. Direct `java -cp` works for all of them.
-- Worktrees require a clean `.git`. Abort if rebase/merge is in progress.
-- Benchmarks fork their own JVMs; don't be surprised by multiple `java` processes during the run.
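The stale-jar check from the first quirk can be sketched as a one-shot helper. It assumes the usual `artifact-<version>.jar` naming (a version starting with a digit); unusually named jars would need a smarter strip:

```shell
# dup_jars <libdir>: strip a trailing "-<version>.jar" from each jar name and
# report artifact ids present with more than one version — the stale-jar trap.
dup_jars() {
  ls "$1" | sed -E 's/-[0-9][0-9A-Za-z.]*\.jar$//' | sort | uniq -d
}
```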
-
----
-
-## `/flaky-analyze`
-
-**What it does.** Queries GitHub Actions via `gh` CLI for recent failing runs of `Pinot Tests`, identifies the failing jobs, downloads the logs, greps for **real** Surefire/TestNG failure markers, clusters by stack trace, and reports a hypothesis.
-
-**Investigation only.** This skill never proposes a fix — the goal is to help the user decide whether the failure is a real bug, a race, or infrastructure noise. Principle C6.2 in `kb/code-review-principles.md` is explicit: the fix is never "add retries".
-
-**The right grep patterns** (documented after burning the wrong ones in testing):
-- `\[ERROR\] Tests run: \d+, Failures: [1-9]` — a Surefire class summary with failures. The FQN follows `-- in `.
-- `<<< FAILURE!` / `<<< ERROR!` — marker following a failed assertion. Line above has the method.
-- `##\[error\]Process completed with exit code` — GitHub Actions' process-level marker.
-- `BUILD FAILURE` — Maven's non-zero exit marker.
-
-Do **not** grep for bare `ERROR` / `FAILED` / `Exception` — Pinot's integration tests log those constantly at runtime (Helix rebalancer, Kafka consumer setup, etc.) and you will drown in noise.
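The patterns above can be combined into a single `grep -E` pass — note the POSIX `[0-9]` class instead of `\d`, which `grep -E` does not support. The sample log lines below are fabricated to show the shape of real Surefire output:

```shell
# Fabricated fragment of a downloaded Actions log: one clean class summary,
# one benign runtime ERROR line, one real failure summary, one TestNG marker.
log='[INFO] Tests run: 12, Failures: 0, Errors: 0 -- in org.apache.pinot.GoodTest
2024-01-01 ERROR [HelixTaskExecutor] benign runtime noise, not a test failure
[ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0 -- in org.apache.pinot.FlakyTest
testFoo(org.apache.pinot.FlakyTest)  Time elapsed: 1.2 s  <<< FAILURE!'

# Only the real failure markers survive; the noise lines are skipped.
printf '%s\n' "$log" |
  grep -E '\[ERROR\] Tests run: [0-9]+, Failures: [1-9]|<<< (FAILURE|ERROR)!'
```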
-
-**Example scenarios:**
-
-- **Test is genuinely flaky on master** → skill finds multiple failing runs with the same stack trace → hypothesis: likely real race. Suggests specific source file and line, cites C2.x principles.
-- **Test only fails on one matrix combination** (e.g. JDK 21 integration set 1) → likely env-specific → suggests checking JDK-specific behaviour.
-- **Failing job has no Surefire markers** → infrastructure failure (timeout, OOM, runner cancel). Skill classifies as non-test-level and moves on.
-- **Test name never appears** → either the test isn't actually failing (wrong search term) or GitHub's log retention has aged out the runs. Skill reports which and stops.
-
-**Usage:**
-- `/flaky-analyze RangeIndexTest` — last 20 failing runs of `Pinot Tests`.
-- `/flaky-analyze RangeIndexTest 50` — last 50.
-- `/flaky-analyze RangeIndexTest --pr 18267` — only that PR's runs.
-
-**Known quirks:**
-- `gh run view --log-failed` is unreliable here — it returns only the steps marked failed, which for Pinot's "Integration Test" step is often just runner init lines. Always use `--log` + grep.
-- `gh run view --log` can return tens of MB per job. Cap runs scanned at 50 unless the user asks for more; fetch in parallel where possible.
-- GitHub retains Actions logs for 90 days by default. Older flakes require a different data source (e.g. the `surefire-reports-*` artifacts uploaded on master runs).
-
----
-
-## Related configuration
-
-- [`.claude/agents/code-reviewer.md`](../agents/code-reviewer.md) — independent code-review agent that reads `kb/code-review-principles.md`.
-- [`kb/code-review-principles.md`](../../kb/code-review-principles.md) — 163 Pinot-specific review principles; the skills cite these by `C<domain>.<n>`.
-- [`CLAUDE.md`](../../CLAUDE.md) — project-wide instructions consumed by Claude Code.
-- [`AGENTS.md`](../../AGENTS.md) — general agent guidance (for Claude Code and other tools).
-- [`.github/copilot-instructions.md`](../../.github/copilot-instructions.md) — overlapping guidance for GitHub Copilot / Cursor.
-
-## Adding a new skill
-
-1. Create `.claude/skills/<skill>/SKILL.md`.
-2. Start with YAML frontmatter containing `name` and `description`. Put the ASF license header as `#` comments inside the frontmatter block so the frontmatter stays valid.
-3. Write the procedure in the body as numbered steps — this is what Claude follows. Prefer concrete commands over prose.
-4. Keep the description concrete enough that Claude can decide when to invoke the skill from natural-language requests.
-5. List the skill in the table above and add a detail section below.
-
-Skills should be narrow, fast to read, and composable. A skill that "runs X and then does a code review" probably belongs as two separate skills chained by the user.
-
-## Troubleshooting
-
-- **Skill isn't invoked when expected.** Claude may not have loaded `.claude/skills/` in its session context. In a new session, ask Claude to list available skills, or re-type the slash command explicitly.
-- **Maven wrapper not found.** All skills assume the repo root has `./mvnw`. If you're invoking from a subdirectory, ask Claude to `cd` first, or run from the repo root.
-- **`gh` not authenticated.** `/flaky-analyze` requires `gh auth status` to succeed against github.com. Run `gh auth login` once.
-- **Worktree errors in `/bench-compare`.** Check that `/tmp/pinot-bench-baseline` isn't a leftover from a previous aborted run — if so, `git worktree remove --force` it and retry.
+To add a new skill: write the procedure once at `kb/skills/<skill>.md`, then create `.claude/skills/<skill>/SKILL.md` as a thin pointer with frontmatter. See [`kb/skills/README.md`](../../kb/skills/README.md#adding-a-new-skill) for the full pattern.
diff --git a/.claude/skills/bench-compare/SKILL.md b/.claude/skills/bench-compare/SKILL.md
index ce1650c7c03e..1b79a6f928bb 100644
--- a/.claude/skills/bench-compare/SKILL.md
+++ b/.claude/skills/bench-compare/SKILL.md
@@ -21,87 +21,4 @@ description: Run a Pinot JMH benchmark twice — once on a baseline commit, once
# /bench-compare
-Purpose: when a change claims a performance impact (principle C6.7 — "performance-sensitive changes require benchmark comparisons"), produce the before/after numbers in one command, without making the user manually stash, checkout, run, un-stash, and re-run.
-
-Usage:
-- `/bench-compare BenchmarkDictionary` — compares current working tree vs. `merge-base HEAD upstream/master` (falls back to `origin/master` if upstream missing).
-- `/bench-compare BenchmarkDictionary <ref>` — compare against an explicit ref (commit, tag, branch).
-- `/bench-compare BenchmarkDictionary --args "-wi 1 -i 2 -f 1 -r 5s -w 5s"` — pass extra JMH args. **Always use short warmup/iteration flags for a first pass**; defaults run for hours or days.
-
-**Time expectations.** Pinot benchmarks are not quick. Default JMH config in `pinot-perf` is 8 warmup × 60s + 8 measurement × 60s × 5 forks per parameter combination — a single benchmark method's `@Benchmark` can report an ETA of multiple days. The skill will refuse to run without either: (a) explicit `--args` that reduce warmup/iteration counts, or (b) the user confirming they really do want the full default run.
-
-## Procedure
-
-1. **Locate the benchmark.** Glob for `pinot-perf/**/<Benchmark>.java`. If zero or multiple matches, report and stop. The benchmark class must be under `pinot-perf`.
-
-2. **Resolve the baseline ref.**
- - Default: `git merge-base HEAD upstream/master`. If the `upstream` remote isn't defined, fall back to `origin/master`. If neither resolves, ask the user for an explicit ref.
- - If the user passed a ref, validate it with `git rev-parse --verify <ref>`.
-
-3. **Prepare output directory.** `mkdir -p .bench-compare/` and append it to the repo's `.gitignore` if not already there. Produce two files: `baseline-<Benchmark>.txt` and `current-<Benchmark>.txt`.
-
-4. **Warn and confirm.** Benchmarks take real time. Inspect `--args` — if the user hasn't passed iteration controls, warn that the default suite can take hours to days and suggest a starter like `-wi 1 -i 2 -f 1 -r 5s -w 5s`. Print an estimate of the pair of runs (rough: a 5s-warmup × 5s-measurement × 1 fork run takes ~30–120s per `@Benchmark` method after the Pinot-side `@Setup` completes; `@Setup` alone can run for 1–10 minutes for benchmarks that build segments). Ask the user to confirm.
-
-5. **Build pinot-perf in a baseline worktree.** This avoids touching the working tree:
- ```
- git worktree add /tmp/pinot-bench-baseline <baseline-ref>
- (cd /tmp/pinot-bench-baseline && ./mvnw -pl pinot-perf -am package -DskipTests)
- ```
- The package goal produces the jars, an appassembler-generated launcher (for ~21 blessed benchmark classes) at `pinot-perf/target/pinot-perf-pkg/bin/pinot-<Benchmark>.sh`, and a fat `lib/` directory.
-
-6. **Run the baseline benchmark.** Two invocation styles, in order of preference:
-
- **Preferred — always use JMH's own Main class:**
- ```
- java -Xms4G -Xmx8G -cp '/tmp/pinot-bench-baseline/pinot-perf/target/pinot-perf-pkg/lib/*' \
- org.openjdk.jmh.Main 'org.apache.pinot.perf.<Benchmark>' \
- -wi 1 -i 2 -f 1 -r 5s -w 5s \
- -jvmArgsAppend='-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true' \
- > .bench-compare/baseline-<Benchmark>.txt 2>&1
- ```
-
- Why not use the generated `pinot-<Benchmark>.sh`?
- - It hard-codes `-Xms24G -Xmx24G` — OOMs on <32GB machines.
- - The benchmark's own `main()` (which the script invokes) typically constructs `OptionsBuilder` directly and **ignores CLI args**, so you can't override warmup/iterations or pass `-jvmArgsAppend`. Going through `org.openjdk.jmh.Main` bypasses the custom main and gets you JMH's standard CLI.
- - The `--add-opens`/`--add-exports` flags are mandatory for any benchmark that extends `BaseClusterIntegrationTest` (i.e., spins up a Pinot cluster) on JDK 21 — without them, ZK startup fails with `InaccessibleObjectException` wrapped as `ExceptionInInitializerError`.
-
- For the vector suite (`BenchmarkVectorIndex`) use the `exec:java` form from `pinot-perf/README.md`; it has its own quirks.
-
-7. **Build and run the current tree.** Two mandatory gotchas:
-
- - **Always clean first:** `./mvnw -pl pinot-perf clean package -DskipTests` (note the `clean`, no `-am` — see next bullet). If `pinot-perf/target/pinot-perf-pkg/` already exists from a prior build on a different branch/ref, incremental `package` leaves stale third-party jars in `lib/` when a dependency version changes upstream. Those stale jars sit on the classpath alongside the new ones (e.g. `zookeeper-3.9.4.jar` and `zookeeper-3.9.5.jar`) and cause `NoSuchMethodError` at runtime. Crucially, Helix/Pinot swallows the resulting `ExceptionInInitializerError` in ZK startup and surfaces a misleading `ZkTimeoutException: timeout: 1000` instead — which looks for all the world like a flaky port or timing issue. If you see that exception, **check `lib/` for duplicate versions of `zookeeper-*`, `helix-*`, `netty-*`, etc. first.**
-
- - **Use `-am` only on the first build.** After the first clean+package, upstream modules are populated; subsequent builds can skip `-am`. The worktree build in step 5 gets a fresh `target/` so doesn't have this problem.
-
- Invocation is identical to step 6, just against the current tree's `lib/*`:
- ```
- ./mvnw -pl pinot-perf clean package -DskipTests -am
- java -Xms4G -Xmx8G -cp 'pinot-perf/target/pinot-perf-pkg/lib/*' \
- org.openjdk.jmh.Main 'org.apache.pinot.perf.<Benchmark>' \
- > .bench-compare/current-<Benchmark>.txt 2>&1
- ```
-
-8. **Clean up the worktree.** `git worktree remove /tmp/pinot-bench-baseline --force`. Do this even if steps 6 or 7 failed.
-
-9. **Diff the results.** Parse JMH's table output (the `Benchmark ... Score Error Units` lines) from both files. Produce a table:
- ```
- Benchmark Baseline (ops/s) Current (ops/s) Δ Δ%
- foo.methodA 1234.5 ± 12.1 1478.2 ± 15.3 +243.7 +19.7%
- foo.methodB 987.6 ± 8.0 992.1 ± 7.2 +4.5 +0.4%
- ```
- Use "ops/s", "ns/op", or whatever unit JMH emits — don't convert.
-
-10. **Report.** Print the table. Flag any benchmark where `|Δ%| > 2×error%` as likely a real change (otherwise probably noise). Include the paths to the raw files so the user can share them in a PR.
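Steps 9–10 can be sketched with awk over plain JMH table lines. The sample scores mirror the example table above, and the noise flag implements the `|Δ%| > 2×error%` rule; the file names and formatting are illustrative, not part of the skill:

```shell
#!/bin/sh
# Sketch: join baseline/current JMH tables on benchmark name, compute delta and
# delta%, and flag rows where |delta%| exceeds twice the baseline's relative error.
cd "$(mktemp -d)" || exit 1
cat > baseline.txt <<'EOF'
foo.methodA  thrpt  5  1234.5  ±  12.1  ops/s
foo.methodB  thrpt  5   987.6  ±   8.0  ops/s
EOF
cat > current.txt <<'EOF'
foo.methodA  thrpt  5  1478.2  ±  15.3  ops/s
foo.methodB  thrpt  5   992.1  ±   7.2  ops/s
EOF
report=$(awk '
  NR == FNR { base[$1] = $4; err[$1] = $6; next }   # first pass: baseline score/error
  $1 in base {
    b = base[$1]; c = $4
    dpct = (c - b) / b * 100
    epct = err[$1] / b * 100                        # relative error of the baseline
    flag = (dpct < 0 ? -dpct : dpct) > 2 * epct ? "REAL?" : "noise"
    printf "%-14s %10.1f %10.1f %+8.1f %+7.1f%%  %s\n", $1, b, c, c - b, dpct, flag
  }
' baseline.txt current.txt)
printf '%s\n' "$report"
```

Anything without a matching benchmark name in both files is silently dropped here; the real skill should list such rows separately.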
-
-## Notes
-
-- **Stale-jar trap is the #1 source of mysterious failures.** Pinot benchmarks that fail on a subsequent run of `/bench-compare` in the same repo almost always fail because of duplicate third-party jars in `pinot-perf/target/pinot-perf-pkg/lib/` — typically `zookeeper-X.jar` + `zookeeper-Y.jar` (or equivalent for helix, netty, guava) from different builds. The failure shows up as a deeply-wrapped `ZkTimeoutException: Unable to connect to zookeeper server within timeout: 1000` (or similar NoSuchMethodError swallowed into an infrastructure-looking error). First diagnostic when a second run fails: `ls pinot-perf/target/pinot-perf-pkg/lib/ | sort | awk -F- '{v=$NF; sub("\\.jar$","",v); k=$0; sub("-"v"\\.jar$","",k); print k}' | sort | uniq -d` to spot duplicates. The fix is always `./mvnw -pl pinot-perf clean package -DskipTests -am`, never `rm` individual jars.
-- Worktrees require a clean `.git`. If the repo is in the middle of a rebase/merge, abort with a clear message.
-- **JDK 21 needs the full `--add-opens` / `--add-exports` flag set** for any cluster-backed benchmark (extends `BaseClusterIntegrationTest`). Without them, ZK startup fails with `InaccessibleObjectException: ... module java.base does not "opens java.lang"`. Pass via `-jvmArgsAppend=...` to `org.openjdk.jmh.Main`; CI's `pinot_tests.yml` has the canonical list.
-- **The generated `pinot-<BenchmarkName>.sh` scripts hard-code `-Xms24G -Xmx24G`.** Avoid them — use `java -cp 'lib/*' org.openjdk.jmh.Main <benchmark-regex>` directly with your own `-Xmx`.
-- **Not every benchmark has a generated script.** The appassembler programs list in `pinot-perf/pom.xml` covers ~21 of ~60 benchmark classes. The direct `java -cp` invocation works for any of them.
-- JMH's `-l` (list benchmarks) flag **does not help here** — Pinot benchmark classes have custom `main()` entry points that construct `OptionsBuilder` directly, ignore CLI args, and plunge straight into `Runner.run(...)` which in turn kicks off `@Setup`. For `BenchmarkDictionary` this `@Setup` alone burns 5+ minutes building dictionaries. There is no fast sanity-check short of actually running the benchmark through `org.openjdk.jmh.Main` (which at least accepts `-wi 1 -i 1 -r 1s -w 1s` to minimise it).
-- Benchmarks must run on the same hardware, same JDK, same OS load. Warn the user if they're on battery power or running other heavy processes.
-- Do not `sleep` between runs for "timing" reasons. If a second run fails, it is the stale-jar issue (above), not TIME_WAIT. I spent a long time chasing the timing hypothesis before spotting the classpath mismatch.
-- If the benchmark's output format isn't plain JMH (e.g. `BenchmarkVectorIndex` writes a custom report), don't try to parse it — just save both outputs and tell the user where they are, with a note that manual comparison is needed.
-- Never use `git stash` instead of a worktree. Stash can be lost if the second build fails and the user doesn't know to pop it.
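The one-liner in the stale-jar note above can be unpacked into a more readable sketch. The `lib/` contents here are fabricated to show the shape of the output:

```shell
#!/bin/sh
# Sketch: report artifacts present with more than one version in a lib/ dir by
# stripping the trailing -<version>.jar and looking for repeated artifact names.
libdir=$(mktemp -d)
touch "$libdir/zookeeper-3.9.4.jar" "$libdir/zookeeper-3.9.5.jar" \
      "$libdir/helix-core-1.3.1.jar" "$libdir/netty-common-4.1.100.Final.jar"
dupes=$(ls "$libdir" | sed -E 's/-[0-9][-0-9a-zA-Z.]*\.jar$//' | sort | uniq -d)
printf '%s\n' "$dupes"
```

Any name printed (here `zookeeper`) is an artifact with two or more versions on the classpath, which is exactly the condition that produces the misleading `ZkTimeoutException`.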
+Procedure: see [`kb/skills/bench-compare.md`](../../../kb/skills/bench-compare.md). Read it first, then follow it.
diff --git a/.claude/skills/flaky-analyze/SKILL.md b/.claude/skills/flaky-analyze/SKILL.md
index 8d96ec5d14ab..83476fdb61fb 100644
--- a/.claude/skills/flaky-analyze/SKILL.md
+++ b/.claude/skills/flaky-analyze/SKILL.md
@@ -21,97 +21,4 @@ description: Pull recent GitHub Actions failures for a Pinot test class and anal
# /flaky-analyze
-Purpose: when a test is failing intermittently on CI, gather the evidence in one place so the user can decide whether it's a real race, an environmental issue, or a legitimate regression. Principle C6.2 is explicit: the fix is never "add retries" — it's understanding the root cause.
-
-Usage:
-- `/flaky-analyze RangeIndexTest` — last ~20 runs on master + PRs.
-- `/flaky-analyze RangeIndexTest 50` — look at 50 runs.
-- `/flaky-analyze RangeIndexTest --pr 18267` — only that PR's runs.
-
-## Prerequisites
-
-1. `gh` CLI installed and authenticated: `gh auth status` must succeed.
-2. Network access to GitHub.
-3. The test must be in `apache/pinot`. If the user's remote is a fork, still query `apache/pinot` — that's where CI lives.
-
-If `gh` isn't available, report that and exit. Don't try to fall back to `curl` with tokens.
-
-## Procedure
-
-1. **Parse the argument.** Extract class name (required), optional run count (default 20), optional PR filter.
-
-2. **Find relevant workflow runs.**
- ```
- gh run list --repo apache/pinot --workflow "Pinot Tests" --status failure --limit <N> --json databaseId,displayTitle,headBranch,headSha,createdAt,url
- ```
- If `--pr` is set, filter with `--branch` to the PR's head branch, or use `gh pr view <pr-number> --json headRefName`.
-
-3. **For each failed run, find the failing jobs.** A "Pinot Tests" run has matrix jobs (test sets 1/2 × java 21 × unit/integration). Only some fail.
- ```
- gh run view <run-id> --repo apache/pinot --json jobs
- ```
- Filter to jobs with `conclusion: "failure"`.
-
-4. **Download and grep the logs for the target test.** For each failing job:
- ```
- gh run view --job <job-id> --repo apache/pinot --log
- ```
- Each log line is prefixed by `\tUNKNOWN STEP\t`. The log is large (tens of MB); pipe straight into ripgrep with these patterns — they are what Surefire/TestNG/GitHub Actions actually emit:
-
- - `\[ERROR\] Tests run: \d+, Failures: [1-9]` — the Surefire class-summary line when a test class had failures. The **fully qualified class name** is on the same line after `-- in `.
- - `\[ERROR\] Tests run: \d+, Failures: \d+, Errors: [1-9]` — same, with errors instead of failures.
- - `<<< FAILURE!` / `<<< ERROR!` — the Surefire marker following a failed assertion. Useful to find the exact method; the method and class appear in the line immediately above.
- - `##\[error\]Process completed with exit code` — GitHub Actions' own marker, always present when the job fails for a process-level reason.
- - `BUILD FAILURE` — Maven-level, always present when Maven returns non-zero.
-
- **Do not grep for raw `ERROR` / `FAILED` / `Exception`** — Pinot's integration tests log these constantly at runtime (Helix rebalancer, consumer setup, etc.) and you'll drown in noise. The patterns above only match actual failure markers.
-
- If none of those patterns match in a failing job's log, the test didn't fail at the test level — the job died for infrastructure reasons (timeout, OOM, runner cancel). Classify it as "infrastructure failure" and move on.
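The step-4 patterns can be sanity-checked against a fabricated log excerpt; only the Surefire summary line matches, not the runtime ERROR noise (the log lines below are invented for the demo, not real CI output):

```shell
#!/bin/sh
# Sketch: the failure-marker pattern matches real Surefire class summaries but
# skips Pinot's runtime ERROR chatter; the FQN is recovered from the "-- in" suffix.
log=$(mktemp)
cat > "$log" <<'EOF'
2024-01-01T00:00:01 ERROR [HelixRebalancer] transient rebalance error, retrying
[ERROR] Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.4 s <<< FAILURE! -- in org.apache.pinot.queries.RangeIndexTest
EOF
hits=$(grep -cE '\[ERROR\] Tests run: [0-9]+, Failures: [1-9]' "$log")
fqn=$(grep -E '\[ERROR\] Tests run: [0-9]+, Failures: [1-9]' "$log" | sed -E 's/.* -- in //')
echo "$hits $fqn"
```

The first log line contains `ERROR` but never matches, which is the point of using the structured patterns instead of a raw `grep ERROR`.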
-
-5. **Extract structured failure records.** For each hit, record:
- - Run id, PR number (if any), commit SHA, JDK version, test set (parseable from the job name like `Pinot Integration Test Set 1 (temurin-21)`).
- - The failing class FQN from the `-- in ` suffix of the summary line.
- - The failure message (typically the line containing `<<< FAILURE!` or the `AssertionError: ...` line that follows).
- - The top ~5 frames of the stack trace, taken from the ~30 lines following the `<<< FAILURE!` marker.
- - Any preceding log lines that look like test setup problems (timeouts, `Connection refused`, `Address already in use`, `OutOfMemoryError`).
-
-6. **Cluster the failures.** Group by:
- - Identical exception type + top-of-stack frame → likely same root cause.
- - Different stack traces → either multiple bugs or environmental flakiness.
- - Setup/timeout errors with no test code in the stack → likely infrastructure.
-
-7. **Report.** Structure:
- ```
- ## Flaky analysis: <ClassName>
- Runs scanned: N (M with this test failing, K with unrelated failures)
-
- ### Failure cluster 1 — <exception> at <top frame> (<N> occurrences)
- Example (PR #<n>, JDK 21, test set 2):
- <failure message and top stack frames>
-
- Commits affected: [<sha>, <sha>, ...]
-
- ### Failure cluster 2 — ...
-
- ### Hypothesis
- <one-paragraph assessment: real race / environmental / regression, and why>
-
- ### Suggested next steps
- - <concrete investigation step>
- ```
-
-8. **Do not propose a fix.** This skill is investigation, not remediation. End with "Want me to open the source file at the top-of-stack frame?"
-
-## Notes
-
-- `gh run view --log` can be slow (10–60s per run) and returns large payloads. Cap total runs scanned at 50 unless the user asks for more. Run these fetches in parallel where possible; `gh` is rate-limited but stays under the limit for <50 runs.
-- Don't write the raw logs to the repo. Stream them through grep and keep only the extracted records in memory.
-- `gh run view --log-failed` is not reliable here — it only returns the steps GitHub marked failed, which for Pinot's "Integration Test" step often just contains runner init lines before the actual Maven invocation. Always use `--log` + the patterns above.
-- To find failing *jobs* within a run without downloading its full log, use:
- ```
- gh api repos/apache/pinot/actions/runs/<run-id>/jobs --jq '.jobs[] | select(.conclusion=="failure") | {name, id: .databaseId}'
- ```
- Then pass the `id` as `--job <id>`. Avoids pulling all matrix logs.
-- If the test doesn't appear in any failure log, report that directly: either the test isn't actually flaky on CI, or the search term is wrong.
-- Results depend on log retention (GitHub keeps 90 days by default). Older flakes are invisible here — suggest checking the `surefire-reports-*` artifacts on master for long-term trends.
-- The `Pinot Tests` workflow (file: `pinot_tests.yml`) is the right default. Also worth checking `Pinot Compatibility Regression Testing` and `Pinot Multi-Stage Query Engine Compatibility Regression Testing` workflows for integration tests that only run there — ask the user before querying those since they multiply the API calls.
+Procedure: see [`kb/skills/flaky-analyze.md`](../../../kb/skills/flaky-analyze.md). Read it first, then follow it.
diff --git a/.claude/skills/precommit/SKILL.md b/.claude/skills/precommit/SKILL.md
index 0490b15a7021..2a155e2dcc9b 100644
--- a/.claude/skills/precommit/SKILL.md
+++ b/.claude/skills/precommit/SKILL.md
@@ -21,110 +21,4 @@ description: Run Pinot's mandatory pre-commit checks (spotless, license, checkst
# /precommit
-Purpose: before pushing a commit or opening a PR, run all quality checks on the modules the current diff actually touches. Don't run them on the whole repo — that's slow and wasteful on a tree this size.
-
-The five checks (in order):
-1. `./mvnw spotless:apply -pl <modules>` — auto-formats code.
-2. `./mvnw license:format -pl <modules>` — adds ASF headers to any new files.
-3. `./mvnw checkstyle:check -pl <modules>` — validates style; fails hard.
-4. `./mvnw license:check -pl <modules>` — validates headers; fails hard.
-5. `./mvnw test-compile -pl <modules> -am -Dmaven.compiler.showDeprecation=true -Dmaven.compiler.showWarnings=true '-Dmaven.compiler.compilerArgs=-Xlint:all'` — compiles and checks for deprecation, unchecked casts, raw types, fallthrough, etc. Warnings are filtered to only lines added in the diff.
-
-Steps 1 and 2 are auto-fixers. Steps 3 and 4 are validators — if they fail after the auto-fixers ran, report the failure with the exact offending file/line from the Maven output and stop. Do not try to manually patch style errors; fix the underlying issue or ask the user. Step 5 is a compiler check — if it produces warnings on newly added lines, report them. Prefer the non-deprecated replacement; suppress with `@SuppressWarnings` only with a comment explaining why the deprecated reference is required (e.g., backward-compat serialization, mixed-version SPI calls, testing the deprecated path).
-
-## Procedure
-
-1. **Find changed files.**
- - If the user passed an argument (`staged`, `unstaged`, `branch`, or a path), use that as the scope.
- - Default: union of staged + unstaged files vs. HEAD, plus any added-but-untracked `.java` / `.xml` / `.properties` files.
- - Ignore: `target/`, `node_modules/`, generated sources, `**/*.md`, anything under `pinot-controller/src/main/resources/` (UI) unless the user explicitly asks — those aren't covered by the Maven plugins.
-
-2. **Map files to modules.** For each changed file, walk up the directory tree until a `pom.xml` is found. The first directory containing a `pom.xml` that is *not* the repo root is the module. De-duplicate.
- - If the only pom is the repo root, the user is touching top-level config — just run the checks at the root (no `-pl`).
- - Some plugin modules are nested two levels deep (e.g. `pinot-plugins/pinot-input-format/pinot-parquet`). Don't stop at an intermediate aggregator pom if it doesn't define the actual sources — walk up until you find the module that directly contains the changed file.
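The pom walk-up in step 2 can be sketched as follows, over a throwaway tree; all paths are fabricated for the demo, and the real skill would run this per changed file and de-duplicate:

```shell
#!/bin/sh
# Sketch: walk up from a changed file to the nearest directory containing a
# pom.xml, stopping before the repo root (whose pom is the aggregator).
repo=$(mktemp -d)
mod="$repo/pinot-plugins/pinot-input-format/pinot-parquet"
mkdir -p "$mod/src/main/java"
touch "$repo/pom.xml" "$mod/pom.xml" "$mod/src/main/java/Foo.java"

dir=$(dirname "$mod/src/main/java/Foo.java")
module=""
while [ "$dir" != "$repo" ] && [ "$dir" != "/" ]; do
  if [ -f "$dir/pom.xml" ]; then module="$dir"; break; fi
  dir=$(dirname "$dir")
done
rel=${module#"$repo/"}
echo "$rel"
```

Because the walk starts at the file and moves upward, the first pom found is the deepest one, which handles the nested-plugin case automatically.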
-
-3. **Report the plan.** Print the list of detected modules in one line: `Modules: pinot-broker, pinot-common, pinot-plugins/pinot-input-format/pinot-parquet`. If there are no modules, say "No changed Java/XML files — nothing to do." and exit.
-
-4. **Run the auto-fixers.** Build a single `-pl` argument with comma-separated modules:
- ```
- ./mvnw spotless:apply -pl <modules>
- ./mvnw license:format -pl <modules>
- ```
- Run each in the foreground. Track the number of files modified by each. If either fails with a non-build error (not a style error — those go through checkstyle), stop and surface the error.
-
-5. **Run the validators.**
- ```
- ./mvnw checkstyle:check -pl <modules>
- ./mvnw license:check -pl <modules>
- ```
- If either fails, parse the Maven output, extract the file:line of each violation, and track them for the summary. Do not attempt to auto-fix checkstyle violations — they need human judgment.
-
-6. **Build the added-line set for compiler warning filtering.** This runs after the auto-fixers so that line numbers reflect the post-fix state (spotless may remove imports, shifting line numbers). Run `git diff --unified=0 HEAD -- <changed files>` and parse the `@@` hunk headers to extract the added line ranges. Build a map of `file → set of added line numbers`. For untracked `.java` files (new files not yet in git), `git diff` returns nothing — treat all lines as added (use `wc -l` to get the line count and add 1 through N to the set).
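The hunk-header parsing in step 6 can be sketched with awk; the diff text below is fabricated, and a real run would read `git diff --unified=0 HEAD` output instead:

```shell
#!/bin/sh
# Sketch: turn `git diff --unified=0` hunk headers into file:line pairs for the
# added-line set. A `+start,0` hunk (pure deletion) correctly yields no lines.
diff_text='--- a/src/Foo.java
+++ b/src/Foo.java
@@ -10,0 +11,2 @@
+int x = 1;
+int y = 2;
@@ -40 +43 @@
+int z = 3;'
added=$(printf '%s\n' "$diff_text" | awk '
  /^\+\+\+ / { file = substr($2, 3); next }   # drop the b/ prefix
  /^@@/ {
    n = split($3, a, ",")                     # $3 is +start[,count]
    start = substr(a[1], 2)
    count = (n > 1) ? a[2] : 1
    for (i = 0; i < count; i++) print file ":" (start + i)
  }')
printf '%s\n' "$added"
```

The output is one `file:line` pair per added line, which is exactly the set the step-7 warning filter looks lines up in.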
-
-7. **Run the compiler check.**
- ```
- ./mvnw test-compile -pl <modules> -am -Dmaven.compiler.showDeprecation=true -Dmaven.compiler.showWarnings=true '-Dmaven.compiler.compilerArgs=-Xlint:all'
- ```
- This is the only step that uses `-am` — compilation needs upstream dependencies built, unlike the other steps. Do not use `clean` — incremental compilation still emits warnings for all files in the module, and the per-line filter (step 6) handles pre-existing warnings. Using `clean` would break modules with generated sources (e.g., JavaCC in `pinot-common`) and adds significant overhead on deep modules. If warnings seem missing (e.g., stale `target/` from a different branch), the user can manually run `./mvnw clean generate-sources test-compile -pl <modules> -am ...` to force full recompilation.
-
- Parse the output for `[WARNING]` lines. **Filter to only added lines from the diff** — for each warning of the form `[WARNING] /path/File.java:[line,col] <message>`, check whether that file and line number appear in the added-line set from step 6. Only report warnings that match. This avoids surfacing pre-existing warnings when a contributor edits a file that already has them.
-
- Track each matching warning with file:line and category (deprecation, unchecked, rawtypes, etc.).
-
- For deprecation warnings: prefer the non-deprecated replacement API. If removing the deprecated reference is not feasible (e.g., backward-compat serialization, mixed-version SPI calls, testing the deprecated path), suppress with `@SuppressWarnings("deprecation")` and a comment explaining why.
-
-8. **Print summary report.** Always print the full report, even when all checks pass:
-
- ```
- ## Pre-commit Summary — <modules>
-
- | Check | Status | Details |
- |------------------|--------|--------------------------------|
- | spotless:apply | FIXED | 3 files reformatted |
- | license:format | OK | 0 files needed headers |
- | checkstyle:check | PASS | |
- | license:check | PASS | |
- | test-compile -Xlint | FAIL | 2 warnings on new lines |
-
- ### Auto-fixed (review before staging)
- - spotless reformatted: File1.java, File2.java
-
- ### Not fixed (requires manual action)
- - `SomeClass.java:45` — [deprecation] Foo.bar() is deprecated, use Foo.baz()
- - `OtherClass.java:12` — [unchecked] unchecked cast to List
- ```
-
- Status values:
- - **FIXED** — auto-fixer modified files (spotless, license:format)
- - **OK** — auto-fixer ran but nothing needed fixing
- - **PASS** — validator passed with no violations
- - **FAIL** — validator or compiler found issues
-
- The report must include:
- - The full table for all 5 checks, every time
- - Every unfixed issue with file:line and what to do about it
- - Every auto-fixed file so the user can review before staging
- - For deprecation: the deprecated API and its replacement (if known)
-
- Do not stage auto-fixed files; that's the user's choice.
-
-## What each step actually enforces
-
-Knowing this matters for diagnosing failures:
-
-- **`spotless:check/apply`**: Pinot's spotless config (see root `pom.xml`) enforces **only two things** — import order (`,\#` → non-static then static) and removal of unused imports. It does **not** enforce trailing whitespace, indentation, brace style, or line length. Don't promise the user that spotless will fix arbitrary formatting.
-- **`license:check/format`**: the ASF header from `HEADER` (repo root), applied to `.java`, `.xml`, `.js`, `.sh`, `.md`, etc. Many file types are excluded — see the `licenseSets/excludes` block in the parent `pom.xml`.
-- **`checkstyle:check`**: rules from `config/checkstyle.xml`. The common ones contributors trip over: `LineLength` (120 chars), `AvoidStarImport`, `AvoidStaticImport`, `HideUtilityClassConstructor`, `NeedBraces`. Output format is `[WARNING] <file>:[<line>,<col>] (<category>) <Check>: <message>` — parse that when surfacing violations.
-- **`license:check`** runs after `license:format` to confirm every touched file now has the header, including files the user only renamed (the plugin keys off content, not git status).
-- **`test-compile -Xlint:all`**: uses `test-compile` (not just `compile`) so both `src/main/` and `src/test/` sources are compiled and all warnings are emitted. Do not use `clean` — it wipes generated sources (e.g., JavaCC in `pinot-common`) and adds significant overhead on deep modules. Incremental compilation still emits warnings for the entire module; the per-line filter handles pre-existing warnings. The Java compiler flags any reference to `@Deprecated` classes/methods/fields from any dependency (Pinot internal or third-party jars), plus unchecked casts, raw types, fallthrough in switch, and other warning categories. Output format: `[WARNING] /path/File.java:[line,col] <message>`. Warnings are filtered to added lines in the diff only — not just by file, but by the specific line numbers from `git diff --unified=0`.
-
-## Notes
-
-- Always use `./mvnw`, never a system `mvn`. The repo's CLAUDE.md is explicit on this.
-- Don't pass `-am` for steps 1–5 — that builds upstream dependencies too, which defeats the purpose of scoping. Only step 7 (compile) needs `-am` because javac needs dependency jars on the classpath.
-- Run sequentially, not in parallel. Spotless and license:format may both modify the same files; ordering matters.
-- When `spotless:apply` removes an unused import, it leaves a *leftover blank line* where the import used to be. This is harmless (checkstyle does not flag it), but if the user cares about the cosmetic double-blank, they'll need to hand-clean after the skill runs. Mention this in the report if spotless touched any files.
-- If the user says `/precommit all`, run on the whole repo (no `-pl`). Warn that this is slow (several minutes).
-- Long builds: steps 1–5 are fast (<30s warm). Step 7 (test-compile with `-am`) is slower (~7–90s depending on module depth and Maven cache state). Deep modules with many upstream deps may take longer on a cold cache. Use `run_in_background` only if the user explicitly asks — otherwise show progress inline.
-- The `license:check` and `checkstyle:check` goals return Maven exit code `1` on violations. If you're capturing the output with shell chaining like `... | tail`, the *tail* pipeline's exit code will mask Maven's — always record Maven's exit code separately, e.g. with `set -o pipefail` or by capturing `${PIPESTATUS[0]}`.
-- The compiler warning filter (step 7) uses the added-line set from step 6, not just the changed-file list. This is critical — per-file filtering would surface pre-existing warnings in files the contributor merely edited, which is unfair. Per-line filtering ensures only warnings on newly added code are reported.
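The exit-code masking described in the notes can be reproduced in isolation. `PIPESTATUS` is the bash route; the status-file trick below is a portable sh sketch, and `fake_mvn` is a stand-in function, not a real Maven invocation:

```shell
#!/bin/sh
# Sketch: a failing command piped into tail reports tail's exit code (0),
# masking the failure; writing the first stage's status to a file recovers it.
fake_mvn() { echo "[ERROR] checkstyle violation"; return 1; }

fake_mvn | tail -n 1 > /dev/null
naive=$?                     # tail's status: 0, the failure is masked

status_file=$(mktemp)
{ fake_mvn; echo $? > "$status_file"; } | tail -n 1 > /dev/null
real=$(cat "$status_file"); rm -f "$status_file"
echo "naive=$naive real=$real"
```

In bash specifically, `real=${PIPESTATUS[0]}` after the pipeline achieves the same thing with less ceremony, as does `set -o pipefail` when only pass/fail matters.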
+Procedure: see [`kb/skills/precommit.md`](../../../kb/skills/precommit.md). Read it first, then follow it.
diff --git a/.claude/skills/quickstart/SKILL.md b/.claude/skills/quickstart/SKILL.md
index fb718a48abfd..7d0688b306cf 100644
--- a/.claude/skills/quickstart/SKILL.md
+++ b/.claude/skills/quickstart/SKILL.md
@@ -21,54 +21,4 @@ description: Launch a local Pinot quickstart cluster (batch, hybrid, streaming,
# /quickstart
-Purpose: get a local Pinot cluster up for manual testing or debugging, without the user having to remember where the scripts live or which build profile produces them.
-
-Usage:
-- `/quickstart` — default, runs `quick-start-batch.sh`.
-- `/quickstart batch` — batch mode (offline table with sample data).
-- `/quickstart hybrid` — hybrid table (offline + realtime).
-- `/quickstart streaming` — realtime consumption from an embedded Kafka.
-- `/quickstart upsert-streaming` — upsert table on Kafka.
-- `/quickstart partial-upsert-streaming` — partial-upsert table on Kafka.
-- `/quickstart json-index-batch` / `json-index-streaming` — JSON index demos.
-- `/quickstart complex-type-handling-offline` / `complex-type-handling-streaming` — complex type demos.
-- `/quickstart auth` / `auth-zk` — auth-enabled variants.
-
-## Procedure
-
-1. **Find the quickstart script.** Look in order:
- - `build/bin/quick-start-<mode>.sh` (produced by `-Pbin-dist`)
- - `pinot-tools/target/pinot-tools-pkg/bin/quick-start-<mode>.sh` (produced by a plain `./mvnw package` of `pinot-tools`)
-
- If neither exists, the binary distribution hasn't been built.
-
-2. **If the script is missing, offer to build.** Ask the user:
- > Quickstart scripts not found. Build them now?
- > - Full bin-dist (recommended for first time): `./mvnw clean install -DskipTests -Pbin-dist -Pbuild-shaded-jar` (~10 min)
- > - Just pinot-tools: `./mvnw -pl pinot-tools -am package -DskipTests` (~3 min)
-
- Run the one they pick in the foreground. If they're re-running after a prior build, `build/bin/` should already exist.
-
-3. **Validate the mode.** If the user passed an unrecognized mode, list the available scripts:
- ```
- ls build/bin/quick-start-*.sh 2>/dev/null || ls pinot-tools/target/pinot-tools-pkg/bin/quick-start-*.sh
- ```
-
-4. **Run the script in the background.** Quickstart processes run indefinitely; they're servers. Use `run_in_background` so the user can keep working:
- ```
- build/bin/quick-start-<mode>.sh
- ```
- Capture the shell id so the user can check output later.
-
-5. **Report how to use it.** Once started (wait ~30s or until logs show "Pinot Controller started"), tell the user:
- - Controller UI: http://localhost:9000
- - Query console: http://localhost:9000/#/query
- - To stop: kill the background shell (provide the id).
- - Logs: printed to the background shell's stdout.
-
-## Notes
-
-- Quickstart processes are long-running. Do not poll with `sleep` loops. The user can check status via the HTTP UI.
-- Multiple quickstarts can't run simultaneously — they all bind to the same ports (9000, 8000, 7050, 8098, 8099). If a run fails with "Address already in use", check for an existing quickstart and ask the user before killing it.
-- The auth quickstart uses a default admin/verysecret credential; mention this if the user picks `auth` or `auth-zk`.
-- On macOS with JDK 21+, quickstarts may need extra `--add-opens` flags; the shipped scripts handle this already, so don't add flags unless the user reports a module-access error.
+Procedure: see [`kb/skills/quickstart.md`](../../../kb/skills/quickstart.md). Read it first, then follow it.
diff --git a/.claude/skills/review-architecture/SKILL.md b/.claude/skills/review-architecture/SKILL.md
index 0816fd50d8b8..85976bb1c235 100644
--- a/.claude/skills/review-architecture/SKILL.md
+++ b/.claude/skills/review-architecture/SKILL.md
@@ -13,35 +13,4 @@ license: Apache-2.0
# Skill: review-architecture
-You are a specialized reviewer for **Apache Pinot domain 3: Code Architecture & Module Design**. Read `kb/code-review-principles.md` section 3 and `CLAUDE.md`.
-
-Severity:
-- **CRITICAL** — circular dependency across modules; public SPI added without considering backward compat; broker code reaching into server-only internals.
-- **MAJOR** — missing abstract base where >1 implementation exists; duplicated utility logic instead of extending a shared one; plugin pulls non-plugin deps into its module.
-- **MINOR** — package misnamed for its role; helper utility could live one module higher.
-
-## 1. Broad scan
-
-- Moved / renamed classes (check `git diff --find-renames`).
-- New interfaces or abstract classes.
-- New POM `<dependency>` entries; check the module and the scope.
-- Imports that cross module roots (`pinot-broker` importing `org.apache.pinot.core.query.executor.ServerQueryExecutor`, or `pinot-common` importing `pinot-core`).
-- Two or more implementations of the same SPI: check for a missing abstract base.
-- Any file moved into or out of `pinot-spi/` (binary contract surface).
-
-## 2. Deep analysis
-
-- **C3.x** Confirm module layering: `pinot-spi` → `pinot-common` → `pinot-segment-spi` → `pinot-segment-local` → `pinot-core` → `pinot-query-runtime` / `pinot-broker` / `pinot-server` / `pinot-controller`. Edges must flow one-way.
-- Prefer abstract base over interface-with-default when >1 impl exists and logic is shared (see the AbstractResponseStore pattern).
-- Plugin modules under `pinot-plugins/` must not be depended-on by core code. Verify the direction.
-- New REST resources belong in `pinot-controller` or `pinot-broker`, never cross-wired.
-- Utility classes: if a helper mirrors existing functionality (e.g., a second `PartitionIdUtils`), flag duplication and recommend consolidation.
-- Shaded-jar impacts: flag if a new transitive dep clashes with an existing shaded package.
-
-## 3. Findings
-
-Tag `skill: review-architecture`, cite `C3.x`, use `[BUG-ARCH]` for unnamed structural issues.
-
-## When to defer to the developer
-
-- Intentional one-off utility in a module that explicitly doesn't want a cross-module shared helper; ensure the PR description says so.
+Procedure: see [`kb/skills/review-architecture.md`](../../../kb/skills/review-architecture.md). Read it first, then follow it.
diff --git a/.claude/skills/review-concurrency-state/SKILL.md b/.claude/skills/review-concurrency-state/SKILL.md
index 94199e4ae8ab..59616935486d 100644
--- a/.claude/skills/review-concurrency-state/SKILL.md
+++ b/.claude/skills/review-concurrency-state/SKILL.md
@@ -13,43 +13,4 @@ license: Apache-2.0
# Skill: review-concurrency-state
-You are a specialized reviewer for **Apache Pinot domain 2: State Management & Concurrency**. Read `kb/code-review-principles.md` section 2 and `CLAUDE.md` before analyzing.
-
-Severity:
-- **CRITICAL** — data race, atomicity violation (wipe-before-install), IdealState write without version check, visibility bug on shared mutable state.
-- **MAJOR** — unnecessary lock widening, striped lock without measured contention, check-then-act race even if rare.
-- **MINOR** — over-synchronization, missing `volatile` where `final` would be safer, comment omission on thread-safety contract.
-
-## 1. Broad scan
-
-- Added/removed `synchronized`, `volatile`, `AtomicReference`, `AtomicLong`, `ReentrantLock`, `StampedLock`, `ConcurrentHashMap`, `CopyOnWriteArrayList`.
-- `get` followed by `put` / `remove` on concurrent maps (check-then-act pattern).
-- Helix `IdealState` / `ExternalView` reads without version checks, or `setIdealState` without `dataAccessor.getProperty(...).getStat()`.
-- `@GuardedBy` annotations added or removed.
-- Registration of observers / listeners (callbacks, MetricsRegistry, segment lifecycle listeners) without clear lifetime documentation.
-- Background threads: `Executors.new*`, `ScheduledExecutorService`, `Thread`. Check shutdown path (`awaitTermination` then `shutdownNow`).
-- Consumer / upsert files: `PartitionConsumer`, `UpsertMetadataManager`, `*PartitionUpsertMetadataManager`.
-
-## 2. Deep analysis
-
-For each hit, apply the KB's concurrency principles:
-
-- **C2.1 — Atomic transitions.** Never wipe old metadata before the new state is durably installed. Pattern: prepare-new → swap-reference → cleanup-old. Flag eager deletes.
-- **C2.2 — Thread-safety conservatism.** Default to explicit synchronization. If replacing `synchronized` with `AtomicReference` or `CHM.compute`, verify the visibility story holds across all callers.
-- **C2.3 — Race analysis for lock changes.** When a lock's scope is narrowed or removed, walk through interleavings with other threads that touch the same state. Flag if the walk-through isn't in the PR description.
-- **C2.4 — Version-checked writes.** Shared state in ZK (IdealState, IdealStateConfig, TableConfig, Schema) must be written with optimistic locking (ZK node version); reject blind writes.
-- **C2.5 — Check-then-act on atomics is still racy.** `if (!map.containsKey(k)) map.put(k, v)` is a bug — must be `putIfAbsent` / `computeIfAbsent`.
-- **C2.6 — Shared observers.** When an observer is registered from multiple paths or called concurrently, the handler must be idempotent and its mutable state must be published safely.
-
-Also check lifecycle: every `new ExecutorService` needs a clear shutdown path in `close()` / stop hook.
-
-## 3. Findings
-
-Emit findings in the `code-reviewer` agent format, tagging `skill: review-concurrency-state` and citing `C2.x`. Use `[BUG-RACE]` for unnamed bugs.
-
-For each finding, include a short interleaving sketch (T1/T2 steps) where the race is non-obvious — this is high-leverage and human reviewers trust it.
-
-## When to defer to the developer
-
-- Code clearly documents a single-threaded invariant (e.g., called only from the Helix event thread) and the invariant is preserved.
-- The change is in test-only code.
+Procedure: see [`kb/skills/review-concurrency-state.md`](../../../kb/skills/review-concurrency-state.md). Read it first, then follow it.
diff --git a/.claude/skills/review-config-backcompat/SKILL.md b/.claude/skills/review-config-backcompat/SKILL.md
index dbd9609bb20d..651f92498786 100644
--- a/.claude/skills/review-config-backcompat/SKILL.md
+++ b/.claude/skills/review-config-backcompat/SKILL.md
@@ -14,61 +14,4 @@ license: Apache-2.0
# Skill: review-config-backcompat
-You are a specialized reviewer for **Apache Pinot domain 1: Configuration & Backward Compatibility**. Read `kb/code-review-principles.md` (section "1. Configuration & Backward Compatibility") and `CLAUDE.md` before analyzing.
-
-Severity (from the KB):
-- **CRITICAL** — must fix: removed/renamed config key with no legacy fallback; widened SPI signature; renamed enum/DataType/schema type; Protobuf field-number reuse; DataTable/segment version bump without dual-read.
-- **MAJOR** — should fix: new feature ships ON by default; multi-level override not validated; missing `@Deprecated` on legacy alias.
-- **MINOR** — quality: config namespace inconsistent; constant name mismatches string value; comment misses the rollout plan.
-
-## 1. Broad scan
-
-Grep the diff for risky patterns:
-
-- Removed or renamed constants under `pinot-spi/` and any `*Constants.java`.
-- Method signature changes in files under `pinot-spi/**` or `pinot-segment-spi/**`.
-- New or changed values in any `enum` whose class is serialized (check for `@JsonCreator`, `@JsonValue`, or presence in Helix/ZK/segment metadata).
-- `.proto` / `.thrift` field renumbering or deletion.
-- New REST `@Path` methods or renamed `@JsonProperty` fields.
-- New `@Deprecated` annotations (good) vs. outright deletion of old APIs (bad).
-- New boolean flags in table/cluster config — check their default.
-- Changes to `DataTableBuilder`, `DataTableUtils`, `SegmentVersion`, `V1Constants`.
-
-Short list of hits per pattern.
-
-## 2. Deep analysis
-
-For each hit, apply the trigger match from the KB and compare the change against the BAD/GOOD examples. Specifically check:
-
-- **C1.1** Is the old key still accepted? Is resolution order `new → legacy`? Is the legacy key `@Deprecated`?
-- **C1.2** For new enum values / schema DataType: has the name been validated against SQL / Parquet / Arrow conventions? Names are permanent.
-- **C1.3** For SPI signature changes: could an existing plugin compiled against an older version still link? Prefer overloads over widening.
-- **C1.4** For reverts: does the commit reference the original PR and explain the failure mode?
-- **C1.5** For new `isXxxEnabled()`-style validation: does it resolve through table → instance → default override chain? Use the `Enablement` enum where it exists.
-- **C1.6** New feature flag defaults to OFF (`false` for `enableXxx`, or `false` for `disableXxx` = enabled).
-- **C1.7** Config namespace follows existing patterns (`pinot.broker.*`, `pinot.query.sse.*`, dot-separated lowercase).
-- Wire format: DataTable / segment-version bumps must keep the reader able to decode prior versions; confirm dual-read is tested.
-- Rolling upgrade: is there a written rolling-upgrade note for backward-incompat label PRs (broker-first vs controller-first)?
-
-If the diff doesn't match any trigger in this domain, return `no-findings`.
-
-## 3. Findings
-
-Emit each finding in the format below, matching the `code-reviewer` agent's output:
-
-```
-### [C1.x] — CRITICAL|MAJOR|MINOR
-**File:** `path/to/File.java:line`
-**Trigger:**
-**Problem:**
-**Fix:**
-```
-
-For issues not matching a specific principle but still in this domain, use `[BUG-CFG]`.
-
-Tag each finding with `skill: review-config-backcompat`. Include `also-see` pointers to related principles when applicable (e.g., C1.5 and C1.8 often co-occur).
-
-## When to defer to the developer
-
-- Renames inside a purely internal package (no JSON / SPI / serialization exposure).
-- Adding a legacy alias is technically possible but the original key has not been released yet (check `git log` for first introduction).
+Procedure: see [`kb/skills/review-config-backcompat.md`](../../../kb/skills/review-config-backcompat.md). Read it first, then follow it.
diff --git a/.claude/skills/review-correctness-nulls/SKILL.md b/.claude/skills/review-correctness-nulls/SKILL.md
index 28c45a61f4a8..b4a330de8728 100644
--- a/.claude/skills/review-correctness-nulls/SKILL.md
+++ b/.claude/skills/review-correctness-nulls/SKILL.md
@@ -12,37 +12,4 @@ license: Apache-2.0
# Skill: review-correctness-nulls
-You are a specialized reviewer for **Apache Pinot domain 5: Correctness & Safety**. Read `kb/code-review-principles.md` section 5 and `CLAUDE.md`.
-
-Severity:
-- **CRITICAL** — silent wrong results (precision loss, missing enum case handled as default), memory leak on segment destroy, null dereference that crashes a query, missing null-bitmap update.
-- **MAJOR** — use of double fallback where polymorphic primitive dispatch exists; exception swallowed without logging; unchecked `Optional.get()`.
-- **MINOR** — `@Nullable` missing on a return that may be null; redundant null check.
-
-## 1. Broad scan
-
-- Return paths from stats / aggregators / dictionaries that may be empty — confirm they handle "no data" case (return null, not NPE).
-- `switch` on `DataType`, `FieldSpec.DataType`, `ColumnDataType`, `IndexType`, `SegmentVersion` — check for missing cases and `default: throw` vs. silent fallthrough.
-- `null` returns from new methods — confirm `@Nullable` is on the signature.
-- `close()` / `destroy()` methods — confirm they clear indexes, bitmaps, dictionaries, and null all references.
-- Arithmetic / aggregation code — look for unconditional cast-to-double or use of `BigDecimal` when `long` suffices.
-- Window / aggregation functions split by type — confirm INT/LONG/FLOAT/DOUBLE/BIG_DECIMAL each have a dedicated impl where precision matters.
-- Caught `Throwable` / `Exception` — confirm no silent swallow.
-
-## 2. Deep analysis
-
-- **C5.x** Null handling: dispatch on `getStoredType()`, update null-value-vector bitmap on insert/delete, test both `null-handling-enabled=true` and `false` paths.
-- **Exhaustive switches**: prefer `EnumSet.allOf` confirmation or a `default: throw new IllegalArgumentException("Unsupported " + type)`; silent fallthrough is a CRITICAL risk (see PR 18176 `IVF_ON_DISK` case).
-- **Precision**: INT/LONG windows must not coerce to double; BIG_DECIMAL requires its own aggregator. Flag coercions that may lose precision past 2^53.
-- **Resource cleanup**: `close()` must actively trim state, not rely on GC. Even if dangling refs are rare, explicit cleanup is the norm (see PR 18204 bitmap leak).
-- **Error messages**: `Preconditions.checkState`, `IllegalArgumentException`, `IllegalStateException` must include the offending value — not opaque.
-- **Off-by-one**: row iteration loops — confirm `< numDocs` not `<=`; confirm `docIdIterator` drains fully before reusing.
-
-## 3. Findings
-
-Tag `skill: review-correctness-nulls`, cite `C5.x`, use `[BUG-CORR]` for unnamed bugs. For precision / silent-wrong-result risks, always classify CRITICAL.
-
-## When to defer to the developer
-
-- Null path is explicitly out-of-scope in the PR description and a follow-up issue is linked.
-- `default` branch falls through to a documented best-effort path (rare; must be justified).
+Procedure: see [`kb/skills/review-correctness-nulls.md`](../../../kb/skills/review-correctness-nulls.md). Read it first, then follow it.
diff --git a/.claude/skills/review-naming-api/SKILL.md b/.claude/skills/review-naming-api/SKILL.md
index 715af7ab0008..b414ca38ffa3 100644
--- a/.claude/skills/review-naming-api/SKILL.md
+++ b/.claude/skills/review-naming-api/SKILL.md
@@ -12,36 +12,4 @@ license: Apache-2.0
# Skill: review-naming-api
-You are a specialized reviewer for **Apache Pinot domain 7: Naming & API Design**. Read `kb/code-review-principles.md` section 7 and `CLAUDE.md`.
-
-Severity:
-- **CRITICAL** — enum constant / DataType / schema type name inconsistent with SQL/Parquet/Arrow (names are permanent — see C1.2); public API with same-module name collision.
-- **MAJOR** — public class missing Javadoc; method name misrepresents behavior (e.g., `get` that mutates); inconsistent REST naming vs. existing resources.
-- **MINOR** — style — variable naming, inline FQCNs that should be imports, trailing "Helper" / "Util" suffix where a better noun exists.
-
-## 1. Broad scan
-
-- New / renamed public classes, interfaces, methods, fields.
-- New enum values — check against SQL / Parquet / Arrow conventions (see C1.2).
-- New `@Path` routes or `@JsonProperty` names — confirm kebab-case for URL, camelCase for JSON, consistent with neighbors.
-- Inline fully-qualified class names — flag (CLAUDE.md convention).
-- New public classes without class-level Javadoc — flag (CLAUDE.md convention).
-- License headers on new files — flag missing.
-
-## 2. Deep analysis
-
-- **C7.x** Confirm name matches behavior. `get*` should not mutate. `isXxx` / `hasXxx` for booleans. `toXxx` / `fromXxx` for conversions.
-- Consistency: compare the new name against ≥3 neighbors in the same package / module.
-- Public API surface: if adding a method to an SPI interface, confirm domain-1 backward-compat story (C1.3).
-- Javadoc: new public classes must describe behavior and thread-safety.
-- Imports: `com.foo.Bar foo = new com.foo.Bar()` → use import.
-- CLAUDE.md checks: license header, Java 21 target (Java 11 bytecode for SPI/client modules), SLF4J logger pattern.
-
-## 3. Findings
-
-Tag `skill: review-naming-api`, cite `C7.x`, use `[CONV]` for CLAUDE.md violations. Most findings here are MINOR; do not inflate severity.
-
-## When to defer to the developer
-
-- Name is internal-only (package-private) and the team has indicated preference.
-- A rename would force a larger refactor already deferred to a follow-up.
+Procedure: see [`kb/skills/review-naming-api.md`](../../../kb/skills/review-naming-api.md). Read it first, then follow it.
diff --git a/.claude/skills/review-performance/SKILL.md b/.claude/skills/review-performance/SKILL.md
index f6f45b076482..717a8d09960b 100644
--- a/.claude/skills/review-performance/SKILL.md
+++ b/.claude/skills/review-performance/SKILL.md
@@ -13,36 +13,4 @@ license: Apache-2.0
# Skill: review-performance
-You are a specialized reviewer for **Apache Pinot domain 4: Performance & Efficiency**. Read `kb/code-review-principles.md` section 4 and `CLAUDE.md`.
-
-Severity:
-- **CRITICAL** — documented benchmark regresses significantly; per-row allocation in top-level operator loop; large synchronized block on query-path singleton.
-- **MAJOR** — missing primitive fast-path (e.g., reusing `Long` boxing); unnecessary intermediate collection on the scan path; string concat inside a per-row loop.
-- **MINOR** — minor inefficiency off the hot path; logging at INFO inside a tight loop.
-
-## 1. Broad scan
-
-- Per-row methods: search for `getInt`, `getLong`, `getDouble`, `getString`, `getBytes`, `getValue`, `transform`, `filter`, `accept` inside operator / transform / aggregator files.
-- Allocations in loops: `new `, `Arrays.asList`, `Collections.singletonList`, `String.format`, `"x" + y`, lambdas capturing variables.
-- Boxing: use of `Integer`, `Long`, `Double`, `Boolean` where `int`, `long`, `double`, `boolean` would do; boxed `Map` keys/values in hot code.
-- Virtual dispatch in hot loops: fields typed as an interface where a concrete class would let JIT inline.
-- `synchronized` / lock acquisition inside for-loops on the scan path.
-- Missing type-specific aggregator (e.g., `Sum` falls back to `BigDecimal` when `Long` would suffice — see recent PRs on `SumLongWindowValueAggregator`).
-- `ByteBuffer.duplicate()` / `slice()` in loops.
-
-## 2. Deep analysis
-
-- **C4.x** Ask: does this code run per-row, per-segment, or per-query? Apply the budget of each (per-row: zero allocations, no boxing, no virtual dispatch where avoidable).
-- Check for type-dispatch on `getStoredType()` rather than a double-coercion fallback. Precision loss past 2^53 is a correctness issue but also a perf giveaway (extra unbox + cast).
-- If the PR claims a perf gain, confirm a benchmark is attached or point to the `/bench-compare` skill as a next step.
-- If a benchmark shows a regression > 5%, flag CRITICAL regardless of other merits.
-- Avoid introducing `LOGGER.debug(String.format(...))` in per-row loops — even when debug is off, formatting may be eager.
-
-## 3. Findings
-
-Tag `skill: review-performance`, cite `C4.x`, use `[BUG-PERF]`. Quantify when possible ("allocation per row × 1M rows/sec = 1M objects/sec"). Recommend `/bench-compare` when the impact is unclear.
-
-## When to defer to the developer
-
-- Change is behind a rarely-used feature flag and the PR acknowledges the trade-off.
-- The hot path has a documented JIT-inlining assumption and the change preserves it.
+Procedure: see [`kb/skills/review-performance.md`](../../../kb/skills/review-performance.md). Read it first, then follow it.
diff --git a/.claude/skills/review-process-scope/SKILL.md b/.claude/skills/review-process-scope/SKILL.md
index 983ce701e416..e46567f28196 100644
--- a/.claude/skills/review-process-scope/SKILL.md
+++ b/.claude/skills/review-process-scope/SKILL.md
@@ -13,34 +13,4 @@ license: Apache-2.0
# Skill: review-process-scope
-You are a specialized reviewer for **Apache Pinot domain 8: Process & Scope**. Read `kb/code-review-principles.md` section 8 and `CLAUDE.md`.
-
-Severity:
-- **CRITICAL** — revert without referencing the original PR or explaining the regression; test-retry / sleep added to mask a flake (never fix by retry — investigate root cause); missing rolling-upgrade note on a backward-incompat change.
-- **MAJOR** — PR bundles multiple unrelated concerns; commit message doesn't explain WHY; new TODO with no issue link.
-- **MINOR** — PR title style; label missing.
-
-## 1. Broad scan
-
-- Diff size + module count. If > 500 lines or > 4 modules, flag for scope review.
-- Commit messages (`git log ..HEAD`): check each for a WHY clause.
-- New `// TODO` / `// FIXME` — confirm each has a linked issue.
-- Test retry patterns: `@Test(retryAnalyzer = ...)`, `Thread.sleep` added in tests, `@Flaky` annotations.
-- PR title / labels if available.
-
-## 2. Deep analysis
-
-- **C8.x** PR scope: one concern per PR; bundling a refactor with its accompanying test rewrite is acceptable, but not a refactor with a new feature.
-- **Reverts**: must name the reverted PR number and the reason.
-- **No-retry rule**: flaky tests are investigated via `/flaky-analyze`, not retried into submission.
-- **Backward-incompat labeling**: any change touching wire formats / APIs / configs that can't roll-forward-and-back needs the `backward-incompat` label and a rolling-upgrade note.
-- **TODO hygiene**: every TODO links to an issue.
-
-## 3. Findings
-
-Tag `skill: review-process-scope`, cite `C8.x`, use `[PROC]` for process nits. Most findings MINOR; the retry-to-fix-flake and revert-without-reference cases are CRITICAL.
-
-## When to defer to the developer
-
-- PR is pre-coordinated large refactor with a linked design doc; scope is justified.
-- Label was set after PR description was drafted; confirm it's now correct.
+Procedure: see [`kb/skills/review-process-scope.md`](../../../kb/skills/review-process-scope.md). Read it first, then follow it.
diff --git a/.claude/skills/review-testing/SKILL.md b/.claude/skills/review-testing/SKILL.md
index 5e0782decd80..6f996555b4dc 100644
--- a/.claude/skills/review-testing/SKILL.md
+++ b/.claude/skills/review-testing/SKILL.md
@@ -12,61 +12,4 @@ license: Apache-2.0
# Skill: review-testing
-You are a specialized reviewer for **Apache Pinot domain 6: Testing Strategies**. Read `kb/code-review-principles.md` section 6 and `CLAUDE.md`.
-
-Severity:
-- **CRITICAL** — bug-fix PR with no regression test; wire-format change with no mixed-version test; null-aware change that tests only one of the two null-handling modes.
-- **MAJOR** — positive-only test (no negative / error-case); mock-heavy test where a real dictionary/segment is trivially available; missing test for a new public code path; new integration test spins up its own cluster when `CustomDataQueryClusterIntegrationTest` would suffice.
-- **MINOR** — assertion style (`assertTrue(x == y)` vs `assertEquals`); unclear test names; flaky-prone timing assumptions.
-
-## 1. Broad scan
-
-- For every file under `src/main/` changed, check if a corresponding `src/test/` file also changed. Missing = suspect.
-- Find new tests and scan for:
- - Test framework: TestNG (`import org.testng.annotations.Test`) unless the file uses JUnit consistently.
- - Mocks: `@Mock`, `Mockito.when`, `mock(...)`. Flag mocks of `Dictionary`, `ForwardIndexReader`, `NullValueVectorReader` — these should be real in most cases (see PR 18189).
- - `assertTrue` / `assertFalse` on compound expressions — prefer `assertEquals` / `assertThrows`.
- - Missing `@Test(dataProvider=...)` when the production code has a type-dispatch switch — a single type is insufficient coverage.
- - Timing assumptions: `Thread.sleep`, `System.currentTimeMillis()` in assertions → flakiness.
-- Integration tests: new REST / Thrift / wire-format changes need a `*IntegrationTest` case; check `pinot-integration-tests/`.
-- Integration-test base class: reject new standalone Pinot integration test classes that use `BaseClusterIntegrationTest`
- or spin up their own controller/broker/server when no special cluster setup is required. For ordinary schema/data/query
- behavior, require reusing an existing `CustomDataQueryClusterIntegrationTest` test or adding a focused subclass under
- `pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/` with
- `@Test(suiteName = "CustomClusterIntegrationTest")`, so it is included by
- `pinot-integration-tests/src/test/resources/custom-cluster-integration-test-suite.xml`.
-
-## 2. Deep analysis
-
-- **C6.x** Positive + negative: every new feature needs at least one passing and one rejected-input test.
-- **Real dependencies where semantics matter**: dictionaries, segment readers, null-vector readers, aggregators, transform functions — use real instances. Mocks hide encoding-sensitive bugs.
-- **Null-handling coverage**: any change under null-aware code must test both `null-handling-enabled=true` and `false`.
-- **Type coverage**: for aggregator / window / transform changes, test INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING, BYTES, BOOLEAN, TIMESTAMP, JSON as applicable.
-- **Mixed-version tests**: for wire-format, Helix, or controller-API changes, confirm a rolling-upgrade scenario is exercised (controller-old + broker-new, and vice-versa).
-- **Regression evidence**: bug-fix PRs must include a test that fails on `HEAD~1` and passes on `HEAD`. Flag absence.
-- **Flakiness hygiene**: never add a `Thread.sleep` as a test-stability knob; prefer `Awaitility` / explicit events. Do not mask flakes with retries (see domain 8 process rule).
-
-### Core-functionality + integration-test base-class selection
-
-Every non-trivial change must exercise the **core functionality** it introduces. For query-semantics changes (new function, index type, aggregator, transform, SQL construct, stored-type behavior) that can be validated with ordinary table data and the default cluster topology, the integration test **must** reuse an existing `CustomDataQueryClusterIntegrationTest` test or add a focused subclass under `pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/` — not a fresh `BaseClusterIntegrationTest` subclass.
-
-Rationale: `CustomDataQueryClusterIntegrationTest` shares one controller / broker / server / ZK across the whole test suite (`@BeforeSuite`, `_sharedClusterTestSuite`). Spinning up a second cluster per test costs ~30–60 s of ZK/Helix startup and inflates CI time linearly with test count. The custom base lets each test bring its own schema, data, and SQL assertions on top of the shared cluster. Nearly all feature validation (window functions, sketches, vector indexes, geo, JSON, timestamp, bytes, distinct, group-by options, star-tree, unnest, SSB queries, etc.) already follows this pattern — look for neighbors in `pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/` before adding a new top-level cluster test.
-
-A new test may extend `BaseClusterIntegrationTest` (or a specialized base) **only** when the change demands one of:
-- Different cluster topology (multi-tenant, multi-broker, multi-server, dedicated minion).
-- Non-default Pinot component configuration (custom broker / server / controller properties, auth, TLS, access control).
-- Different Helix / ZK layout or tenant isolation that can't be simulated with table-level config.
-- Realtime/streaming wiring that the custom base does not already provide, or lifecycle transitions that require cluster restart.
-
-Flag as **MAJOR / C6.10**: a new test under `pinot-integration-tests/` whose class body only defines schema, data, and SQL assertions but extends `BaseClusterIntegrationTest` directly or otherwise creates a dedicated cluster. The fix is to re-parent to `CustomDataQueryClusterIntegrationTest`, move it into the `custom/` package, annotate it with `@Test(suiteName = "CustomClusterIntegrationTest")`, and ensure it is covered by `custom-cluster-integration-test-suite.xml`. Require the PR to state, in the description, which of the four "only-when" conditions above applies if the fresh cluster is justified.
-
-Independently, flag as **CRITICAL** a change to core functionality (new SQL function, new index type, behavior change of an existing operator/aggregator) that ships with only unit tests and no integration test of any kind — unit coverage alone doesn't prove the feature works end-to-end through planner → broker → server → segment.
-
-## 3. Findings
-
-Tag `skill: review-testing`, cite `C6.x`, use `[BUG-TEST]`. When flagging missing tests, always suggest a concrete test signature / data-provider shape.
-
-## When to defer to the developer
-
-- PR is a pure rename / move with no behavior change and existing tests still exercise the paths.
-- New code is test-only scaffolding.
+Procedure: see [`kb/skills/review-testing.md`](../../../kb/skills/review-testing.md). Read it first, then follow it.
diff --git a/.claude/skills/run-test/SKILL.md b/.claude/skills/run-test/SKILL.md
index e35f7fc9a349..07d7da995fb7 100644
--- a/.claude/skills/run-test/SKILL.md
+++ b/.claude/skills/run-test/SKILL.md
@@ -21,47 +21,4 @@ description: Run a single Pinot JUnit/TestNG test class by name. Auto-detects th
# /run-test
-Purpose: resolve a test class name to its Maven module and run only that test, without the user having to remember the exact `-pl`, `-am`, `-Dtest`, and `-Dsurefire.failIfNoSpecifiedTests` flags.
-
-Usage:
-- `/run-test RangeIndexTest` — single class.
-- `/run-test RangeIndexTest#testSpecificMethod` — single method.
-- `/run-test OfflineClusterIntegrationTest` — integration test (auto-detected, adds the required flag).
-
-## Procedure
-
-1. **Parse the argument.** Split on `#` into `<class>` and optional `<method>`. If the class name contains a dot, treat it as FQN.
-
-2. **Locate the source file.**
- - Glob for `**/<class>.java` under the repo.
- - Prefer matches under `src/test/java/`.
- - If multiple matches, list them (with module prefixes) and ask the user which one. Do not guess.
- - If zero matches, report and stop.
-
-3. **Find the owning module.** Walk up from the test file until you find a `pom.xml` that is not the repo root. That's the module.
-
-4. **Note on the `failIfNoSpecifiedTests` flag.** Always pass `-Dsurefire.failIfNoSpecifiedTests=false`, regardless of whether the target is a unit or integration test. With `-am`, Maven runs the full Surefire goal on every upstream module (e.g. `pinot-spi` → `pinot-common` → … → target module). Each of those modules invokes Surefire with the same `-Dtest=<class>` filter, and Surefire's default behaviour is to **fail the whole build** when the pattern doesn't match any test in a given module. Without this flag, the build dies at the first upstream module that doesn't happen to contain `<class>`. This applies equally to unit tests (upstream modules don't have the test) and integration tests (the `pinot-integration-tests` module has tests the filter doesn't match).
-
- Optional: detect integration tests for reporting/warnings only. A test is an integration test if *any* of these hold:
- - The file path contains `pinot-integration-tests`.
- - The file is named `*IntegrationTest.java`, `*IT.java`, `*ClusterTest.java`, or `*EndToEndTest.java`.
- - The module is `pinot-integration-tests` or `pinot-compatibility-verifier`.
-
- Use this only to warn the user about expected runtime ("integration tests typically take 10–20 min"), not to alter the command.
-
-5. **Build the command.**
- ```
- ./mvnw -pl <module> -am -Dtest=<class>[#<method>] -Dsurefire.failIfNoSpecifiedTests=false test
- ```
- - `-am` is intentional: the test needs upstream module JARs built.
- - `-Dsurefire.failIfNoSpecifiedTests=false` is always required when `-am` is set (see step 4).
-
-6. **Run and report.** Print the exact command before running so the user can copy/tweak it. On failure, show the last ~60 lines of the Maven output (or the Surefire report path under `<module>/target/surefire-reports/`) so the user can jump straight to the stack trace.
-
-## Notes
-
-- These runs can take 2–15 minutes depending on the module and whether deps are already built. Consider `run_in_background` only if the user says so — default is foreground so they see progress.
-- Never strip `-am`. The first run after a clean checkout will fail without it.
-- If the user wants to run without rebuilding upstream (faster iteration), suggest they add `-o` (offline) or drop `-am` after the first successful build — but don't do it automatically.
-- For repeat runs of the same test, suggest `-DfailIfNoTests=false` if the first run reported "No tests were executed" — usually a typo in the class name.
-- If the class is `abstract` or has no `@Test` methods (it's a base class), warn the user and suggest concrete subclasses found via grep.
+Procedure: see [`kb/skills/run-test.md`](../../../kb/skills/run-test.md). Read it first, then follow it.
diff --git a/.gitignore b/.gitignore
index 1ee1ec2ddd74..7250804e1e16 100644
--- a/.gitignore
+++ b/.gitignore
@@ -46,6 +46,7 @@ yarn-error.log*
#quickstart files
quickstart*
!.claude/skills/quickstart/
+!kb/skills/quickstart.md
#build symlink directory
build
diff --git a/AGENTS.md b/AGENTS.md
index 725c2b301a63..306a5d398db8 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -137,3 +137,32 @@ Do not push until all four checks pass cleanly.
## Reference docs
- `README.md` for build and quickstart details.
- `CONTRIBUTING.md` for style, licensing, and contribution guidance.
+
+## Knowledge base (tool-neutral)
+The `kb/` directory holds AI-optimized procedures and reference material that any
+coding agent (Claude Code, Copilot, Cursor, GPT, Qwen, Gemini, etc.) can read.
+Claude Code's `.claude/skills/<skill>/SKILL.md` and `.claude/agents/<agent>.md`
+files are thin pointers that delegate to the kb/ procedures — non-Claude agents
+should read kb/ directly.
+
+- `kb/skills/` — operational procedures and review checklists. See
+ [`kb/skills/README.md`](kb/skills/README.md) for the index. Each file is
+ self-contained; read it and follow it when your task matches the skill name.
+ - Operations: `precommit`, `run-test`, `quickstart`, `bench-compare`,
+ `flaky-analyze`.
+ - Review (eight domains, one per file): `review-config-backcompat`,
+ `review-concurrency-state`, `review-architecture`, `review-performance`,
+ `review-correctness-nulls`, `review-testing`, `review-naming-api`,
+ `review-process-scope`.
+- `kb/agents/code-reviewer.md` — orchestrator that dispatches the eight review
+ skills in parallel, aggregates findings, and emits a consolidated
+ severity-ranked report.
+- `kb/code-review-principles.md` — Pinot-specific review principles cited by id
+ (e.g. `C2.4`, `C6.1`) from the review skills.
+- `kb/CLAUDE.md` — kb/ authoring rules (one source of truth, terse, AI-optimized).
+
+**For non-Claude agents:** when a task matches a skill name (e.g. user asks for
+a pre-commit check, a benchmark comparison, a flaky-test investigation, or a
+code review), read the corresponding `kb/skills/.md` and follow its
+procedure. For a full code review, read `kb/agents/code-reviewer.md` and run the
+eight review skills as it describes.
diff --git a/kb/agents/code-reviewer.md b/kb/agents/code-reviewer.md
new file mode 100644
index 000000000000..8b38a18d7940
--- /dev/null
+++ b/kb/agents/code-reviewer.md
@@ -0,0 +1,108 @@
+# code-reviewer
+
+You are the orchestrator of a multi-agent code review for Apache Pinot. You do NOT review code yourself — you delegate to domain-specialized sub-reviewers in parallel and aggregate their findings.
+
+Reference: the Anthropic multi-agent review pattern (different agents examine different defect classes; findings are merged into the consolidated review).
+
+**Independence rule:** You will receive only a scope (what to review) and a one-line change description. If the caller passes opinions, analysis, or concerns about the code, ignore them entirely. Your job is to be an independent second pair of eyes, not to confirm someone else's assessment.
+
+## Inputs you accept
+
+- `scope` — what to review (default: `git diff` of unstaged changes; may be a commit range, branch diff, or explicit file list).
+- `change_description` — one line from the caller.
+
+Ignore any additional opinions, analysis, or concerns from the caller.
+
+## Before dispatching
+
+1. Resolve the scope into a concrete diff. Record: file list, hunk count, total changed lines, modules touched.
+2. Read `kb/code-review-principles.md` and `CLAUDE.md` once; keep them in context.
+3. Skip sub-reviewers whose domain is clearly irrelevant (e.g., `review-performance` can be skipped for a pure doc change). Default: dispatch all 8.
+
+## Dispatch — in parallel
+
+Spawn one sub-agent per applicable skill, in a single parallel batch. Each sub-agent receives:
+
+- `scope` (verbatim)
+- `change_description` (verbatim)
+- `skill` — one of:
+ - `review-config-backcompat` — KB domain 1 (Configuration & Backward Compatibility)
+ - `review-concurrency-state` — KB domain 2 (State Management & Concurrency)
+ - `review-architecture` — KB domain 3 (Code Architecture & Module Design)
+ - `review-performance` — KB domain 4 (Performance & Efficiency)
+ - `review-correctness-nulls` — KB domain 5 (Correctness & Safety)
+ - `review-testing` — KB domain 6 (Testing Strategies)
+ - `review-naming-api` — KB domain 7 (Naming & API Design)
+ - `review-process-scope` — KB domain 8 (Process & Scope)
+
+Each sub-agent reads the skill's body (in `kb/skills/<skill>.md`), performs its 3-phase analysis (broad scan → deep analysis → findings), and returns a structured list of findings. Each finding uses this format:
+
+```
+### [C{id}] — CRITICAL|MAJOR|MINOR
+**File:** `path/to/File.java:line`
+**Trigger:**
+**Problem:**
+**Fix:**
+```
+
+Use `[BUG]` for bugs not covered by a specific principle, `[CONV]` for CLAUDE.md convention violations, and domain-tagged variants (`[BUG-CFG]`, `[BUG-RACE]`, `[BUG-ARCH]`, `[BUG-PERF]`, `[BUG-CORR]`, `[BUG-TEST]`, `[PROC]`) where the skills define them.
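
For illustration, a filled-in finding might look like this (the file path, line number, and wording are hypothetical, not from a real Pinot PR):

```
### [C1.1] — CRITICAL
**File:** `pinot-broker/src/main/java/org/apache/pinot/broker/BrokerConf.java:42`
**Trigger:** Config key renamed with no legacy fallback.
**Problem:** Clusters that set the old key silently lose the setting on upgrade.
**Fix:** Accept both keys, resolve `new → legacy`, and mark the legacy key `@Deprecated`.
```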
+
+## Severity hierarchy
+
+Classify each finding using the tiers defined in the principles doc:
+
+- **CRITICAL**: Must fix before merge. Data loss, corruption, silent wrong results, backward incompatibility, security, race conditions.
+- **MAJOR**: Should fix. Strong justification needed to skip. Performance regressions, design violations, missing tests, wrong abstractions.
+- **MINOR**: Improves quality. Acceptable to defer. Naming, style, idioms, process suggestions.
+
+Priority order when principles collide: Production Safety > Backward Compatibility > Correctness > State Management > Performance > Architecture > Testing > Naming > Process.
+
+## Aggregate
+
+1. Collect all findings into one list.
+2. **De-duplicate** by the key `(principle_id || "BUG:"+one_line_problem, file, line_range_overlap)`. When two sub-reviewers flag the same issue, keep the one with the higher severity and append `also-flagged-by: <skill>` to the record.
+3. **Resolve conflicts** — if two skills disagree on severity, take the higher tier and note the disagreement in a `notes` field so the human reviewer can weigh in.
+4. **Sort** by severity (CRITICAL → MAJOR → MINOR), then by file, then by line.
+5. **Cap noise** — if more than 15 MINOR findings accumulate, summarize them in one "Style / nits" section rather than listing each.
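
The de-dup key and higher-severity-wins merge above can be sketched as follows. This is a minimal illustration under stated assumptions: `Finding`, `dedupKey`, and the integer severity ordinal (0 = CRITICAL) are hypothetical names, not Pinot code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the aggregation de-dup described above.
public class FindingDedup {
    // severity: 0 = CRITICAL, 1 = MAJOR, 2 = MINOR (illustrative ordinal).
    record Finding(String principleId, String problem, String file,
                   int startLine, int endLine, int severity, String skill) {}

    // Key: principle id when present, else "BUG:" + the one-line problem,
    // combined with the file. Line-range overlap is checked separately
    // because overlapping ranges cannot be expressed as a single map key.
    static String dedupKey(Finding f) {
        String id = f.principleId() != null ? f.principleId() : "BUG:" + f.problem();
        return id + "|" + f.file();
    }

    static boolean overlaps(Finding a, Finding b) {
        return a.startLine() <= b.endLine() && b.startLine() <= a.endLine();
    }

    // Keep the higher-severity finding when two sub-reviewers flag the same issue.
    static List<Finding> dedup(List<Finding> findings) {
        List<Finding> kept = new ArrayList<>();
        for (Finding f : findings) {
            boolean merged = false;
            for (int i = 0; i < kept.size(); i++) {
                Finding k = kept.get(i);
                if (dedupKey(k).equals(dedupKey(f)) && overlaps(k, f)) {
                    if (f.severity() < k.severity()) { // lower ordinal = more severe
                        kept.set(i, f);
                    }
                    merged = true;
                    break;
                }
            }
            if (!merged) {
                kept.add(f);
            }
        }
        return kept;
    }
}
```

A real aggregator would also carry the `also-flagged-by` list and the `notes` field for severity disagreements; they are omitted here to keep the key logic visible.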
+
+## Output
+
+Start by listing what you're reviewing (files, diff summary, dispatched sub-reviewers). Then emit a consolidated report in this shape:
+
+```
+## Review scope
+- Files: N, Lines: +X/-Y, Modules: m1, m2
+- Sub-reviewers dispatched: 8 (or list if fewer)
+
+## CRITICAL (must fix before merge)
+### [C{id}] — CRITICAL
+**File:** `path:line`
+**Raised by:** review-<domain> (also-flagged-by: …)
+**Trigger:** …
+**Problem:** …
+**Fix:** …
+
+## MAJOR (should fix)
+…
+
+## MINOR / nits
+- `path:line` — <one-line summary>
+…
+
+## Summary
+- Count by severity
+- Domains with zero findings (so the author can see what was checked)
+- Any inter-skill disagreements flagged for the human reviewer
+```
+
+If no issues are found across all dispatched sub-reviewers, confirm the code meets standards with a brief summary noting which domains were checked.
+
+## Discipline
+
+- **Never add findings of your own.** You only aggregate. If you notice something the sub-reviewers missed, dispatch the relevant skill again; do not invent findings here.
+- **Never downgrade severity.** If sub-reviewers say CRITICAL, the consolidated report says CRITICAL.
+- **Trigger matching is mandatory.** Sub-reviewers only apply principles whose trigger conditions match the diff. A sub-reviewer that returns "no applicable principles in this domain" is a valid, reassuring result — record it in the summary.
+- **Cite the most severe principle** when a finding matches multiple rules.
+- **Severity accuracy is paramount.** A MINOR issue classified as CRITICAL erodes trust just as much as a missed CRITICAL.
+- **Quality over quantity.** A review with 2 real findings beats one with 10 marginal ones.
+- **If a sub-reviewer errors**, record the failure in the summary and fall back to reporting the surviving sub-reviewers' findings. Do not silently drop a domain.
diff --git a/kb/skills/README.md b/kb/skills/README.md
new file mode 100644
index 000000000000..d171e51e25e3
--- /dev/null
+++ b/kb/skills/README.md
@@ -0,0 +1,207 @@
+# Pinot agent skills
+
+Tool-neutral procedural runbooks for common Pinot developer workflows. Each skill is a single Markdown file describing a procedure; any agent (Claude Code, Cursor, Continue, OpenAI Codex CLI, Qwen Coder CLI, etc.) can read and follow them.
+
+Claude Code additionally exposes each skill as a slash command (`/<skill>`) via thin entry-point files under `.claude/skills/<skill>/SKILL.md` — those files just contain frontmatter and a pointer back to the body in this directory. Other tools should read `kb/skills/<skill>.md` directly when a task matches.
+
+## Skills at a glance
+
+| Skill | Purpose | Rough time |
+|---|---|---|
+| [`precommit`](precommit.md) | Run the mandatory pre-commit checks (`spotless:apply`, `license:format`, `checkstyle:check`, `license:check`) plus compiler warning checks (`-Xlint:all`) on only the modules touched by the diff. | 30–120s warm, up to 5min cold |
+| [`run-test <TestClass>`](run-test.md) | Resolve a test class name to its module and run the single-test Maven invocation. Auto-adds integration-test flags. | 30s–15min depending on test |
+| [`quickstart [mode]`](quickstart.md) | Launch a local Pinot quickstart cluster (`batch`, `hybrid`, `streaming`, `upsert-streaming`, `auth`, …) in the background. | ~30s to ready |
+| [`bench-compare <Benchmark> [<ref>] [--args "<jmh args>"]`](bench-compare.md) | Run a `pinot-perf` JMH benchmark against a baseline ref and the current tree and diff the JMH tables. Uses a git worktree. | 10min – days (see below) |
+| [`flaky-analyze <TestClass>`](flaky-analyze.md) | Pull recent CI failures for a test class, cluster by stack trace, propose a root-cause hypothesis. Investigation only. | 1–10min per 20 runs scanned |
+
+Review skills (consumed by the [`code-reviewer`](../agents/code-reviewer.md) agent):
+
+| Skill | KB domain |
+|---|---|
+| [`review-config-backcompat`](review-config-backcompat.md) | 1. Configuration & Backward Compatibility |
+| [`review-concurrency-state`](review-concurrency-state.md) | 2. State Management & Concurrency |
+| [`review-architecture`](review-architecture.md) | 3. Code Architecture & Module Design |
+| [`review-performance`](review-performance.md) | 4. Performance & Efficiency |
+| [`review-correctness-nulls`](review-correctness-nulls.md) | 5. Correctness & Safety |
+| [`review-testing`](review-testing.md) | 6. Testing Strategies |
+| [`review-naming-api`](review-naming-api.md) | 7. Naming & API Design |
+| [`review-process-scope`](review-process-scope.md) | 8. Process & Scope |
+
+---
+
+## `precommit`
+
+**What it does.** Detects modified files (staged + unstaged + new untracked `.java` / `.xml` / etc.), maps them to their owning Maven modules by walking up to the nearest `pom.xml`, then runs five checks in order on only those modules: import formatting, license-header insertion, style validation, license-header verification, and compiler warnings (via `-Xlint:all`).
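+
+The file-to-module walk can be sketched as follows (a sketch; assumes leaf modules have both a `pom.xml` and a `src/` directory, so aggregator poms are skipped):
+
+```
+# Walk up from a changed file to the nearest directory that is a real
+# Maven module (pom.xml plus src/), skipping aggregator poms.
+owning_module() {
+  dir=$(dirname "$1")
+  while [ "$dir" != "/" ] && [ "$dir" != "." ]; do
+    if [ -f "$dir/pom.xml" ] && [ -d "$dir/src" ]; then
+      echo "$dir"
+      return 0
+    fi
+    dir=$(dirname "$dir")
+  done
+  return 1
+}
+```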
+
+**The five checks:**
+- `spotless:apply` — imports only (order + unused removal). Does **not** fix whitespace, indentation, or braces. Pinot's config is deliberately narrow; see the `spotless-maven-plugin` block in the root `pom.xml`.
+- `license:format` — inserts the ASF header into new files that don't have it. Governed by `HEADER` at repo root.
+- `checkstyle:check` — `config/checkstyle.xml`. Top offenders: `LineLength` (120), `AvoidStarImport`, `AvoidStaticImport`, `HideUtilityClassConstructor`, `NeedBraces`.
+- `license:check` — final gate confirming every touched file has a header.
+- `test-compile -Xlint:all` — compiles both `src/main/` and `src/test/` with all compiler warnings enabled (deprecation, unchecked casts, raw types, fallthrough, etc.). Does not use `clean` — incremental compilation still emits warnings for the entire module, and per-line filtering handles pre-existing warnings. Using `clean` would break modules with generated sources (e.g., JavaCC in `pinot-common`). Warnings are filtered to only lines added in the diff (not just by file). Uses `-am` because compilation needs upstream deps.
+
+**Example scenarios:**
+
+- **Clean tree** → prints `No changed Java/XML files — nothing to do.` and exits. Safe to run anytime.
+- **Unused import** → `spotless:check` fails with a coloured diff; `spotless:apply` removes it. *Note:* removal leaves a cosmetic double blank line where the import used to be. Checkstyle does not flag this; the user can clean it up manually if desired.
+- **Missing ASF header on new file** → `license:check` fails with `Some files do not have the expected license header`; `license:format` inserts the ASF header at the top of the file. No manual intervention needed.
+- **Line >120 chars** → `checkstyle:check` fails with `[WARNING] src/.../File.java:[NN] (sizes) LineLength: Line is longer than 120 characters (found NNN).` Skill reports file:line; user must fix manually.
+- **Deprecated API usage** → `test-compile -Xlint:all` emits `[WARNING] File.java:[line,col] method X has been deprecated`. Skill reports the warning and the non-deprecated replacement. Prefer the replacement; suppress with `@SuppressWarnings` only with a justifying comment.
+- **Multi-module diff** → modules are joined with commas into one `-pl <m1>,<m2>,...` argument. All goals run once per step, scoped to the union.
+- **Nested plugin module** (e.g. `pinot-plugins/pinot-input-format/pinot-csv/...`) → skill walks up past the `pinot-input-format/pom.xml` aggregator (it has no `src/`) and stops at `pinot-csv/pom.xml`. That's the correct module.
+
+**Usage:**
+- `/precommit` — default, scoped to diff vs. HEAD + untracked.
+- `/precommit staged` — staged changes only.
+- `/precommit branch` — diff vs. `upstream/master` (useful before a PR).
+- `/precommit all` — run on the entire repo; slow (several minutes).
+
+**Known quirks:**
+- Steps 1–4 don't need `-am`. Only step 5 (compile) uses `-am` because javac needs upstream jars on the classpath.
+- Spotless sometimes reformats files you hadn't touched if they were non-compliant to begin with. Review the auto-fix diff before staging.
+- Violations in `pinot-controller/src/main/resources/` (the React UI) are not handled by the Maven plugins — skip that tree.
+- Compiler warnings are filtered to added lines in the diff only — pre-existing warnings, even in files you touched, are not reported.
+
+---
+
+## `run-test`
+
+**What it does.** Given a test class name (or `Class#method`), finds the source file via glob, walks up to the owning module, and builds the correct `./mvnw -pl <module> -am -Dtest=<Class>[#<method>] -Dsurefire.failIfNoSpecifiedTests=false test` command.
+
+**Why `-Dsurefire.failIfNoSpecifiedTests=false` is always needed:** with `-am`, Maven builds upstream modules and runs Surefire in each one. Upstream modules don't have the target test, so Surefire's default behaviour (fail when the `-Dtest` filter matches nothing) kills the build at the first upstream module. The flag makes "no tests matched in this module" a no-op and lets the build progress to the module that actually contains the test.
+
+**Integration test heuristics** (used only to warn the user about expected runtime, not to change the command): path contains `pinot-integration-tests`, OR filename ends with `IntegrationTest.java` / `IT.java` / `ClusterTest.java` / `EndToEndTest.java`, OR the module is `pinot-integration-tests` / `pinot-compatibility-verifier`.
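+
+A sketch of that heuristic as a predicate (path- and filename-based checks only; the module check would come from the same pom walk-up used for `precommit`):
+
+```
+# Warn-only heuristic: does this test file look like an integration test?
+is_integration_test() {
+  case "$1" in
+    *pinot-integration-tests*) return 0 ;;
+    *IntegrationTest.java|*IT.java|*ClusterTest.java|*EndToEndTest.java) return 0 ;;
+  esac
+  return 1
+}
+```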
+
+**Example scenarios:**
+
+- **Unit test** → `/run-test BigDecimalUtilsTest` → resolves to `pinot-spi/src/test/java/.../BigDecimalUtilsTest.java` → runs `./mvnw -pl pinot-spi -am -Dtest=BigDecimalUtilsTest test`. Verified: 5 tests pass in ~6s after the dependency build.
+- **Integration test** → `/run-test OfflineClusterIntegrationTest` → path `pinot-integration-tests/...` triggers integration detection → warns about the expected runtime. Runs for 10–20 minutes depending on the test.
+- **Method selector** → `/run-test BigDecimalUtilsTest#testRoundTrip` → Maven's `-Dtest=Class#method` form.
+- **Ambiguous name** → `/run-test AggregationFunctionColumnPairTest` → matches two files in `pinot-segment-spi` (`.../misc/` and `.../index/startree/`). Skill lists both with their package paths and asks the user to pick. Does not guess.
+- **Not found** → `/run-test TypoTest` → glob returns zero hits → skill reports and stops.
+- **Abstract base class** → warns that the class has no `@Test` methods and suggests concrete subclasses via grep.
+
+**Known quirks:**
+- First run builds all upstream modules via `-am`, which can be 5–15 minutes on a cold tree. Subsequent runs against the same module skip rebuilds.
+- Pinot uses TestNG; Surefire's `-Dtest=Class#method` syntax still works.
+- Integration tests spin up embedded Helix/ZK/Kafka and bind to localhost ports. Don't run two at once.
+
+---
+
+## `quickstart`
+
+**What it does.** Finds `quick-start-<mode>.sh` in `build/bin/` (produced by `-Pbin-dist`) or `pinot-tools/target/pinot-tools-pkg/bin/` (produced by a plain `pinot-tools` build), launches it in the background, and reports the controller URL.
+
+**Available modes:** `batch`, `hybrid`, `streaming`, `upsert-streaming`, `partial-upsert-streaming`, `json-index-batch`, `json-index-streaming`, `complex-type-handling-offline`, `complex-type-handling-streaming`, `auth`, `auth-zk`. Listed from the `pinot-tools` package.
+
+**Example scenarios:**
+
+- **Scripts already built** → runs `quick-start-batch.sh` in the background. After ~30s the controller is up at `http://localhost:9000`. Verified with `curl http://localhost:9000/tables` (returns `{"tables":[...]}`) and an SQL query (`SELECT COUNT(*) FROM airlineStats` → 9746 rows in 33ms).
+- **No build yet** → skill offers two options: full `-Pbin-dist -Pbuild-shaded-jar` (~10min) or `pinot-tools` only (~3min).
+- **Invalid mode** → skill lists the available scripts via `ls`.
+- **Port 9000 already taken** → cluster start fails with `BindException`. Skill reports and asks whether to kill the existing process.
+
+**To stop the cluster:** kill the background shell launched by the skill (the skill records its id), or `pkill -f QuickStart`.
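+
+A readiness poll, as a sketch (the `/tables` endpoint and port come from the verification step above; the `wait_for_url` helper itself is hypothetical):
+
+```
+# Poll a URL until it answers or the attempts run out.
+wait_for_url() {
+  url=$1
+  tries=${2:-30}
+  while [ "$tries" -gt 0 ]; do
+    if curl -sf "$url" >/dev/null 2>&1; then
+      return 0
+    fi
+    tries=$((tries - 1))
+    sleep 1
+  done
+  return 1
+}
+# e.g. wait_for_url http://localhost:9000/tables 30
+```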
+
+**Known quirks:**
+- All quickstart variants bind the same ports (9000 controller, 8000 broker, 7050/8098/8099 server). Only one can run at a time.
+- The `auth` and `auth-zk` quickstarts use credentials `admin / verysecret`.
+- Quickstarts run with embedded ZK + server + broker + controller in one JVM. That JVM is not small (~4GB heap) — expect to need a decently-sized machine.
+
+---
+
+## `bench-compare`
+
+**What it does.** Runs the same JMH benchmark twice — once against a baseline ref in a temporary git worktree, once against the current tree — and diffs the JMH output tables.
+
+**Time reality.** This is the slowest skill and it is not subtle. Pinot's default JMH config is **8 warmup × 60s + 8 measurement × 60s × 5 forks per `@Benchmark` method per parameter combination**. A single method with a few parameters can be a multi-hour run; the full default on something like `BenchmarkDictionary` estimated *7 days* in one test. The skill refuses to run without either explicit `--args` reducing iteration counts, or the user confirming they want the full default.
+
+**A reasonable first pass** for a single-method benchmark: `--args "-wi 1 -i 2 -f 1 -r 5s -w 5s"`. That's 1 warmup × 5s + 2 measurement × 5s × 1 fork ≈ 20s per method post-setup. Setup itself (the `@Setup` phase, which often builds Pinot segments) can still take 1–10 minutes for benchmarks that construct real tables.
+
+**Invocation style** (validated; the skill always uses this rather than the generated `.sh`):
+```
+java -Xms4G -Xmx8G -cp '<repo>/pinot-perf/target/pinot-perf-pkg/lib/*' \
+  org.openjdk.jmh.Main 'org.apache.pinot.perf.<Benchmark>' \
+  -wi 1 -i 2 -f 1 -r 5s -w 5s \
+  -jvmArgsAppend='<JDK 21 add-opens flags when needed>'
+```
+
+Why `org.openjdk.jmh.Main` instead of the per-benchmark `pinot-<Benchmark>.sh`:
+- Benchmark classes have a custom `main()` that builds `OptionsBuilder` directly and ignores CLI args. Going through `org.openjdk.jmh.Main` bypasses it so flags like `-wi`, `-i`, `-r`, `-w`, and crucially `-jvmArgsAppend` actually take effect.
+- The generated script hard-codes `-Xms24G -Xmx24G` which OOMs any laptop with less than ~32GB.
+- The `--add-opens` / `--add-exports` flags are mandatory for any benchmark that extends `BaseClusterIntegrationTest` on JDK 21 — without them ZK startup dies with `InaccessibleObjectException`. The canonical set lives in `.github/workflows/pinot_tests.yml`.
+
+**Example scenarios:**
+
+- **Typical run** → `/bench-compare BenchmarkDictionary --args "-wi 1 -i 2 -f 1 -r 5s -w 5s"` — worktree at `/tmp/pinot-bench-baseline`, builds `pinot-perf` there, runs via `org.openjdk.jmh.Main`, diffs the JMH tables.
+- **Cluster-backed benchmark** (e.g. `BenchmarkEquiJoin`, extends `BaseClusterIntegrationTest`) → skill automatically includes the full JDK-21 add-opens set in `-jvmArgsAppend`.
+- **Explicit baseline** → `/bench-compare BenchmarkDictionary release-1.5.0` → same flow against a tagged release.
+- **Vector benchmark** → `/bench-compare BenchmarkVectorIndex` → uses the `exec:java` form from `pinot-perf/README.md`; it has its own CLI conventions.
+
+**Known quirks:**
+
+- **Stale-jar trap (the real failure mode).** If the current tree's `pinot-perf/target/pinot-perf-pkg/lib/` was built from a previous ref that pulled in different versions of a transitive dep (e.g. `zookeeper-3.9.4.jar` + `zookeeper-3.9.5.jar`), the classpath glob loads both and you get `NoSuchMethodError` at runtime. Pinot/Helix often swallows this as `ZkTimeoutException: Unable to connect to zookeeper server within timeout: 1000`, which looks like an infrastructure/timing issue but is actually a classpath bug. The skill defends against this by always using `./mvnw -pl pinot-perf clean package -DskipTests -am` on the current tree (the worktree is a fresh checkout so baseline is immune). **If you see `ZkTimeoutException` in a second run, don't tune timeouts — check `lib/` for version duplicates.**
+- JMH's `-l` flag doesn't help — Pinot benchmark classes have custom `main()` methods that ignore CLI args. There is no fast sanity check; the first real run is also the first verification.
+- The generated `pinot-<Benchmark>.sh` scripts hard-code `-Xms24G -Xmx24G`. Avoid them; use the `java -cp 'lib/*'` form with your own `-Xmx`.
+- Only ~21 of ~60 benchmark classes are configured as appassembler programs — not every benchmark has a `.sh`. Direct `java -cp` works for all of them.
+- Worktrees require a clean `.git`. Abort if rebase/merge is in progress.
+- Benchmarks fork their own JVMs; don't be surprised by multiple `java` processes during the run.
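+
+The duplicate-jar check from the stale-jar bullet can be sketched as follows (the version-stripping regex is an assumption about the usual `artifact-version.jar` naming):
+
+```
+# List artifacts that appear more than once in a lib/ directory.
+dup_jars() {
+  ls "$1" | sed -E 's/-[0-9][0-9a-zA-Z.-]*\.jar$//' | sort | uniq -d
+}
+```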
+
+---
+
+## `flaky-analyze`
+
+**What it does.** Queries GitHub Actions via `gh` CLI for recent failing runs of `Pinot Tests`, identifies the failing jobs, downloads the logs, greps for **real** Surefire/TestNG failure markers, clusters by stack trace, and reports a hypothesis.
+
+**Investigation only.** This skill never proposes a fix — the goal is to help the user decide whether the failure is a real bug, a race, or infrastructure noise. Principle C6.2 in `kb/code-review-principles.md` is explicit: the fix is never "add retries".
+
+**The right grep patterns** (documented after burning the wrong ones in testing):
+- `\[ERROR\] Tests run: \d+, Failures: [1-9]` — a Surefire class summary with failures. The FQN follows `-- in `.
+- `<<< FAILURE!` / `<<< ERROR!` — marker following a failed assertion. Line above has the method.
+- `##\[error\]Process completed with exit code` — GitHub Actions' process-level marker.
+- `BUILD FAILURE` — Maven's non-zero exit marker.
+
+Do **not** grep for bare `ERROR` / `FAILED` / `Exception` — Pinot's integration tests log those constantly at runtime (Helix rebalancer, Kafka consumer setup, etc.) and you will drown in noise.
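+
+Applied to a log snippet (sample lines; note `grep -E` needs `[0-9]` where ripgrep accepts `\d`):
+
+```
+log='[ERROR] Tests run: 12, Failures: 1, Errors: 0, Skipped: 0 -- in org.apache.pinot.FooTest
+[INFO] ERROR in rebalancer loop (runtime noise, must not match)
+org.apache.pinot.FooTest.testBar -- Time elapsed: 0.5 s <<< FAILURE!'
+printf '%s\n' "$log" |
+  grep -E '\[ERROR\] Tests run: [0-9]+, Failures: [1-9]|<<< (FAILURE|ERROR)!'
+```
+
+Only the first and third lines match; the runtime `ERROR` line is ignored.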
+
+**Example scenarios:**
+
+- **Test is genuinely flaky on master** → skill finds multiple failing runs with the same stack trace → hypothesis: likely real race. Suggests specific source file and line, cites C2.x principles.
+- **Test only fails on one matrix combination** (e.g. JDK 21 integration set 1) → likely env-specific → suggests checking JDK-specific behaviour.
+- **Failing job has no Surefire markers** → infrastructure failure (timeout, OOM, runner cancel). Skill classifies as non-test-level and moves on.
+- **Test name never appears** → either the test isn't actually failing (wrong search term) or GitHub's log retention has aged out the runs. Skill reports which and stops.
+
+**Usage:**
+- `/flaky-analyze RangeIndexTest` — last 20 failing runs of `Pinot Tests`.
+- `/flaky-analyze RangeIndexTest 50` — last 50.
+- `/flaky-analyze RangeIndexTest --pr 18267` — only that PR's runs.
+
+**Known quirks:**
+- `gh run view --log-failed` is unreliable here — it returns only the steps marked failed, which for Pinot's "Integration Test" step is often just runner init lines. Always use `--log` + grep.
+- `gh run view --log` can return tens of MB per job. Cap runs scanned at 50 unless the user asks for more; fetch in parallel where possible.
+- GitHub retains Actions logs for 90 days by default. Older flakes require a different data source (e.g. the `surefire-reports-*` artifacts uploaded on master runs).
+
+---
+
+## Related configuration
+
+- [`../agents/code-reviewer.md`](../agents/code-reviewer.md) — independent code-review agent that reads `kb/code-review-principles.md` and dispatches the eight `review-*` skills above.
+- [`../code-review-principles.md`](../code-review-principles.md) — 163 Pinot-specific review principles; the review skills cite these by `C<domain>.<number>`.
+- [`../../CLAUDE.md`](../../CLAUDE.md) — project-wide instructions consumed by Claude Code.
+- [`../../AGENTS.md`](../../AGENTS.md) — general agent guidance (cross-tool).
+- [`../../.github/copilot-instructions.md`](../../.github/copilot-instructions.md) — overlapping guidance for GitHub Copilot / Cursor.
+
+## Adding a new skill
+
+1. Create the procedural body at `kb/skills/<skill>.md` — this is the source of truth, readable by any agent.
+2. Write the procedure as numbered steps. Prefer concrete commands over prose. Keep it self-contained: a non-Claude agent reading this file should be able to follow it without additional context.
+3. List the skill in the table above and add a detail section below.
+4. (Optional, Claude Code only) Add a thin entry-point at `.claude/skills/<skill>/SKILL.md` containing YAML frontmatter (`name`, `description`, ASF header inside the frontmatter as `#` comments) and a one-line body pointing back to `kb/skills/<skill>.md`. This makes the skill invocable as `/<skill>` in Claude Code.
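+
+A sketch of step 4's entry-point file (hypothetical skill name `my-skill`; written into a temp dir here, the repo root in real use):
+
+```
+root=$(mktemp -d)   # stand-in for the repo root
+mkdir -p "$root/.claude/skills/my-skill"
+cat > "$root/.claude/skills/my-skill/SKILL.md" <<'EOF'
+---
+# (ASF license header goes here as `#` comments)
+name: my-skill
+description: One line describing when to invoke this skill.
+---
+Follow the procedure in kb/skills/my-skill.md.
+EOF
+```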
+
+Skills should be narrow, fast to read, and composable. A skill that "runs X and then does a code review" probably belongs as two separate skills chained by the user.
+
+## Troubleshooting
+
+- **Claude Code skill isn't invoked when expected.** Claude may not have loaded `.claude/skills/` in its session context. In a new session, ask Claude to list available skills, or re-type the slash command explicitly.
+- **Maven wrapper not found.** All skills assume the repo root has `./mvnw`. If you're invoking from a subdirectory, ask the agent to `cd` first, or run from the repo root.
+- **`gh` not authenticated.** `flaky-analyze` requires `gh auth status` to succeed against github.com. Run `gh auth login` once.
+- **Worktree errors in `bench-compare`.** Check that `/tmp/pinot-bench-baseline` isn't a leftover from a previous aborted run — if so, `git worktree remove --force` it and retry.
diff --git a/kb/skills/bench-compare.md b/kb/skills/bench-compare.md
new file mode 100644
index 000000000000..1f28c0e53593
--- /dev/null
+++ b/kb/skills/bench-compare.md
@@ -0,0 +1,86 @@
+# bench-compare
+
+Purpose: when a change claims a performance impact (principle C6.7 — "performance-sensitive changes require benchmark comparisons"), produce the before/after numbers in one command, without making the user manually stash, checkout, run, un-stash, and re-run.
+
+Usage:
+- `/bench-compare BenchmarkDictionary` — compares current working tree vs. `merge-base HEAD upstream/master` (falls back to `origin/master` if upstream missing).
+- `/bench-compare BenchmarkDictionary <ref>` — compare against an explicit ref (commit, tag, branch).
+- `/bench-compare BenchmarkDictionary --args "-wi 1 -i 2 -f 1 -r 5s -w 5s"` — pass extra JMH args. **Always use short warmup/iteration flags for a first pass**; defaults run for hours or days.
+
+**Time expectations.** Pinot benchmarks are not quick. Default JMH config in `pinot-perf` is 8 warmup × 60s + 8 measurement × 60s × 5 forks per parameter combination — a single benchmark method's `@Benchmark` can report an ETA of multiple days. The skill will refuse to run without either: (a) explicit `--args` that reduce warmup/iteration counts, or (b) the user confirming they really do want the full default run.
+
+## Procedure
+
+1. **Locate the benchmark.** Glob for `pinot-perf/**/<Benchmark>.java`. If zero or multiple matches, report and stop. The benchmark class must be under `pinot-perf`.
+
+2. **Resolve the baseline ref.**
+ - Default: `git merge-base HEAD upstream/master`. If the `upstream` remote isn't defined, fall back to `origin/master`. If neither resolves, ask the user for an explicit ref.
+   - If the user passed a ref, validate it with `git rev-parse --verify <ref>`.
+
+3. **Prepare output directory.** `mkdir -p .bench-compare/` and append it to the repo's `.gitignore` if not already there. Produce two files: `baseline-<Benchmark>.txt` and `current-<Benchmark>.txt`.
+
+4. **Warn and confirm.** Benchmarks take real time. Inspect `--args` — if the user hasn't passed iteration controls, warn that the default suite can take hours to days and suggest a starter like `-wi 1 -i 2 -f 1 -r 5s -w 5s`. Print an estimate of the pair of runs (rough: a 5s-warmup × 5s-measurement × 1 fork run takes ~30–120s per `@Benchmark` method after the Pinot-side `@Setup` completes; `@Setup` alone can run for 1–10 minutes for benchmarks that build segments). Ask the user to confirm.
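+
+   The arithmetic behind that estimate (seconds, per `@Benchmark` method and parameter combination; `@Setup` excluded):
+
+   ```
+   bench_eta() {  # args: forks warmup-iters warmup-secs measure-iters measure-secs
+     echo $(( $1 * ($2 * $3 + $4 * $5) ))
+   }
+   bench_eta 1 1 5 2 5    # starter flags -> 15s per method
+   bench_eta 5 8 60 8 60  # pinot-perf defaults -> 4800s (80 min) per method
+   ```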
+
+5. **Build pinot-perf in a baseline worktree.** This avoids touching the working tree:
+ ```
+   git worktree add /tmp/pinot-bench-baseline <baseline-ref>
+ (cd /tmp/pinot-bench-baseline && ./mvnw -pl pinot-perf -am package -DskipTests)
+ ```
+   The package goal produces the jars, an appassembler-generated launcher (for ~21 blessed benchmark classes) at `pinot-perf/target/pinot-perf-pkg/bin/pinot-<Benchmark>.sh`, and a fat `lib/` directory.
+
+6. **Run the baseline benchmark.** Two invocation styles, in order of preference:
+
+ **Preferred — always use JMH's own Main class:**
+ ```
+ java -Xms4G -Xmx8G -cp '/tmp/pinot-bench-baseline/pinot-perf/target/pinot-perf-pkg/lib/*' \
+   org.openjdk.jmh.Main 'org.apache.pinot.perf.<Benchmark>' \
+ -wi 1 -i 2 -f 1 -r 5s -w 5s \
+ -jvmArgsAppend='-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true' \
+     > .bench-compare/baseline-<Benchmark>.txt 2>&1
+ ```
+
+   Why not use the generated `pinot-<Benchmark>.sh`?
+ - It hard-codes `-Xms24G -Xmx24G` — OOMs on <32GB machines.
+ - The benchmark's own `main()` (which the script invokes) typically constructs `OptionsBuilder` directly and **ignores CLI args**, so you can't override warmup/iterations or pass `-jvmArgsAppend`. Going through `org.openjdk.jmh.Main` bypasses the custom main and gets you JMH's standard CLI.
+ - The `--add-opens`/`--add-exports` flags are mandatory for any benchmark that extends `BaseClusterIntegrationTest` (i.e., spins up a Pinot cluster) on JDK 21 — without them, ZK startup fails with `InaccessibleObjectException` wrapped as `ExceptionInInitializerError`.
+
+ For the vector suite (`BenchmarkVectorIndex`) use the `exec:java` form from `pinot-perf/README.md`; it has its own quirks.
+
+7. **Build and run the current tree.** Two mandatory gotchas:
+
+   - **Always clean first:** `./mvnw -pl pinot-perf clean package -DskipTests` (note the `clean`; `-am` is needed only on the first build — see next bullet). If `pinot-perf/target/pinot-perf-pkg/` already exists from a prior build on a different branch/ref, incremental `package` leaves stale third-party jars in `lib/` when a dependency version changes upstream. Those stale jars sit on the classpath alongside the new ones (e.g. `zookeeper-3.9.4.jar` and `zookeeper-3.9.5.jar`) and cause `NoSuchMethodError` at runtime. Crucially, Helix/Pinot swallows the resulting `ExceptionInInitializerError` in ZK startup and surfaces a misleading `ZkTimeoutException: timeout: 1000` instead — which looks for all the world like a flaky port or timing issue. If you see that exception, **check `lib/` for duplicate versions of `zookeeper-*`, `helix-*`, `netty-*`, etc. first.**
+
+ - **Use `-am` only on the first build.** After the first clean+package, upstream modules are populated; subsequent builds can skip `-am`. The worktree build in step 5 gets a fresh `target/` so doesn't have this problem.
+
+ Invocation is identical to step 6, just against the current tree's `lib/*`:
+ ```
+ ./mvnw -pl pinot-perf clean package -DskipTests -am
+ java -Xms4G -Xmx8G -cp 'pinot-perf/target/pinot-perf-pkg/lib/*' \
+   org.openjdk.jmh.Main 'org.apache.pinot.perf.<Benchmark>' \
+     > .bench-compare/current-<Benchmark>.txt 2>&1
+ ```
+
+8. **Clean up the worktree.** `git worktree remove /tmp/pinot-bench-baseline --force`. Do this even if steps 6 or 7 failed.
+
+9. **Diff the results.** Parse JMH's table output (the `Benchmark ... Score Error Units` lines) from both files. Produce a table:
+ ```
+ Benchmark Baseline (ops/s) Current (ops/s) Δ Δ%
+ foo.methodA 1234.5 ± 12.1 1478.2 ± 15.3 +243.7 +19.7%
+ foo.methodB 987.6 ± 8.0 992.1 ± 7.2 +4.5 +0.4%
+ ```
+ Use "ops/s", "ns/op", or whatever unit JMH emits — don't convert.
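+
+   A sketch of the parse-and-diff, assuming plain JMH rows of the shape `name mode cnt score ± error units` (parameter columns would shift the fields):
+
+   ```
+   jmh_diff() {  # $1 = baseline file, $2 = current file
+     awk '/ ± .* (ops\/s|ns\/op|us\/op|ms\/op|s\/op)$/ {
+       if (FILENAME == ARGV[1]) base[$1] = $4
+       else if ($1 in base) {
+         d = $4 - base[$1]
+         printf "%s %.1f %.1f %+.1f %+.1f%%\n", $1, base[$1], $4, d, 100 * d / base[$1]
+       }
+     }' "$1" "$2"
+   }
+   ```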
+
+10. **Report.** Print the table. Flag any benchmark where `|Δ%| > 2×error%` as likely a real change (otherwise probably noise). Include the paths to the raw files so the user can share them in a PR.
+
+## Notes
+
+- **Stale-jar trap is the #1 source of mysterious failures.** Pinot benchmarks that fail on a subsequent run of `/bench-compare` in the same repo almost always fail because of duplicate third-party jars in `pinot-perf/target/pinot-perf-pkg/lib/` — typically `zookeeper-X.jar` + `zookeeper-Y.jar` (or equivalent for helix, netty, guava) from different builds. The failure shows up as a deeply-wrapped `ZkTimeoutException: Unable to connect to zookeeper server within timeout: 1000` (or similar NoSuchMethodError swallowed into an infrastructure-looking error). First diagnostic when a second run fails: `ls pinot-perf/target/pinot-perf-pkg/lib/ | sort | awk -F- '{v=$NF; sub("\\.jar$","",v); k=$0; sub("-"v"\\.jar$","",k); print k}' | sort | uniq -d` to spot duplicates. The fix is always `./mvnw -pl pinot-perf clean package -DskipTests -am`, never `rm` individual jars.
+- Worktrees require a clean `.git`. If the repo is in the middle of a rebase/merge, abort with a clear message.
+- **JDK 21 needs the full `--add-opens` / `--add-exports` flag set** for any cluster-backed benchmark (extends `BaseClusterIntegrationTest`). Without them, ZK startup fails with `InaccessibleObjectException: ... module java.base does not "opens java.lang"`. Pass via `-jvmArgsAppend=...` to `org.openjdk.jmh.Main`; CI's `pinot_tests.yml` has the canonical list.
+- **The generated `pinot-<Benchmark>.sh` scripts hard-code `-Xms24G -Xmx24G`.** Avoid them — use `java -cp 'lib/*' org.openjdk.jmh.Main <Benchmark>` directly with your own `-Xmx`.
+- **Not every benchmark has a generated script.** The appassembler programs list in `pinot-perf/pom.xml` covers ~21 of ~60 benchmark classes. The direct `java -cp` invocation works for any of them.
+- JMH's `-l` (list benchmarks) flag **does not help here** — Pinot benchmark classes have custom `main()` entry points that construct `OptionsBuilder` directly, ignore CLI args, and plunge straight into `Runner.run(...)` which in turn kicks off `@Setup`. For `BenchmarkDictionary` this `@Setup` alone burns 5+ minutes building dictionaries. There is no fast sanity-check short of actually running the benchmark through `org.openjdk.jmh.Main` (which at least accepts `-wi 1 -i 1 -r 1s -w 1s` to minimise it).
+- Benchmarks must run on the same hardware, same JDK, same OS load. Warn the user if they're on battery power or running other heavy processes.
+- Do not `sleep` between runs for "timing" reasons. If a second run fails, it is the stale-jar issue (above), not TIME_WAIT. I spent a long time chasing the timing hypothesis before spotting the classpath mismatch.
+- If the benchmark's output format isn't plain JMH (e.g. `BenchmarkVectorIndex` writes a custom report), don't try to parse it — just save both outputs and tell the user where they are, with a note that manual comparison is needed.
+- Never use `git stash` instead of a worktree. Stash can be lost if the second build fails and the user doesn't know to pop it.
diff --git a/kb/skills/flaky-analyze.md b/kb/skills/flaky-analyze.md
new file mode 100644
index 000000000000..ddc5213617fe
--- /dev/null
+++ b/kb/skills/flaky-analyze.md
@@ -0,0 +1,96 @@
+# flaky-analyze
+
+Purpose: when a test is failing intermittently on CI, gather the evidence in one place so the user can decide whether it's a real race, an environmental issue, or a legitimate regression. Principle C6.2 is explicit: the fix is never "add retries" — it's understanding the root cause.
+
+Usage:
+- `/flaky-analyze RangeIndexTest` — last ~20 runs on master + PRs.
+- `/flaky-analyze RangeIndexTest 50` — look at 50 runs.
+- `/flaky-analyze RangeIndexTest --pr 18267` — only that PR's runs.
+
+## Prerequisites
+
+1. `gh` CLI installed and authenticated: `gh auth status` must succeed.
+2. Network access to GitHub.
+3. The test must be in `apache/pinot`. If the user's remote is a fork, still query `apache/pinot` — that's where CI lives.
+
+If `gh` isn't available, report that and exit. Don't try to fall back to `curl` with tokens.
+
+## Procedure
+
+1. **Parse the argument.** Extract class name (required), optional run count (default 20), optional PR filter.
+
+2. **Find relevant workflow runs.**
+ ```
+   gh run list --repo apache/pinot --workflow "Pinot Tests" --status failure --limit <N> --json databaseId,displayTitle,headBranch,headSha,createdAt,url
+ ```
+   If `--pr` is set, filter with `--branch` to the PR's head branch, or use `gh pr view <pr> --json headRefName`.
+
+3. **For each failed run, find the failing jobs.** A "Pinot Tests" run has matrix jobs (test sets 1/2 × java 21 × unit/integration). Only some fail.
+ ```
+   gh run view <run-id> --repo apache/pinot --json jobs
+ ```
+ Filter to jobs with `conclusion: "failure"`.
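+
+   The failure filter as a `jq` expression (the sample payload mirrors the `--json jobs` shape; job names are illustrative):
+
+   ```
+   printf '%s' '{"jobs":[
+     {"databaseId": 101, "name": "Pinot Test Set 1 (temurin-21)", "conclusion": "failure"},
+     {"databaseId": 102, "name": "Pinot Test Set 2 (temurin-21)", "conclusion": "success"}]}' |
+     jq -r '.jobs[] | select(.conclusion == "failure") | "\(.databaseId) \(.name)"'
+   # -> 101 Pinot Test Set 1 (temurin-21)
+   ```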
+
+4. **Download and grep the logs for the target test.** For each failing job:
+ ```
+   gh run view --job <job-id> --repo apache/pinot --log
+ ```
+ Each log line is prefixed by `\tUNKNOWN STEP\t`. The log is large (tens of MB); pipe straight into ripgrep with these patterns — they are what Surefire/TestNG/GitHub Actions actually emit:
+
+ - `\[ERROR\] Tests run: \d+, Failures: [1-9]` — the Surefire class-summary line when a test class had failures. The **fully qualified class name** is on the same line after `-- in `.
+ - `\[ERROR\] Tests run: \d+, Failures: \d+, Errors: [1-9]` — same, with errors instead of failures.
+ - `<<< FAILURE!` / `<<< ERROR!` — the Surefire marker following a failed assertion. Useful to find the exact method; the method and class appear in the line immediately above.
+ - `##\[error\]Process completed with exit code` — GitHub Actions' own marker, always present when the job fails for a process-level reason.
+ - `BUILD FAILURE` — Maven-level, always present when Maven returns non-zero.
+
+ **Do not grep for raw `ERROR` / `FAILED` / `Exception`** — Pinot's integration tests log these constantly at runtime (Helix rebalancer, consumer setup, etc.) and you'll drown in noise. The patterns above only match actual failure markers.
+
+ If none of those patterns match in a failing job's log, the test didn't fail at the test level — the job died for infrastructure reasons (timeout, OOM, runner cancel). Classify it as "infrastructure failure" and move on.
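A minimal sketch of step 4's matching, run against a made-up one-line log excerpt (the grep pattern and the `-- in ` suffix are the real markers; the log content itself is illustrative):

```shell
# Made-up excerpt from a failing job's log (real logs are tens of MB).
log='[ERROR] Tests run: 12, Failures: 1, Errors: 0, Skipped: 0 -- in org.apache.pinot.segment.local.RangeIndexTest'

# Surefire class-summary failure marker from step 4.
echo "$log" | grep -qE '\[ERROR\] Tests run: [0-9]+, Failures: [1-9]' && echo "failure marker found"

# The fully qualified class name follows "-- in " on the same line.
fqn=$(echo "$log" | sed -n 's/.*-- in //p')
echo "$fqn"
```

The same extraction works on the streamed `--log` output; only lines matching the marker patterns need to reach `sed`.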
+
+5. **Extract structured failure records.** For each hit, record:
+ - Run id, PR number (if any), commit SHA, JDK version, test set (parseable from the job name like `Pinot Integration Test Set 1 (temurin-21)`).
+ - The failing class FQN from the `-- in ` suffix of the summary line.
+ - The failure message (typically the line containing `<<< FAILURE!` or the `AssertionError: ...` line that follows).
+ - The top ~5 frames of the stack trace, taken from the ~30 lines following the `<<< FAILURE!` marker.
+ - Any preceding log lines that look like test setup problems (timeouts, `Connection refused`, `Address already in use`, `OutOfMemoryError`).
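The job-name parsing from the first bullet can be sketched like this (the job name is the example format quoted above; the `sed` expressions are illustrative):

```shell
# Example job name in the matrix format quoted above.
job='Pinot Integration Test Set 1 (temurin-21)'

# Test set number follows "Set "; the JDK version follows "temurin-".
test_set=$(echo "$job" | sed -n 's/.*Set \([0-9][0-9]*\).*/\1/p')
jdk=$(echo "$job" | sed -n 's/.*temurin-\([0-9][0-9]*\).*/\1/p')
echo "set=$test_set jdk=$jdk"
```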
+
+6. **Cluster the failures.** Group by:
+ - Identical exception type + top-of-stack frame → likely same root cause.
+ - Different stack traces → either multiple bugs or environmental flakiness.
+ - Setup/timeout errors with no test code in the stack → likely infrastructure.
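A minimal clustering sketch, assuming each failure record was reduced to an `exception|top-frame` key (the keys below are made up):

```shell
# One "exception|top frame" key per failure record (made-up data).
keys='java.util.concurrent.TimeoutException|RangeIndexTest.testRange:112
java.util.concurrent.TimeoutException|RangeIndexTest.testRange:112
java.lang.AssertionError|RangeIndexTest.testSorted:87'

# Count occurrences per key, biggest cluster first.
echo "$keys" | sort | uniq -c | sort -rn
```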
+
+7. **Report.** Structure:
+ ```
+   ## Flaky analysis: <TestClass>
+   Runs scanned: N (M with this test failing, K with unrelated failures)
+
+   ### Failure cluster 1 — <exception> at <top frame> (<count> occurrences)
+   Example (PR #<nnn>, JDK 21, test set 2):
+   <failure message and top stack frames>
+
+   Commits affected: [<sha1>, <sha2>, ...]
+
+   ### Failure cluster 2 — ...
+
+   ### Hypothesis
+
+   <most likely explanation (race / environment / regression) and the evidence for it>
+
+   ### Suggested next steps
+   - <concrete action, e.g. reproduce locally or inspect the suspect frame>
+ ```
+
+8. **Do not propose a fix.** This skill is investigation, not remediation. End with "Want me to open the source file at the top-of-stack frame?"
+
+## Notes
+
+- `gh run view --log` can be slow (10–60s per run) and returns large payloads. Cap total runs scanned at 50 unless the user asks for more. Run these fetches in parallel where possible; `gh` is rate-limited but stays under the limit for <50 runs.
+- Don't write the raw logs to the repo. Stream them through grep and keep only the extracted records in memory.
+- `gh run view --log-failed` is not reliable here — it only returns the steps GitHub marked failed, which for Pinot's "Integration Test" step often just contains runner init lines before the actual Maven invocation. Always use `--log` + the patterns above.
+- To find failing *jobs* within a run without downloading its full log, use:
+ ```
+   gh api repos/apache/pinot/actions/runs/<run-id>/jobs --jq '.jobs[] | select(.conclusion=="failure") | {name, id: .databaseId}'
+ ```
+  Then pass the `id` as `--job <id>`. Avoids pulling all matrix logs.
+- If the test doesn't appear in any failure log, report that directly: either the test isn't actually flaky on CI, or the search term is wrong.
+- Results depend on log retention (GitHub keeps 90 days by default). Older flakes are invisible here — suggest checking the `surefire-reports-*` artifacts on master for long-term trends.
+- The `Pinot Tests` workflow (file: `pinot_tests.yml`) is the right default. Also worth checking `Pinot Compatibility Regression Testing` and `Pinot Multi-Stage Query Engine Compatibility Regression Testing` workflows for integration tests that only run there — ask the user before querying those since they multiply the API calls.
diff --git a/kb/skills/precommit.md b/kb/skills/precommit.md
new file mode 100644
index 000000000000..6693122d4cbd
--- /dev/null
+++ b/kb/skills/precommit.md
@@ -0,0 +1,109 @@
+# precommit
+
+Purpose: before pushing a commit or opening a PR, run all quality checks on the modules the current diff actually touches. Don't run them on the whole repo — that's slow and wasteful on a tree this size.
+
+The five checks (in order):
+1. `./mvnw spotless:apply -pl <modules>` — auto-fixes import order and removes unused imports.
+2. `./mvnw license:format -pl <modules>` — adds ASF headers to any new files.
+3. `./mvnw checkstyle:check -pl <modules>` — validates style; fails hard.
+4. `./mvnw license:check -pl <modules>` — validates headers; fails hard.
+5. `./mvnw test-compile -pl <modules> -am -Dmaven.compiler.showDeprecation=true -Dmaven.compiler.showWarnings=true '-Dmaven.compiler.compilerArgs=-Xlint:all'` — compiles and checks for deprecation, unchecked casts, raw types, fallthrough, etc. Warnings are filtered to only lines added in the diff.
+
+Steps 1 and 2 are auto-fixers. Steps 3 and 4 are validators — if they fail after the auto-fixers ran, report the failure with the exact offending file/line from the Maven output and stop. Do not try to manually patch style errors; fix the underlying issue or ask the user. Step 5 is a compiler check — if it produces warnings on newly added lines, report them. Prefer the non-deprecated replacement; suppress with `@SuppressWarnings` only with a comment explaining why the deprecated reference is required (e.g., backward-compat serialization, mixed-version SPI calls, testing the deprecated path).
+
+## Procedure
+
+1. **Find changed files.**
+ - If the user passed an argument (`staged`, `unstaged`, `branch`, or a path), use that as the scope.
+ - Default: union of staged + unstaged files vs. HEAD, plus any added-but-untracked `.java` / `.xml` / `.properties` files.
+ - Ignore: `target/`, `node_modules/`, generated sources, `**/*.md`, anything under `pinot-controller/src/main/resources/` (UI) unless the user explicitly asks — those aren't covered by the Maven plugins.
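A sketch of the default scope, assuming it runs at the repo root (`changed_files` is a hypothetical helper name; the ignore list from the last bullet would be applied to its output):

```shell
# Union of staged + unstaged changes vs HEAD, plus untracked sources, de-duplicated.
changed_files() {
  {
    git diff --name-only HEAD
    git ls-files --others --exclude-standard -- '*.java' '*.xml' '*.properties'
  } | sort -u
}
```

Filters such as `grep -v '^target/'` and the other ignores are chained onto the output.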
+
+2. **Map files to modules.** For each changed file, walk up the directory tree until a `pom.xml` is found. The first directory containing a `pom.xml` that is *not* the repo root is the module. De-duplicate.
+ - If the only pom is the repo root, the user is touching top-level config — just run the checks at the root (no `-pl`).
+ - Some plugin modules are nested two levels deep (e.g. `pinot-plugins/pinot-input-format/pinot-parquet`). Don't stop at an intermediate aggregator pom if it doesn't define the actual sources — walk up until you find the module that directly contains the changed file.
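The walk-up can be sketched as a small helper (hypothetical name; assumes paths relative to the repo root):

```shell
# Nearest ancestor directory of a changed file that contains a pom.xml.
module_of() {
  d=$(dirname "$1")
  while [ "$d" != "." ] && [ "$d" != "/" ]; do
    if [ -f "$d/pom.xml" ]; then echo "$d"; return 0; fi
    d=$(dirname "$d")
  done
  echo "."  # no module pom found: treat as repo root
}
```

Because the walk starts at the file and moves upward, the first pom found is the deepest one, which handles the nested plugin-module case automatically. De-duplicate the results and join with commas for `-pl`.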
+
+3. **Report the plan.** Print the list of detected modules in one line: `Modules: pinot-broker, pinot-common, pinot-plugins/pinot-input-format/pinot-parquet`. If there are no modules, say "No changed Java/XML files — nothing to do." and exit.
+
+4. **Run the auto-fixers.** Build a single `-pl` argument with comma-separated modules:
+ ```
+   ./mvnw spotless:apply -pl <modules>
+   ./mvnw license:format -pl <modules>
+ ```
+ Run each in the foreground. Track the number of files modified by each. If either fails with a non-build error (not a style error — those go through checkstyle), stop and surface the error.
+
+5. **Run the validators.**
+ ```
+   ./mvnw checkstyle:check -pl <modules>
+   ./mvnw license:check -pl <modules>
+ ```
+ If either fails, parse the Maven output, extract the file:line of each violation, and track them for the summary. Do not attempt to auto-fix checkstyle violations — they need human judgment.
+
+6. **Build the added-line set for compiler warning filtering.** This runs after the auto-fixers so that line numbers reflect the post-fix state (spotless may remove imports, shifting line numbers). Run `git diff --unified=0 HEAD -- <changed files>` and parse the `@@` hunk headers to extract the added line ranges. Build a map of `file → set of added line numbers`. For untracked `.java` files (new files not yet in git), `git diff` returns nothing — treat all lines as added (use `wc -l` to get the line count and add 1 through N to the set).
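A sketch of the hunk-header expansion (`added_lines` is a hypothetical helper; it handles both the `+start,count` and bare `+start` header forms, and a count of 0 yields no lines):

```shell
# Expand "+start[,count]" from `git diff --unified=0` hunk headers
# into one added line number per output line.
added_lines() {
  grep '^@@' | sed -E 's/^@@ [^+]*[+]([0-9]+)(,([0-9]+))?.*/\1 \3/' |
    while read -r start count; do
      count=${count:-1}   # bare "+start" means one line
      i=0
      while [ "$i" -lt "$count" ]; do
        echo $((start + i))
        i=$((i + 1))
      done
    done
}
```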
+
+7. **Run the compiler check.**
+ ```
+   ./mvnw test-compile -pl <modules> -am -Dmaven.compiler.showDeprecation=true -Dmaven.compiler.showWarnings=true '-Dmaven.compiler.compilerArgs=-Xlint:all'
+ ```
+   This is the only step that uses `-am` — compilation needs upstream dependencies built, unlike the other steps. Do not use `clean` — incremental compilation still emits warnings for all files in the module, and the per-line filter (step 6) handles pre-existing warnings. Using `clean` would break modules with generated sources (e.g., JavaCC in `pinot-common`) and adds significant overhead on deep modules. If warnings seem missing (e.g., stale `target/` from a different branch), the user can manually run `./mvnw clean generate-sources test-compile -pl <modules> -am ...` to force full recompilation.
+
+   Parse the output for `[WARNING]` lines. **Filter to only added lines from the diff** — for each warning of the form `[WARNING] /path/File.java:[line,col] <message>`, check whether that file and line number appear in the added-line set from step 6. Only report warnings that match. This avoids surfacing pre-existing warnings when a contributor edits a file that already has them.
+
+ Track each matching warning with file:line and category (deprecation, unchecked, rawtypes, etc.).
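Splitting one warning line into its file and line number can be sketched with `sed` (the warning text is made up; only the `[WARNING] <file>:[<line>,<col>]` shape matters):

```shell
# Made-up javac warning in the format step 7 parses.
warning='[WARNING] /repo/pinot-broker/src/main/java/Foo.java:[45,17] [deprecation] bar() in Baz has been deprecated'

# Split out "<file>" and "<line>" from "<file>:[<line>,<col>]".
file=$(echo "$warning" | sed -n 's/^\[WARNING\] \(.*\):\[\([0-9]*\),.*/\1/p')
line=$(echo "$warning" | sed -n 's/^\[WARNING\] \(.*\):\[\([0-9]*\),.*/\2/p')
echo "$file:$line"
```

A warning is reported only when `line` is in the added-line set recorded for `file` in step 6.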
+
+ For deprecation warnings: prefer the non-deprecated replacement API. If removing the deprecated reference is not feasible (e.g., backward-compat serialization, mixed-version SPI calls, testing the deprecated path), suppress with `@SuppressWarnings("deprecation")` and a comment explaining why.
+
+8. **Print summary report.** Always print the full report, even when all checks pass:
+
+ ```
+   ## Pre-commit Summary — <modules>
+
+ | Check | Status | Details |
+ |------------------|--------|--------------------------------|
+ | spotless:apply | FIXED | 3 files reformatted |
+ | license:format | OK | 0 files needed headers |
+ | checkstyle:check | PASS | |
+ | license:check | PASS | |
+ | test-compile -Xlint | FAIL | 2 warnings on new lines |
+
+ ### Auto-fixed (review before staging)
+ - spotless reformatted: File1.java, File2.java
+
+ ### Not fixed (requires manual action)
+ - `SomeClass.java:45` — [deprecation] Foo.bar() is deprecated, use Foo.baz()
+ - `OtherClass.java:12` — [unchecked] unchecked cast to List
+ ```
+
+ Status values:
+ - **FIXED** — auto-fixer modified files (spotless, license:format)
+ - **OK** — auto-fixer ran but nothing needed fixing
+ - **PASS** — validator passed with no violations
+ - **FAIL** — validator or compiler found issues
+
+ The report must include:
+ - The full table for all 5 checks, every time
+ - Every unfixed issue with file:line and what to do about it
+ - Every auto-fixed file so the user can review before staging
+ - For deprecation: the deprecated API and its replacement (if known)
+
+ Do not stage auto-fixed files; that's the user's choice.
+
+## What each step actually enforces
+
+Knowing this matters for diagnosing failures:
+
+- **`spotless:check/apply`**: Pinot's spotless config (see root `pom.xml`) enforces **only two things** — import order (`,\#` → non-static then static) and removal of unused imports. It does **not** enforce trailing whitespace, indentation, brace style, or line length. Don't promise the user that spotless will fix arbitrary formatting.
+- **`license:check/format`**: the ASF header from `HEADER` (repo root), applied to `.java`, `.xml`, `.js`, `.sh`, `.md`, etc. Many file types are excluded — see the `licenseSets/excludes` block in the parent `pom.xml`.
+- **`checkstyle:check`**: rules from `config/checkstyle.xml`. The common ones contributors trip over: `LineLength` (120 chars), `AvoidStarImport`, `AvoidStaticImport`, `HideUtilityClassConstructor`, `NeedBraces`. Output format is `[WARNING] <file>:[<line>,<col>] (<category>) <RuleName>: <message>` — parse that when surfacing violations.
+- **`license:check`** runs after `license:format` to confirm every touched file now has the header, including files the user only renamed (the plugin keys off content, not git status).
+- **`test-compile -Xlint:all`**: uses `test-compile` (not just `compile`) so both `src/main/` and `src/test/` sources are compiled and all warnings are emitted. Do not use `clean` — it wipes generated sources (e.g., JavaCC in `pinot-common`) and adds significant overhead on deep modules. Incremental compilation still emits warnings for the entire module; the per-line filter handles pre-existing warnings. The Java compiler flags any reference to `@Deprecated` classes/methods/fields from any dependency (Pinot internal or third-party jars), plus unchecked casts, raw types, fallthrough in switch, and other warning categories. Output format: `[WARNING] /path/File.java:[line,col] <message>`. Warnings are filtered to added lines in the diff only — not just by file, but by the specific line numbers from `git diff --unified=0`.
+
+## Notes
+
+- Always use `./mvnw`, never a system `mvn`. The repo's CLAUDE.md is explicit on this.
+- Don't pass `-am` when running the format and check goals (procedure steps 4–5) — that builds upstream dependencies too, which defeats the purpose of scoping. Only step 7 (compile) needs `-am` because javac needs dependency jars on the classpath.
+- Run sequentially, not in parallel. Spotless and license:format may both modify the same files; ordering matters.
+- When `spotless:apply` removes an unused import, it leaves a *leftover blank line* where the import used to be. This is harmless (checkstyle does not flag it), but if the user cares about the cosmetic double-blank, they'll need to hand-clean after the skill runs. Mention this in the report if spotless touched any files.
+- If the user says `/precommit all`, run on the whole repo (no `-pl`). Warn that this is slow (several minutes).
+- Long builds: steps 1–5 are fast (<30s warm). Step 7 (test-compile with `-am`) is slower (~7–90s depending on module depth and Maven cache state). Deep modules with many upstream deps may take longer on a cold cache. Use `run_in_background` only if the user explicitly asks — otherwise show progress inline.
+- The `license:check` and `checkstyle:check` goals return Maven exit code `1` on violations. If you're capturing the output with shell chaining like `... | tail`, the *tail* pipeline's exit code will mask Maven's — always record Maven's exit code separately, e.g. with `set -o pipefail` or by capturing `${PIPESTATUS[0]}`.
+- The compiler warning filter (step 7) uses the added-line set from step 6, not just the changed-file list. This is critical — per-file filtering would surface pre-existing warnings in files the contributor merely edited, which is unfair. Per-line filtering ensures only warnings on newly added code are reported.
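The exit-code masking mentioned in the notes is easy to demonstrate (bash; `sh -c 'exit 3'` stands in for a failing Maven invocation):

```shell
# With pipefail, the pipeline reports the first failing command's exit code.
set -o pipefail
status=0
sh -c 'exit 3' | tail -n 1 || status=$?
echo "pipeline exit: $status"
```

Without `set -o pipefail`, `$?` would be `tail`'s exit code (0) and the violation would be silently swallowed.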
diff --git a/kb/skills/quickstart.md b/kb/skills/quickstart.md
new file mode 100644
index 000000000000..729a2ce2c212
--- /dev/null
+++ b/kb/skills/quickstart.md
@@ -0,0 +1,53 @@
+# quickstart
+
+Purpose: get a local Pinot cluster up for manual testing or debugging, without the user having to remember where the scripts live or which build profile produces them.
+
+Usage:
+- `/quickstart` — default, runs `quick-start-batch.sh`.
+- `/quickstart batch` — batch mode (offline table with sample data).
+- `/quickstart hybrid` — hybrid table (offline + realtime).
+- `/quickstart streaming` — realtime consumption from an embedded Kafka.
+- `/quickstart upsert-streaming` — upsert table on Kafka.
+- `/quickstart partial-upsert-streaming` — partial-upsert table on Kafka.
+- `/quickstart json-index-batch` / `json-index-streaming` — JSON index demos.
+- `/quickstart complex-type-handling-offline` / `complex-type-handling-streaming` — complex type demos.
+- `/quickstart auth` / `auth-zk` — auth-enabled variants.
+
+## Procedure
+
+1. **Find the quickstart script.** Look in order:
+   - `build/bin/quick-start-<mode>.sh` (produced by `-Pbin-dist`)
+   - `pinot-tools/target/pinot-tools-pkg/bin/quick-start-<mode>.sh` (produced by a plain `./mvnw package` of `pinot-tools`)
+
+ If neither exists, the binary distribution hasn't been built.
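The lookup in step 1 can be sketched as (hypothetical helper name):

```shell
# First existing quickstart script for a mode, in preference order.
find_quickstart() {
  mode=$1
  for p in "build/bin/quick-start-$mode.sh" \
           "pinot-tools/target/pinot-tools-pkg/bin/quick-start-$mode.sh"; do
    if [ -f "$p" ]; then echo "$p"; return 0; fi
  done
  return 1
}
```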
+
+2. **If the script is missing, offer to build.** Ask the user:
+ > Quickstart scripts not found. Build them now?
+ > - Full bin-dist (recommended for first time): `./mvnw clean install -DskipTests -Pbin-dist -Pbuild-shaded-jar` (~10 min)
+ > - Just pinot-tools: `./mvnw -pl pinot-tools -am package -DskipTests` (~3 min)
+
+ Run the one they pick in the foreground. If they're re-running after a prior build, `build/bin/` should already exist.
+
+3. **Validate the mode.** If the user passed an unrecognized mode, list the available scripts:
+ ```
+ ls build/bin/quick-start-*.sh 2>/dev/null || ls pinot-tools/target/pinot-tools-pkg/bin/quick-start-*.sh
+ ```
+
+4. **Run the script in the background.** Quickstart processes run indefinitely; they're servers. Use `run_in_background` so the user can keep working:
+ ```
+   build/bin/quick-start-<mode>.sh
+ ```
+ Capture the shell id so the user can check output later.
+
+5. **Report how to use it.** Once started (wait ~30s or until logs show "Pinot Controller started"), tell the user:
+ - Controller UI: http://localhost:9000
+ - Query console: http://localhost:9000/#/query
+ - To stop: kill the background shell (provide the id).
+ - Logs: printed to the background shell's stdout.
+
+## Notes
+
+- Quickstart processes are long-running. Do not poll with `sleep` loops. The user can check status via the HTTP UI.
+- Multiple quickstarts can't run simultaneously — they all bind to the same ports (9000, 8000, 7050, 8098, 8099). If a run fails with "Address already in use", check for an existing quickstart and ask the user before killing it.
+- The auth quickstart uses a default admin/verysecret credential; mention this if the user picks `auth` or `auth-zk`.
+- On macOS with JDK 21+, quickstarts may need extra `--add-opens` flags; the shipped scripts handle this already, so don't add flags unless the user reports a module-access error.
diff --git a/kb/skills/review-architecture.md b/kb/skills/review-architecture.md
new file mode 100644
index 000000000000..58b7d6031519
--- /dev/null
+++ b/kb/skills/review-architecture.md
@@ -0,0 +1,34 @@
+# review-architecture
+
+You are a specialized reviewer for **Apache Pinot domain 3: Code Architecture & Module Design**. Read `kb/code-review-principles.md` section 3 and `CLAUDE.md`.
+
+Severity:
+- **CRITICAL** — circular dependency across modules; public SPI added without considering backward compat; broker code reaching into server-only internals.
+- **MAJOR** — missing abstract base where >1 implementation exists; duplicated utility logic instead of extending a shared one; plugin pulls non-plugin deps into its module.
+- **MINOR** — package misnamed for its role; helper utility could live one module higher.
+
+## 1. Broad scan
+
+- Moved / renamed classes (check `git diff --find-renames`).
+- New interfaces or abstract classes.
+- New POM `` entries; check the module and the scope.
+- Imports that cross module roots (`pinot-broker` importing `org.apache.pinot.core.query.executor.ServerQueryExecutor`, or `pinot-common` importing `pinot-core`).
+- Two or more implementations of the same SPI: check for a missing abstract base.
+- Any file moved into or out of `pinot-spi/` (binary contract surface).
+
+## 2. Deep analysis
+
+- **C3.x** Confirm module layering: `pinot-spi` → `pinot-common` → `pinot-segment-spi` → `pinot-segment-local` → `pinot-core` → `pinot-query-runtime` / `pinot-broker` / `pinot-server` / `pinot-controller`. Edges must flow one-way.
+- Prefer abstract base over interface-with-default when >1 impl exists and logic is shared (see the AbstractResponseStore pattern).
+- Plugin modules under `pinot-plugins/` must not be depended-on by core code. Verify the direction.
+- New REST resources belong in `pinot-controller` or `pinot-broker`, never cross-wired.
+- Utility classes: if a helper mirrors existing functionality (e.g., a second `PartitionIdUtils`), flag duplication and recommend consolidation.
+- Shaded-jar impacts: flag if a new transitive dep clashes with an existing shaded package.
+
+## 3. Findings
+
+Tag `skill: review-architecture`, cite `C3.x`, use `[BUG-ARCH]` for unnamed structural issues.
+
+## When to defer to the developer
+
+- Intentional one-off utility in a module that explicitly doesn't want a cross-module shared helper; ensure the PR description says so.
diff --git a/kb/skills/review-concurrency-state.md b/kb/skills/review-concurrency-state.md
new file mode 100644
index 000000000000..8597d7143566
--- /dev/null
+++ b/kb/skills/review-concurrency-state.md
@@ -0,0 +1,42 @@
+# review-concurrency-state
+
+You are a specialized reviewer for **Apache Pinot domain 2: State Management & Concurrency**. Read `kb/code-review-principles.md` section 2 and `CLAUDE.md` before analyzing.
+
+Severity:
+- **CRITICAL** — data race, atomicity violation (wipe-before-install), IdealState write without version check, visibility bug on shared mutable state.
+- **MAJOR** — unnecessary lock widening, striped lock without measured contention, check-then-act race even if rare.
+- **MINOR** — over-synchronization, missing `volatile` where `final` would be safer, comment omission on thread-safety contract.
+
+## 1. Broad scan
+
+- Added/removed `synchronized`, `volatile`, `AtomicReference`, `AtomicLong`, `ReentrantLock`, `StampedLock`, `ConcurrentHashMap`, `CopyOnWriteArrayList`.
+- `get` followed by `put` / `remove` on concurrent maps (check-then-act pattern).
+- Helix `IdealState` / `ExternalView` reads without version checks, or `setIdealState` without `dataAccessor.getProperty(...).getStat()`.
+- `@GuardedBy` annotations added or removed.
+- Registration of observers / listeners (callbacks, MetricsRegistry, segment lifecycle listeners) without clear lifetime documentation.
+- Background threads: `Executors.new*`, `ScheduledExecutorService`, `Thread`. Check shutdown path (`awaitTermination` then `shutdownNow`).
+- Consumer / upsert files: `PartitionConsumer`, `UpsertMetadataManager`, `*PartitionUpsertMetadataManager`.
+
+## 2. Deep analysis
+
+For each hit, apply the KB's concurrency principles:
+
+- **C2.1 — Atomic transitions.** Never wipe old metadata before the new state is durably installed. Pattern: prepare-new → swap-reference → cleanup-old. Flag eager deletes.
+- **C2.2 — Thread-safety conservatism.** Default to explicit synchronization. If replacing `synchronized` with `AtomicReference` or `CHM.compute`, verify the visibility story holds across all callers.
+- **C2.3 — Race analysis for lock changes.** When a lock's scope is narrowed or removed, walk through interleavings with other threads that touch the same state. Flag if the walk-through isn't in the PR description.
+- **C2.4 — Version-checked writes.** Shared state in ZK (IdealState, IdealStateConfig, TableConfig, Schema) must be written with optimistic locking (ZK node version); reject blind writes.
+- **C2.5 — Check-then-act on atomics is still racy.** `if (!map.containsKey(k)) map.put(k, v)` is a bug — must be `putIfAbsent` / `computeIfAbsent`.
+- **C2.6 — Shared observers.** When an observer is registered from multiple paths or called concurrently, the handler must be idempotent and its mutable state must be published safely.
+
+Also check lifecycle: every `new ExecutorService` needs a clear shutdown path in `close()` / stop hook.
+
+## 3. Findings
+
+Emit findings in the `code-reviewer` agent format, tagging `skill: review-concurrency-state` and citing `C2.x`. Use `[BUG-RACE]` for unnamed bugs.
+
+For each finding, include a short interleaving sketch (T1/T2 steps) where the race is non-obvious — this is high-leverage and human reviewers trust it.
+
+## When to defer to the developer
+
+- Code clearly documents a single-threaded invariant (e.g., called only from the Helix event thread) and the invariant is preserved.
+- The change is in test-only code.
diff --git a/kb/skills/review-config-backcompat.md b/kb/skills/review-config-backcompat.md
new file mode 100644
index 000000000000..1730eb9fe574
--- /dev/null
+++ b/kb/skills/review-config-backcompat.md
@@ -0,0 +1,60 @@
+# review-config-backcompat
+
+You are a specialized reviewer for **Apache Pinot domain 1: Configuration & Backward Compatibility**. Read `kb/code-review-principles.md` (section "1. Configuration & Backward Compatibility") and `CLAUDE.md` before analyzing.
+
+Severity (from the KB):
+- **CRITICAL** — must fix: removed/renamed config key with no legacy fallback; widened SPI signature; renamed enum/DataType/schema type; Protobuf field-number reuse; DataTable/segment version bump without dual-read.
+- **MAJOR** — should fix: new feature ships ON by default; multi-level override not validated; missing `@Deprecated` on legacy alias.
+- **MINOR** — quality: config namespace inconsistent; constant name mismatches string value; comment misses the rollout plan.
+
+## 1. Broad scan
+
+Grep the diff for risky patterns:
+
+- Removed or renamed constants under `pinot-spi/` and any `*Constants.java`.
+- Method signature changes in files under `pinot-spi/**` or `pinot-segment-spi/**`.
+- New or changed values in any `enum` whose class is serialized (check for `@JsonCreator`, `@JsonValue`, or presence in Helix/ZK/segment metadata).
+- `.proto` / `.thrift` field renumbering or deletion.
+- New REST `@Path` methods or renamed `@JsonProperty` fields.
+- New `@Deprecated` annotations (good) vs. outright deletion of old APIs (bad).
+- New boolean flags in table/cluster config — check their default.
+- Changes to `DataTableBuilder`, `DataTableUtils`, `SegmentVersion`, `V1Constants`.
+
+Short list of hits per pattern.
+
+## 2. Deep analysis
+
+For each hit, apply the trigger match from the KB and compare the change against the BAD/GOOD examples. Specifically check:
+
+- **C1.1** Is the old key still accepted? Is resolution order `new → legacy`? Is the legacy key `@Deprecated`?
+- **C1.2** For new enum values / schema DataType: has the name been validated against SQL / Parquet / Arrow conventions? Names are permanent.
+- **C1.3** For SPI signature changes: could an existing plugin compiled against an older version still link? Prefer overloads over widening.
+- **C1.4** For reverts: does the commit reference the original PR and explain the failure mode?
+- **C1.5** For new `isXxxEnabled()`-style validation: does it resolve through table → instance → default override chain? Use the `Enablement` enum where it exists.
+- **C1.6** New feature flag defaults to OFF (`false` for `enableXxx`; for a `disableXxx` flag, default `false` keeps the existing behavior enabled, so the new flag is a no-op by default).
+- **C1.7** Config namespace follows existing patterns (`pinot.broker.*`, `pinot.query.sse.*`, dot-separated lowercase).
+- Wire format: DataTable / segment-version bumps must keep the reader able to decode prior versions; confirm dual-read is tested.
+- Rolling upgrade: is there a written rolling-upgrade note for backward-incompat label PRs (broker-first vs controller-first)?
+
+If the diff doesn't match any trigger in this domain, return `no-findings`.
+
+## 3. Findings
+
+Emit each finding in the format below, matching the `code-reviewer` agent's output:
+
+```
+### [C1.x] — CRITICAL|MAJOR|MINOR
+**File:** `path/to/File.java:line`
+**Trigger:**
+**Problem:**
+**Fix:**
+```
+
+For issues not matching a specific principle but still in this domain, use `[BUG-CFG]`.
+
+Tag each finding with `skill: review-config-backcompat`. Include `also-see` pointers to related principles when applicable (e.g., C1.5 and C1.8 often co-occur).
+
+## When to defer to the developer
+
+- Renames inside a purely internal package (no JSON / SPI / serialization exposure).
+- Adding a legacy alias is technically possible but the original key has not been released yet (check `git log` for first introduction).
diff --git a/kb/skills/review-correctness-nulls.md b/kb/skills/review-correctness-nulls.md
new file mode 100644
index 000000000000..cc3ece01cd40
--- /dev/null
+++ b/kb/skills/review-correctness-nulls.md
@@ -0,0 +1,36 @@
+# review-correctness-nulls
+
+You are a specialized reviewer for **Apache Pinot domain 5: Correctness & Safety**. Read `kb/code-review-principles.md` section 5 and `CLAUDE.md`.
+
+Severity:
+- **CRITICAL** — silent wrong results (precision loss, missing enum case handled as default), memory leak on segment destroy, null dereference that crashes a query, missing null-bitmap update.
+- **MAJOR** — use of double fallback where polymorphic primitive dispatch exists; exception swallowed without logging; unchecked `Optional.get()`.
+- **MINOR** — `@Nullable` missing on a return that may be null; redundant null check.
+
+## 1. Broad scan
+
+- Return paths from stats / aggregators / dictionaries that may be empty — confirm they handle "no data" case (return null, not NPE).
+- `switch` on `DataType`, `FieldSpec.DataType`, `ColumnDataType`, `IndexType`, `SegmentVersion` — check for missing cases and `default: throw` vs. silent fallthrough.
+- `null` returns from new methods — confirm `@Nullable` is on the signature.
+- `close()` / `destroy()` methods — confirm they clear indexes, bitmaps, dictionaries, and null all references.
+- Arithmetic / aggregation code — look for unconditional cast-to-double or use of `BigDecimal` when `long` suffices.
+- Window / aggregation functions split by type — confirm INT/LONG/FLOAT/DOUBLE/BIG_DECIMAL each have a dedicated impl where precision matters.
+- Caught `Throwable` / `Exception` — confirm no silent swallow.
+
+## 2. Deep analysis
+
+- **C5.x** Null handling: dispatch on `getStoredType()`, update null-value-vector bitmap on insert/delete, test both `null-handling-enabled=true` and `false` paths.
+- **Exhaustive switches**: prefer `EnumSet.allOf` confirmation or a `default: throw new IllegalArgumentException("Unsupported " + type)`; silent fallthrough is a CRITICAL risk (see PR 18176 `IVF_ON_DISK` case).
+- **Precision**: INT/LONG windows must not coerce to double; BIG_DECIMAL requires its own aggregator. Flag coercions that may lose precision past 2^53.
+- **Resource cleanup**: `close()` must actively trim state, not rely on GC. Even if dangling refs are rare, explicit cleanup is the norm (see PR 18204 bitmap leak).
+- **Error messages**: `Preconditions.checkState`, `IllegalArgumentException`, `IllegalStateException` must include the offending value — not opaque.
+- **Off-by-one**: row iteration loops — confirm `< numDocs` not `<=`; confirm `docIdIterator` drains fully before reusing.
+
+## 3. Findings
+
+Tag `skill: review-correctness-nulls`, cite `C5.x`, use `[BUG-CORR]` for unnamed bugs. For precision / silent-wrong-result risks, always classify CRITICAL.
+
+## When to defer to the developer
+
+- Null path is explicitly out-of-scope in the PR description and a follow-up issue is linked.
+- `default` branch falls through to a documented best-effort path (rare; must be justified).
diff --git a/kb/skills/review-naming-api.md b/kb/skills/review-naming-api.md
new file mode 100644
index 000000000000..80d3f9959679
--- /dev/null
+++ b/kb/skills/review-naming-api.md
@@ -0,0 +1,35 @@
+# review-naming-api
+
+You are a specialized reviewer for **Apache Pinot domain 7: Naming & API Design**. Read `kb/code-review-principles.md` section 7 and `CLAUDE.md`.
+
+Severity:
+- **CRITICAL** — enum constant / DataType / schema type name inconsistent with SQL/Parquet/Arrow (names are permanent — see C1.2); public API with same-module name collision.
+- **MAJOR** — public class missing Javadoc; method name misrepresents behavior (e.g., `get` that mutates); inconsistent REST naming vs. existing resources.
+- **MINOR** — style — variable naming, inline FQCNs that should be imports, trailing "Helper" / "Util" suffix where a better noun exists.
+
+## 1. Broad scan
+
+- New / renamed public classes, interfaces, methods, fields.
+- New enum values — check against SQL / Parquet / Arrow conventions (see C1.2).
+- New `@Path` routes or `@JsonProperty` names — confirm kebab-case for URL, camelCase for JSON, consistent with neighbors.
+- Inline fully-qualified class names — flag (CLAUDE.md convention).
+- New public classes without class-level Javadoc — flag (CLAUDE.md convention).
+- License headers on new files — flag missing.
+
+## 2. Deep analysis
+
+- **C7.x** Confirm name matches behavior. `get*` should not mutate. `isXxx` / `hasXxx` for booleans. `toXxx` / `fromXxx` for conversions.
+- Consistency: compare the new name against ≥3 neighbors in the same package / module.
+- Public API surface: if adding a method to an SPI interface, confirm domain-1 backward-compat story (C1.3).
+- Javadoc: new public classes must describe behavior and thread-safety.
+- Imports: `com.foo.Bar foo = new com.foo.Bar()` → use import.
+- CLAUDE.md checks: license header, Java 21 target (Java 11 bytecode for SPI/client modules), SLF4J logger pattern.
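+
+A minimal sketch of the C7.x naming checks. All names here are hypothetical, not Pinot API:
+
```java
public class NamingSketch {
  private int hits;

  // A "getAndCountHits()" that did "return ++hits" would be a get that
  // mutates: the name misrepresents behavior. Split it instead:
  public int getHits() { return hits; }      // accessor stays pure
  public int recordHit() { return ++hits; }  // mutation gets a verb

  // Booleans read as isXxx/hasXxx; conversions as toXxx/fromXxx.
  public boolean hasHits() { return hits > 0; }
  public String toSummary() { return "hits=" + hits; }
}
```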
+
+## 3. Findings
+
+Tag `skill: review-naming-api`, cite `C7.x`, use `[CONV]` for CLAUDE.md violations. Most findings here are MINOR; do not inflate severity.
+
+## When to defer to the developer
+
+- Name is internal-only (package-private) and the team has indicated preference.
+- A rename would force a larger refactor already deferred to a follow-up.
diff --git a/kb/skills/review-performance.md b/kb/skills/review-performance.md
new file mode 100644
index 000000000000..a530f375603b
--- /dev/null
+++ b/kb/skills/review-performance.md
@@ -0,0 +1,35 @@
+# review-performance
+
+You are a specialized reviewer for **Apache Pinot domain 4: Performance & Efficiency**. Read `kb/code-review-principles.md` section 4 and `CLAUDE.md`.
+
+Severity:
+- **CRITICAL** — documented benchmark regresses significantly; per-row allocation in top-level operator loop; large synchronized block on query-path singleton.
+- **MAJOR** — missing primitive fast-path (e.g., reusing `Long` boxing); unnecessary intermediate collection on the scan path; string concat inside a per-row loop.
+- **MINOR** — minor inefficiency off the hot path; logging at INFO inside a tight loop.
+
+## 1. Broad scan
+
+- Per-row methods: search for `getInt`, `getLong`, `getDouble`, `getString`, `getBytes`, `getValue`, `transform`, `filter`, `accept` inside operator / transform / aggregator files.
+- Allocations in loops: `new `, `Arrays.asList`, `Collections.singletonList`, `String.format`, `"x" + y`, lambdas capturing variables.
+- Boxing: use of `Integer`, `Long`, `Double`, `Boolean` where `int`, `long`, `double`, `boolean` would do; boxed-key `Map` lookups in hot code where a primitive-specialized map would serve.
+- Virtual dispatch in hot loops: fields typed as an interface where a concrete class would let JIT inline.
+- `synchronized` / lock acquisition inside for-loops on the scan path.
+- Missing type-specific aggregator (e.g., `Sum` falls back to `BigDecimal` when `Long` would suffice — see recent PRs on `SumLongWindowValueAggregator`).
+- `ByteBuffer.duplicate()` / `slice()` in loops.
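+
+The boxing bullet in practice. A toy accumulator, not Pinot code, showing the allocation difference a primitive fast-path removes:
+
```java
import java.util.List;

public class HotLoopSketch {
  // Boxed accumulation: one Long allocation per row plus unbox/rebox churn.
  public static Long sumBoxed(List<Long> rows) {
    Long total = 0L;
    for (Long row : rows) {
      total = total + row;  // autoboxes a fresh Long every iteration
    }
    return total;
  }

  // Primitive fast-path: zero allocations inside the loop.
  public static long sumPrimitive(long[] rows) {
    long total = 0L;
    for (long row : rows) {
      total += row;
    }
    return total;
  }
}
```
+
+At 1M rows/sec the boxed version allocates roughly 1M objects/sec; that is the quantified finding this skill should report.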
+
+## 2. Deep analysis
+
+- **C4.x** Ask: does this code run per-row, per-segment, or per-query? Apply the budget of each (per-row: zero allocations, no boxing, no virtual dispatch where avoidable).
+- Check for type-dispatch on `getStoredType()` rather than a double-coercion fallback. Precision loss past 2^53 is a correctness issue but also a perf giveaway (extra unbox + cast).
+- If the PR claims a perf gain, confirm a benchmark is attached or point to the `/bench-compare` skill as a next step.
+- If a benchmark shows a regression > 5%, flag CRITICAL regardless of other merits.
+- Avoid introducing `LOGGER.debug(String.format(...))` in per-row loops — even when debug is off, formatting may be eager.
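+
+Why `LOGGER.debug(String.format(...))` is flagged even when debug is off: the argument is evaluated before the level check. A self-contained sketch, with a counter standing in for SLF4J (whose parameterized `LOGGER.debug("row={}", row)` behaves like the lazy variant):
+
```java
public class LazyLoggingSketch {
  public static int formatCalls = 0;

  public static String format(long row) {
    formatCalls++;  // counts eager formatting work
    return "row=" + row;
  }

  // Eager: the message is built before the disabled-level check runs.
  public static void debugEager(boolean debugEnabled, long row) {
    String msg = format(row);  // paid even when debug is off
    if (debugEnabled) { System.out.println(msg); }
  }

  // Lazy (the SLF4J "{}" pattern): formatting is deferred behind the check.
  public static void debugLazy(boolean debugEnabled, long row) {
    if (debugEnabled) { System.out.println(format(row)); }
  }
}
```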
+
+## 3. Findings
+
+Tag `skill: review-performance`, cite `C4.x`, use `[BUG-PERF]`. Quantify when possible ("allocation per row × 1M rows/sec = 1M objects/sec"). Recommend `/bench-compare ` when the impact is unclear.
+
+## When to defer to the developer
+
+- Change is behind a rarely-used feature flag and the PR acknowledges the trade-off.
+- The hot path has a documented JIT-inlining assumption and the change preserves it.
diff --git a/kb/skills/review-process-scope.md b/kb/skills/review-process-scope.md
new file mode 100644
index 000000000000..7ab55add22f9
--- /dev/null
+++ b/kb/skills/review-process-scope.md
@@ -0,0 +1,33 @@
+# review-process-scope
+
+You are a specialized reviewer for **Apache Pinot domain 8: Process & Scope**. Read `kb/code-review-principles.md` section 8 and `CLAUDE.md`.
+
+Severity:
+- **CRITICAL** — revert without referencing the original PR or explaining the regression; test-retry / sleep added to mask a flake (never fix by retry — investigate root cause); missing rolling-upgrade note on a backward-incompat change.
+- **MAJOR** — PR bundles multiple unrelated concerns; commit message doesn't explain WHY; new TODO with no issue link.
+- **MINOR** — PR title style; label missing.
+
+## 1. Broad scan
+
+- Diff size + module count. If > 500 lines or > 4 modules, flag for scope review.
+- Commit messages (`git log <base>..HEAD`): check each for a WHY clause.
+- New `// TODO` / `// FIXME` — confirm each has a linked issue.
+- Test retry patterns: `@Test(retryAnalyzer = ...)`, `Thread.sleep` added in tests, `@Flaky` annotations.
+- PR title / labels if available.
+
+## 2. Deep analysis
+
+- **C8.x** PR scope: one concern per PR. Bundling a refactor with its test rewrite is acceptable; bundling a refactor with a feature is not.
+- **Reverts**: must name the reverted PR number and the reason.
+- **No-retry rule**: flaky tests are investigated via `/flaky-analyze`, not retried into submission.
+- **Backward-incompat labeling**: any change touching wire formats / APIs / configs that can't roll-forward-and-back needs the `backward-incompat` label and a rolling-upgrade note.
+- **TODO hygiene**: every TODO links to an issue.
+
+## 3. Findings
+
+Tag `skill: review-process-scope`, cite `C8.x`, use `[PROC]` for process nits. Most findings here are MINOR; the retry-to-fix-flake and revert-without-reference cases are CRITICAL.
+
+## When to defer to the developer
+
+- PR is pre-coordinated large refactor with a linked design doc; scope is justified.
+- Label was set after PR description was drafted; confirm it's now correct.
diff --git a/kb/skills/review-testing.md b/kb/skills/review-testing.md
new file mode 100644
index 000000000000..9adff0dbd71b
--- /dev/null
+++ b/kb/skills/review-testing.md
@@ -0,0 +1,60 @@
+# review-testing
+
+You are a specialized reviewer for **Apache Pinot domain 6: Testing Strategies**. Read `kb/code-review-principles.md` section 6 and `CLAUDE.md`.
+
+Severity:
+- **CRITICAL** — bug-fix PR with no regression test; wire-format change with no mixed-version test; null-aware change that tests only one of the two null-handling modes.
+- **MAJOR** — positive-only test (no negative / error-case); mock-heavy test where a real dictionary/segment is trivially available; missing test for a new public code path; new integration test spins up its own cluster when `CustomDataQueryClusterIntegrationTest` would suffice.
+- **MINOR** — assertion style (`assertTrue(x == y)` vs `assertEquals`); unclear test names; flaky-prone timing assumptions.
+
+## 1. Broad scan
+
+- For every file under `src/main/` changed, check if a corresponding `src/test/` file also changed. Missing = suspect.
+- Find new tests and scan for:
+ - Test framework: TestNG (`import org.testng.annotations.Test`) unless the file uses JUnit consistently.
+ - Mocks: `@Mock`, `Mockito.when`, `mock(...)`. Flag mocks of `Dictionary`, `ForwardIndexReader`, `NullValueVectorReader` — these should be real in most cases (see PR 18189).
+ - `assertTrue` / `assertFalse` on compound expressions — prefer `assertEquals` / `assertThrows`.
+ - Missing `@Test(dataProvider=...)` when the production code has a type-dispatch switch — a single type is insufficient coverage.
+ - Timing assumptions: `Thread.sleep`, `System.currentTimeMillis()` in assertions → flakiness.
+- Integration tests: new REST / Thrift / wire-format changes need a `*IntegrationTest` case; check `pinot-integration-tests/`.
+- Integration-test base class: reject new standalone Pinot integration test classes that use `BaseClusterIntegrationTest`
+ or spin up their own controller/broker/server when no special cluster setup is required. For ordinary schema/data/query
+ behavior, require reusing an existing `CustomDataQueryClusterIntegrationTest` test or adding a focused subclass under
+ `pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/` with
+ `@Test(suiteName = "CustomClusterIntegrationTest")`, so it is included by
+ `pinot-integration-tests/src/test/resources/custom-cluster-integration-test-suite.xml`.
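+
+A sketch of the data-provider bullet: one table row per branch of a type-dispatch switch. The `parse` method is a toy shaped like, but not taken from, Pinot; a TestNG `@DataProvider` would return the same `Object[][]`:
+
```java
public class TypeCoverageSketch {
  // Production-style type-dispatch switch: every branch needs coverage.
  public static Object parse(String type, String raw) {
    switch (type) {
      case "INT":    return Integer.parseInt(raw);
      case "LONG":   return Long.parseLong(raw);
      case "DOUBLE": return Double.parseDouble(raw);
      default: throw new IllegalArgumentException("Unsupported type: " + type);
    }
  }

  // Data-provider-shaped table, one row per dispatch branch; a single-type
  // test would leave the other branches unexercised.
  public static Object[][] typeCases() {
    return new Object[][] {
        {"INT", "7", 7},
        {"LONG", "7", 7L},
        {"DOUBLE", "7.5", 7.5d},
    };
  }
}
```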
+
+## 2. Deep analysis
+
+- **C6.x** Positive + negative: every new feature needs at least one passing and one rejected-input test.
+- **Real dependencies where semantics matter**: dictionaries, segment readers, null-vector readers, aggregators, transform functions — use real instances. Mocks hide encoding-sensitive bugs.
+- **Null-handling coverage**: any change under null-aware code must test both `null-handling-enabled=true` and `false`.
+- **Type coverage**: for aggregator / window / transform changes, test INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING, BYTES, BOOLEAN, TIMESTAMP, JSON as applicable.
+- **Mixed-version tests**: for wire-format, Helix, or controller-API changes, confirm a rolling-upgrade scenario is exercised (controller-old + broker-new, and vice-versa).
+- **Regression evidence**: bug-fix PRs must include a test that fails on `HEAD~1` and passes on `HEAD`. Flag absence.
+- **Flakiness hygiene**: never add a `Thread.sleep` as a test-stability knob; prefer `Awaitility` / explicit events. Do not mask flakes with retries (see domain 8 process rule).
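+
+A toy illustration of why both null-handling modes must be asserted: the same input can legitimately yield different results per mode, so a test covering only one mode proves nothing about the other. The `avg` below is hypothetical, not a Pinot aggregator:
+
```java
public class NullModeSketch {
  // With null handling enabled, nulls are skipped entirely; with it
  // disabled, a null degrades to the default value 0 and still counts
  // toward the denominator.
  public static double avg(Long[] values, boolean nullHandlingEnabled) {
    long total = 0;
    int count = 0;
    for (Long v : values) {
      if (v == null) {
        if (!nullHandlingEnabled) {
          count++;  // null behaves as default 0
        }
        continue;
      }
      total += v;
      count++;
    }
    return (double) total / count;
  }
}
```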
+
+### Core-functionality + integration-test base-class selection
+
+Every non-trivial change must exercise the **core functionality** it introduces. For query-semantics changes (new function, index type, aggregator, transform, SQL construct, stored-type behavior) that can be validated with ordinary table data and the default cluster topology, the integration test **must** reuse an existing `CustomDataQueryClusterIntegrationTest` test or add a focused subclass under `pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/` — not a fresh `BaseClusterIntegrationTest` subclass.
+
+Rationale: `CustomDataQueryClusterIntegrationTest` shares one controller / broker / server / ZK across the whole test suite (`@BeforeSuite`, `_sharedClusterTestSuite`). Spinning up a second cluster per test costs ~30–60 s of ZK/Helix startup and inflates CI time linearly with test count. The custom base lets each test bring its own schema, data, and SQL assertions on top of the shared cluster. Nearly all feature validation (window functions, sketches, vector indexes, geo, JSON, timestamp, bytes, distinct, group-by options, star-tree, unnest, SSB queries, etc.) already follows this pattern — look for neighbors in `pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/` before adding a new top-level cluster test.
+
+A new test may extend `BaseClusterIntegrationTest` (or a specialized base) **only** when the change demands one of:
+- Different cluster topology (multi-tenant, multi-broker, multi-server, dedicated minion).
+- Non-default Pinot component configuration (custom broker / server / controller properties, auth, TLS, access control).
+- Different Helix / ZK layout or tenant isolation that can't be simulated with table-level config.
+- Realtime/streaming wiring that the custom base does not already provide, or lifecycle transitions that require cluster restart.
+
+Flag as **MAJOR / C6.10**: a new test under `pinot-integration-tests/` whose class body only defines schema, data, and SQL assertions but extends `BaseClusterIntegrationTest` directly or otherwise creates a dedicated cluster. The fix is to re-parent to `CustomDataQueryClusterIntegrationTest`, move it into the `custom/` package, annotate it with `@Test(suiteName = "CustomClusterIntegrationTest")`, and ensure it is covered by `custom-cluster-integration-test-suite.xml`. Require the PR to state, in the description, which of the four "only-when" conditions above applies if the fresh cluster is justified.
+
+Independently, flag as **CRITICAL** a change to core functionality (new SQL function, new index type, behavior change of an existing operator/aggregator) that ships with only unit tests and no integration test of any kind — unit coverage alone doesn't prove the feature works end-to-end through planner → broker → server → segment.
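+
+A skeleton of the required shape. The package, parent class, and `suiteName` annotation come from this rule; the method body and any schema/data hooks are illustrative only, since the exact base-class API is not reproduced here and this fragment will not compile outside the Pinot test harness:
+
```java
package org.apache.pinot.integration.tests.custom;

import org.testng.annotations.Test;

// Re-parented onto the shared-cluster base instead of BaseClusterIntegrationTest.
@Test(suiteName = "CustomClusterIntegrationTest")
public class MyFeatureTest extends CustomDataQueryClusterIntegrationTest {

  // Supply table name, schema, and generated rows via the base-class hooks,
  // then assert SQL results against the shared controller/broker/server.
  @Test
  public void testMyFeatureQuery() throws Exception {
    // e.g. run a SELECT exercising the new function through the shared
    // broker and compare against expected values.
  }
}
```
+
+Confirm the new class is picked up by `custom-cluster-integration-test-suite.xml` before approving.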
+
+## 3. Findings
+
+Tag `skill: review-testing`, cite `C6.x`, use `[BUG-TEST]`. When flagging missing tests, always suggest a concrete test signature / data-provider shape.
+
+## When to defer to the developer
+
+- PR is a pure rename / move with no behavior change and existing tests still exercise the paths.
+- New code is test-only scaffolding.
diff --git a/kb/skills/run-test.md b/kb/skills/run-test.md
new file mode 100644
index 000000000000..3648fc3ac704
--- /dev/null
+++ b/kb/skills/run-test.md
@@ -0,0 +1,46 @@
+# run-test
+
+Purpose: resolve a test class name to its Maven module and run only that test, without the user having to remember the exact `-pl`, `-am`, `-Dtest`, and `-Dsurefire.failIfNoSpecifiedTests` flags.
+
+Usage:
+- `/run-test RangeIndexTest` — single class.
+- `/run-test RangeIndexTest#testSpecificMethod` — single method.
+- `/run-test OfflineClusterIntegrationTest` — integration test (auto-detected, adds the required flag).
+
+## Procedure
+
+1. **Parse the argument.** Split on `#` into `<ClassName>` and an optional `<method>`. If the class name contains a dot, treat it as an FQN.
+
+2. **Locate the source file.**
+   - Glob for `**/<ClassName>.java` under the repo.
+ - Prefer matches under `src/test/java/`.
+ - If multiple matches, list them (with module prefixes) and ask the user which one. Do not guess.
+ - If zero matches, report and stop.
+
+3. **Find the owning module.** Walk up from the test file until you find a `pom.xml` that is not the repo root. That's the module.
+
+4. **Note on the `failIfNoSpecifiedTests` flag.** Always pass `-Dsurefire.failIfNoSpecifiedTests=false`, regardless of whether the target is a unit or integration test. With `-am`, Maven runs the full Surefire goal on every upstream module (e.g. `pinot-spi` → `pinot-common` → … → target module). Each of those modules invokes Surefire with the same `-Dtest=` filter, and Surefire's default behaviour is to **fail the whole build** when the pattern doesn't match any test in a given module. Without this flag, the build dies at the first upstream module that doesn't happen to contain `<ClassName>`. This applies equally to unit tests (upstream modules don't have the test) and integration tests (the upstream modules of `pinot-integration-tests` don't match the filter either).
+
+ Optional: detect integration tests for reporting/warnings only. A test is an integration test if *any* of these hold:
+ - The file path contains `pinot-integration-tests`.
+ - The file is named `*IntegrationTest.java`, `*IT.java`, `*ClusterTest.java`, or `*EndToEndTest.java`.
+ - The module is `pinot-integration-tests` or `pinot-compatibility-verifier`.
+
+ Use this only to warn the user about expected runtime ("integration tests typically take 10–20 min"), not to alter the command.
+
+5. **Build the command.**
+ ```
+   ./mvnw -pl <module> -am -Dtest=<ClassName>[#<method>] -Dsurefire.failIfNoSpecifiedTests=false test
+ ```
+ - `-am` is intentional: the test needs upstream module JARs built.
+ - `-Dsurefire.failIfNoSpecifiedTests=false` is always required when `-am` is set (see step 4).
+
+6. **Run and report.** Print the exact command before running so the user can copy/tweak it. On failure, show the last ~60 lines of the Maven output (or the Surefire report path under `<module>/target/surefire-reports/`) so the user can jump straight to the stack trace.
+
+## Notes
+
+- These runs can take 2–15 minutes depending on the module and whether deps are already built. Consider `run_in_background` only if the user says so — default is foreground so they see progress.
+- Never strip `-am`. The first run after a clean checkout will fail without it.
+- If the user wants to run without rebuilding upstream (faster iteration), suggest they add `-o` (offline) or drop `-am` after the first successful build — but don't do it automatically.
+- If a run reports "No tests were executed", the class name is usually mistyped; fix the name rather than suppressing the failure with `-DfailIfNoTests=false`.
+- If the class is `abstract` or has no `@Test` methods (it's a base class), warn the user and suggest concrete subclasses found via grep.