fix(validators): harden mcp-name matching (PyPI/NuGet anchoring, comment-form safe) + cargo follow-ups #1331

Merged: rdimitrov merged 5 commits into main from followup/registry-anchoring on Jun 5, 2026
@rdimitrov rdimitrov commented Jun 3, 2026

Registry-wide hardening of the mcp-name: ownership-token match, plus the cargo follow-up fixes from the #1330 review — consolidated here per maintainer request. Rebased onto current main (the merged #1330 cargo commit drops out), so the diff is just the net-new work.

Changes

  • PyPI/NuGet: boundary-anchored token match. Replace strings.Contains(readme, "mcp-name: "+name) with the shared containsMCPNameToken, so a README declaring a longer name (…/widget-pro) no longer satisfies an ownership claim for a shorter prefix (…/widget). (Cargo already got this in #1330; NPM is unaffected because it compares an exact metadata field.)
  • Matcher: treat the HTML comment close as a boundary. <!-- mcp-name: NAME--> / <!--mcp-name: NAME--> (any spacing) validate again — the documented hidden-comment form for PyPI/NuGet — while a genuine longer name (…/widget--pro) still does not.
  • Cargo hardening (from the review of #1330): pin scheme and port (not just host) on the README fetch and its redirects; a rate-limited or failed crate-version existence probe now reports a transient error instead of "not found"; a 403 with the crate present no longer flatly asserts "no README".
  • Docs: note the token must be followed by a boundary.

⚠️ Behavior change for PyPI/NuGet — read before merging

The match is strictly stricter (verified by fuzz, 2.3M execs: it can only flip pass→fail, never fail→pass). After the comment-close fix, the only forms that newly fail are unusual inline ones: the token ending a sentence (…/my-mcp.) or glued to a trailing /. The documented forms (own line, in <!-- … -->) are unaffected.

Correcting an earlier claim: this re-validates on edits too, not just new versions — edit.go → UpdateServer → ValidateUpdateRequest → validateRegistryOwnership runs the token check on any edit of a live server (only delete-transitions skip it). So an existing PyPI/NuGet server whose README uses one of the breaking inline forms would fail on its next publish or edit. Given the v0.1 API-freeze posture, this should land as a deliberate, noted change.

Testing

go build, vet, gofmt, golangci-lint (prod files), check-schema, validate-examples all clean. Hermetic + live PyPI/NuGet/cargo suite passes. New: TestCargoURLAllowed, a combined-fixture probe-429→transient case, comment-form matcher cases, and FuzzContainsMCPNameToken (the strictly-stricter property, 2.3M execs).

🤖 Generated with Claude Code

@rdimitrov rdimitrov force-pushed the followup/cargo-hardening branch from bc87b70 to d796057 Compare June 4, 2026 20:08
@rdimitrov rdimitrov force-pushed the followup/registry-anchoring branch from 5b3e6b5 to 57aacf9 Compare June 4, 2026 20:08
Base automatically changed from followup/cargo-hardening to main June 4, 2026 21:34
@rdimitrov rdimitrov marked this pull request as ready for review June 4, 2026 21:35
rdimitrov and others added 4 commits June 5, 2026 01:33
…uGet

Stacked on the cargo follow-up (introduces containsMCPNameToken). This extends
the boundary-anchored ownership-token match to the PyPI and NuGet validators,
replacing their bare strings.Contains checks so a README declaring a longer name
(e.g. io.github.acme/widget-pro) no longer satisfies a claim for a shorter
prefix (io.github.acme/widget).

⚠️ BEHAVIOR CHANGE for PyPI/NuGet (not just additive):
The new match is strictly stricter — it can only flip a previously-passing
publish to failing, never the reverse. The realistic case that flips is a README
whose ONLY occurrence of the token is immediately followed by a server-name
character [A-Za-z0-9._/-], e.g. a trailing period in prose
("...published as mcp-name: io.github.acme/widget."). The token on its own line,
in backticks, or followed by whitespace/newline/HTML-tag is unaffected.

Re-validation runs only at publish time (CreateServer); edits/status updates do
not re-check ownership and there is no background re-validation, so already-
stored servers are not affected — but an existing PyPI/NuGet publisher pushing a
NEW VERSION with the token in the glued form would fail where it previously
passed. Given the v0.1 API freeze, this should land deliberately and not be
promoted to prod without sign-off. Live positive tests (time-mcp-pypi,
TimeMcpServer) still pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The boundary-anchored matcher rejected the documented hidden-comment form when
the trailing space was omitted: `<!-- mcp-name: NAME-->` / `<!--mcp-name: NAME-->`
fail because the byte after NAME is `-` (a server-name char). Since PyPI/NuGet
publishers commonly hide the token in an HTML comment, this would break the
recommended form on the next publish or edit.

Add isMCPNameBoundary, which treats the HTML comment close (`-->` / `--!>`)
immediately after the name as a boundary, so all spacing variants of the comment
form validate while a genuine longer name (e.g. `…/widget--pro`) still does not.

Tests: comment-form cases (spaced/unspaced/legacy `--!>`) and a double-hyphen
longer-name negative; plus FuzzContainsMCPNameToken pinning the safety property
that the matcher is strictly stricter than strings.Contains (can only flip
pass→fail, never fail→pass) — verified over 2.3M executions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
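
The comment-close rule described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the merged isMCPNameBoundary: the helper names isBoundaryAfter and containsToken are hypothetical, and the name-character set is the [A-Za-z0-9._/-] quoted earlier in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// isNameChar uses the server-name character set quoted in this PR.
func isNameChar(b byte) bool {
	switch {
	case b >= 'a' && b <= 'z', b >= 'A' && b <= 'Z', b >= '0' && b <= '9':
		return true
	}
	return b == '.' || b == '_' || b == '/' || b == '-'
}

// isBoundaryAfter (hypothetical) inspects the text that follows NAME.
// End of input, an HTML comment close ("-->" or legacy "--!>"), or any
// non-name character terminates the token.
func isBoundaryAfter(rest string) bool {
	if rest == "" {
		return true
	}
	if strings.HasPrefix(rest, "-->") || strings.HasPrefix(rest, "--!>") {
		return true
	}
	return !isNameChar(rest[0])
}

// containsToken (hypothetical) scans every occurrence of the token and
// accepts the first one that is followed by a boundary.
func containsToken(readme, name string) bool {
	token := "mcp-name: " + name
	for from := 0; ; {
		i := strings.Index(readme[from:], token)
		if i < 0 {
			return false
		}
		end := from + i + len(token)
		if isBoundaryAfter(readme[end:]) {
			return true
		}
		from += i + 1
	}
}

func main() {
	const name = "io.github.acme/widget"
	cases := []struct {
		readme string
		want   bool
	}{
		{"<!-- mcp-name: io.github.acme/widget -->", true},
		{"<!--mcp-name: io.github.acme/widget-->", true},
		{"<!-- mcp-name: io.github.acme/widget--!>", true},
		{"mcp-name: io.github.acme/widget--pro", false},
		{"Published as mcp-name: io.github.acme/widget.", false},
	}
	for _, c := range cases {
		got := containsToken(c.readme, name)
		fmt.Printf("got=%-5v want=%-5v %q\n", got, c.want, c.readme)
	}
}
```

Checking for the full close sequence rather than a single `-` is what keeps `…/widget--pro` rejected: `--pro` does not start with `-->` or `--!>`, so the hyphen is still read as part of a longer name.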
…en 403

Three follow-up hardening fixes to the cargo validator (review of #1330):

- SSRF: the README host pin and the redirect policy keyed on Hostname() only, so
  any port/scheme on crates.io/static.crates.io was accepted. cargoURLAllowed now
  additionally requires https + the default port for the real crates.io base
  (test/mock bases still match on host only, so httptest fixtures keep working).
- Transient 403 disambiguation: when the README CDN 403s, the crate-version
  existence probe previously reported "not found" if the probe itself failed
  (429/5xx/network) — the same misclassification the 5xx handling fixed one layer
  up. probeCargoVersion now returns a four-state result and a transient probe
  yields a retryable message instead of "not found".
- A 403 with the crate present no longer flatly asserts "no rendered README"
  (a 403 isn't definitive proof — could be a CDN/WAF block); the message now says
  the README could not be retrieved and gives the actionable next step.

Tests: TestCargoURLAllowed (https/port/userinfo/foreign-host matrix) and a
combined-fixture case where the 403 existence-probe is rate-limited (429) and
must report transient, not "not found".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
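
As an illustration of the scheme and port pinning in the first item above, here is a minimal sketch. It is not the merged cargoURLAllowed: urlAllowed and newPinnedClient are hypothetical names, the host allowlist is passed in rather than derived from a registry base URL, and rejecting userinfo outright is an assumption (the commit only says the test matrix covers userinfo).

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// urlAllowed (hypothetical) accepts a URL only if it uses https, carries no
// userinfo, targets the default https port, and names an allowlisted host.
func urlAllowed(raw string, hosts map[string]bool) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	if u.Scheme != "https" || u.User != nil {
		return false
	}
	// An empty port means the scheme default; an explicit 443 is equivalent.
	if p := u.Port(); p != "" && p != "443" {
		return false
	}
	return hosts[strings.ToLower(u.Hostname())]
}

// newPinnedClient (hypothetical) applies the same check to every redirect
// hop, so a permitted host cannot bounce the fetch to another origin.
func newPinnedClient(hosts map[string]bool) *http.Client {
	return &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			if !urlAllowed(req.URL.String(), hosts) {
				return fmt.Errorf("redirect to disallowed URL %q", req.URL.String())
			}
			return nil
		},
	}
}

func main() {
	hosts := map[string]bool{"crates.io": true, "static.crates.io": true}
	_ = newPinnedClient(hosts) // constructed only to show the wiring

	for _, raw := range []string{
		"https://static.crates.io/readmes/x/x-1.0.0.html",
		"https://crates.io:443/api/v1/crates/x",
		"http://crates.io/api/v1/crates/x",
		"https://crates.io:8443/api/v1/crates/x",
		"https://user@crates.io/",
		"https://evil.example/",
	} {
		fmt.Printf("%-5v %s\n", urlAllowed(raw, hosts), raw)
	}
}
```

Keying only on Hostname() would accept the third and fourth URLs, which is the gap the commit describes. The README path in the first URL is made up for the example.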
Note in the PyPI and NuGet ownership sections that the token must be followed by
a newline, whitespace, an HTML tag, or the comment close `-->`, and must not be
glued to trailing punctuation (e.g. a sentence-ending period). The matcher fix
handles the comment-close case; this documents the remaining boundary rule.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rdimitrov rdimitrov force-pushed the followup/registry-anchoring branch from 55f72ed to 165c1bf Compare June 5, 2026 06:09
@rdimitrov rdimitrov changed the title from "fix(validators): adopt boundary-anchored mcp-name match in PyPI and NuGet (behavior change)" to "fix(validators): harden mcp-name matching (PyPI/NuGet anchoring, comment-form safe) + cargo follow-ups" Jun 5, 2026
If a publisher's README contains `mcp-name: NAME` but the boundary-anchored match
rejects it (the token is glued to a trailing character such as a sentence-ending
period or `/`), the previous error said the name "must appear as 'mcp-name: NAME'
in the package README" — which the publisher sees that it already does, giving no
clue what's wrong. This is the failure mode of the registry-wide anchoring change,
so the message needs to name the cause.

Add mcpNameTokenGluedTrailing, which reports the offending trailing character when
the literal token is present but unterminated, and use it in the PyPI, Cargo, and
NuGet validators to emit an actionable message: "found 'mcp-name: NAME' but it is
immediately followed by 'X' — put it on its own line and republish". NuGet gains a
GluedReadme state to distinguish this from a genuinely-absent token.

Tests: TestMCPNameTokenGluedTrailing for the helper, and a cargo combined-fixture
case asserting the explanatory message on a glued trailing period.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
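
A rough sketch of the diagnostic helper described in the commit above, under stated assumptions: gluedTrailing is a hypothetical stand-in for mcpNameTokenGluedTrailing, the boundary rule is the one sketched from this PR's text (name characters [A-Za-z0-9._/-] plus the comment close), and the error wording is paraphrased rather than copied from the validators.

```go
package main

import (
	"fmt"
	"strings"
)

// isNameChar uses the server-name character set quoted in this PR.
func isNameChar(b byte) bool {
	switch {
	case b >= 'a' && b <= 'z', b >= 'A' && b <= 'Z', b >= '0' && b <= '9':
		return true
	}
	return b == '.' || b == '_' || b == '/' || b == '-'
}

// isBoundaryAfter reports whether the text following NAME terminates the token.
func isBoundaryAfter(rest string) bool {
	if rest == "" || strings.HasPrefix(rest, "-->") || strings.HasPrefix(rest, "--!>") {
		return true
	}
	return !isNameChar(rest[0])
}

// gluedTrailing (hypothetical) returns the character glued to the first
// occurrence of the token, but only when no occurrence is properly
// terminated. It returns false when the token is absent or when the README
// would pass validation anyway.
func gluedTrailing(readme, name string) (byte, bool) {
	token := "mcp-name: " + name
	var first byte
	found := false
	for from := 0; ; {
		i := strings.Index(readme[from:], token)
		if i < 0 {
			return first, found
		}
		end := from + i + len(token)
		if isBoundaryAfter(readme[end:]) {
			return 0, false // a valid occurrence exists; nothing to explain
		}
		if !found {
			first, found = readme[end], true
		}
		from += i + 1
	}
}

func main() {
	const name = "io.github.acme/widget"
	readme := "Published as mcp-name: io.github.acme/widget."

	if c, ok := gluedTrailing(readme, name); ok {
		// Paraphrased message; the real validator text may differ.
		fmt.Printf("found 'mcp-name: %s' but it is immediately followed by %q; "+
			"put it on its own line and republish\n", name, c)
	}
}
```

Separating the "present but glued" case from the "absent" case is what lets a validator choose between the two messages; the commit notes NuGet adds a GluedReadme state for the same purpose.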
@rdimitrov rdimitrov merged commit 04623ed into main Jun 5, 2026
6 checks passed
@rdimitrov rdimitrov deleted the followup/registry-anchoring branch June 5, 2026 06:28