Skip to content

Follow-up: cargo validator hardening + registry-wide mcp-name token anchoring#1

Closed
rdimitrov wants to merge 3 commits into
Wolfe-Jam:feat/cargo-package-registry-supportfrom
rdimitrov:cargo-review-followups
Closed

Follow-up: cargo validator hardening + registry-wide mcp-name token anchoring#1
rdimitrov wants to merge 3 commits into
Wolfe-Jam:feat/cargo-package-registry-supportfrom
rdimitrov:cargo-review-followups

Conversation

@rdimitrov
Copy link
Copy Markdown

Stacked follow-up on modelcontextprotocol#1207

This branch is stacked on top of #1207 (base = feat/cargo-package-registry-support), so the diff here is only the follow-up changes. It collects the actionable items from a review of the cargo work. Entirely optional — merge into the cargo branch if you agree, or cherry-pick the parts you like. Nothing here blocks modelcontextprotocol#1207.

What's here

1. Registry-wide: boundary-anchor the mcp-name token match (a5ef071)
PyPI, NuGet, and Cargo all matched ownership with a bare strings.Contains(readme, "mcp-name: "+serverName). That allows prefix confusion: a README declaring mcp-name: io.github.acme/widget-pro also satisfies a claim for the shorter io.github.acme/widget. It's contained by namespace auth (can't cross a namespace you don't own) so it's low-severity, but it weakens the package↔name binding. Extracted a shared containsMCPNameToken that requires a trailing boundary (end-of-content or a non-server-name char). NPM is untouched (it compares an exact field).

⚠️ This touches the PyPI and NuGet validators, not just cargo — flagging the wider blast radius for review. The live positive tests (time-mcp-pypi, TimeMcpServer, rust-faf-mcp v0.3.1) all still pass, confirming real-world READMEs with the token on their own line continue to validate.

2. Cargo: SSRF hardening + clearer status handling (3036c47)

  • The step-2 README URL (returned by the metadata endpoint) was fetched with no host check, and the client followed redirects to any host. Now: the README host is pinned to an allowlist (static.crates.io in prod; the mock host under test) before fetching, a CheckRedirect policy pins every redirect hop, and the body is capped with io.LimitReader (5 MiB). Not publisher-exploitable today (the URL comes from crates.io), but defense-in-depth so the "host is pinned" guarantee holds in code rather than by crates.io's good behavior.
  • Previously every non-5xx/non-200 collapsed to "not found". Now 429 is reported as transient/retryable (crates.io rate-limits crawlers), and a 403 is disambiguated via the crate-version endpoint: genuinely missing → "not found"; exists-but-no-README → actionable "add a README with mcp-name and republish" (mirrors NuGet). Verified live: e.g. libc 0.1.0 exists but its README CDN URL 403s.

3. Test quality (in 3036c47)

  • ServerNameFormats is now a hermetic positive test (each format must validate when present as an exact token) instead of asserting against a token-less live crate where every case failed trivially.
  • Combined fixture gains a version-existence endpoint and cases for 403-missing vs 403-no-README, 429-transient, and prefix-confusion rejection; new RejectsForeignReadmeHost test for the SSRF pin; new internal test for containsMCPNameToken; PositivePathMock no longer calls t.Fatalf from the httptest handler goroutine.

4. Docs (351f53a)
Add Cargo/https://crates.io to the canonical "Restricted Registry Base URLs" list and to the Package model doc comment + struct tag (both still omitted it).

Testing

go build, go vet, full ./internal/validators/... suite (incl. live PyPI/NuGet/Cargo positives), make check-schema, and validate-examples all pass. No new golangci-lint findings vs. base. Branch is rebased on current main (via the merge in modelcontextprotocol#1207).

🤖 Generated with Claude Code

rdimitrov and others added 3 commits June 3, 2026 17:26
The PyPI, NuGet, and Cargo validators all proved package ownership with a
bare `strings.Contains(readme, "mcp-name: "+serverName)`. That substring check
is vulnerable to prefix confusion: a README that legitimately declares
`mcp-name: io.github.acme/widget-pro` also satisfies an ownership claim for the
shorter `io.github.acme/widget`, because the shorter string is a substring of
the longer one.

This is contained by namespace authorization (a publisher can only claim names
within a namespace they already own, so it cannot cross a namespace boundary),
but it still weakens the package<->server-name binding the token is meant to
prove. Extract a shared, boundary-anchored containsMCPNameToken helper and use
it from all three README-token validators: a match now requires the character
after the server name to be end-of-content or a non-server-name character
(whitespace, newline, or an HTML tag delimiter from a rendered README). NPM is
unaffected — it compares an exact metadata field rather than scanning README text.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ngthen tests

Addresses review findings on the cargo validator's two-call README retrieval:

SSRF hardening (the README pointer + redirects were unconstrained):
- Pin the step-2 README URL to an allowed host (static.crates.io for the real
  base; the base host itself for httptest mocks) before fetching, so a metadata
  response cannot steer the validator at an internal/attacker-chosen host.
- Add a CheckRedirect policy pinning every redirect hop to the same allowlist —
  the initial URL being host-pinned is not enough if an upstream 3xx is followed.
- Cap the README body with io.LimitReader (5 MiB) so an oversized response
  cannot exhaust validator memory.

Clearer status handling (previously any non-5xx, non-200 collapsed to "not found"):
- 429 (crates.io rate-limiting) is now reported as transient/retryable rather
  than as "not found", at both the metadata and README steps.
- A 403 from static.crates.io is disambiguated via the crate-version metadata
  endpoint: a genuinely-missing crate/version stays "not found", but a crate
  that exists with no rendered README gets an actionable "add a README
  containing mcp-name and republish" message (mirrors the NuGet validator).
- The non-200 branches are extracted into helpers to keep cyclomatic complexity
  within the project's cyclop limit.

Use the boundary-anchored containsMCPNameToken helper; promote the contact
User-Agent to a package const (was a function-local shadowing nuget.go's const).

Tests:
- ServerNameFormats is now a hermetic POSITIVE test (each format must validate
  when present as an exact token) instead of asserting against a token-less live
  crate where every case failed trivially.
- Combined fixture gains a version-existence endpoint and cases for 403-missing
  vs 403-no-readme, 429-transient, and prefix-confusion rejection.
- New RejectsForeignReadmeHost test for the SSRF host pin.
- PositivePathMock no longer calls t.Fatalf from the httptest handler goroutine.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The validator enforces https://crates.io as the only supported Cargo base URL,
but the canonical "Restricted Registry Base URLs" list and the Package model's
doc comment / struct tag still enumerated only npm/pypi/nuget/oci/mcpb. Add
Cargo to both so the supported-registry surface is consistent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rdimitrov
Copy link
Copy Markdown
Author

Superseding this with follow-up PRs opened directly against modelcontextprotocol/registry:main now that modelcontextprotocol#1207 is merged — easier to review and CI there. Closing to avoid confusion; thanks!

@rdimitrov rdimitrov closed this Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant