fix(cargo): harden README fetch, clarify status handling, strengthen tests#1330
Merged
Conversation
7 tasks
…tests Follow-up to #1207, addressing items from review of the cargo validator. Scoped to cargo + shared helpers only — no behavior change for existing npm/pypi/nuget/oci/mcpb publishers. SSRF hardening of the two-call README retrieval: - Pin the step-2 README URL to an allowed host (static.crates.io in prod; the base host under test) before fetching, so a metadata response cannot steer the validator at an internal/attacker host. - CheckRedirect policy pins every redirect hop to the same allowlist. - Cap the README body with io.LimitReader (5 MiB). Clearer status handling (previously any non-5xx/non-200 collapsed to "not found"): - 429 is reported as transient/retryable rather than "not found", at both steps. - A 403 from static.crates.io is disambiguated via the crate-version endpoint: genuinely-missing stays "not found"; exists-but-no-README gets an actionable "add a README with mcp-name and republish" message (mirrors NuGet). This continues the 5xx-vs-403 diagnostic split @P4ST4S started in the #1207 review. Add a shared, boundary-anchored containsMCPNameToken helper (prevents prefix confusion, e.g. a README declaring io.github.acme/widget-pro satisfying a claim for io.github.acme/widget) and use it from the cargo validator. Adopting it in the PyPI/NuGet validators is a follow-up (it is a behavior change for those already-live registries). Tests: hermetic positive ServerNameFormats; combined fixture gains a version-existence endpoint + cases for 403-missing vs 403-no-readme, 429, and prefix-confusion rejection; new foreign-README-host (SSRF) test; internal test for the helper. Docs: list crates.io in registry requirements + the Package doc. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
bc87b70 to
d796057
Compare
rdimitrov
added a commit
that referenced
this pull request
Jun 5, 2026
…en 403 Three follow-up hardening fixes to the cargo validator (review of #1330): - SSRF: the README host pin and the redirect policy keyed on Hostname() only, so any port/scheme on crates.io/static.crates.io was accepted. cargoURLAllowed now additionally requires https + the default port for the real crates.io base (test/mock bases still match on host only, so httptest fixtures keep working). - Transient 403 disambiguation: when the README CDN 403s, the crate-version existence probe previously reported "not found" if the probe itself failed (429/5xx/network) — the same misclassification the 5xx handling fixed one layer up. probeCargoVersion now returns a four-state result and a transient probe yields a retryable message instead of "not found". - A 403 with the crate present no longer flatly asserts "no rendered README" (a 403 isn't definitive proof — could be a CDN/WAF block); the message now says the README could not be retrieved and gives the actionable next step. Tests: TestCargoURLAllowed (https/port/userinfo/foreign-host matrix) and a combined-fixture case where the 403 existence-probe is rate-limited (429) and must report transient, not "not found". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rdimitrov
added a commit
that referenced
this pull request
Jun 5, 2026
…ent-form safe) + cargo follow-ups (#1331) Registry-wide hardening of the `mcp-name:` ownership-token match, plus the cargo follow-up fixes from the #1330 review — consolidated here per maintainer request. Rebased onto current `main` (the merged #1330 cargo commit drops out), so the diff is just the net-new work. ### Changes - **PyPI/NuGet: boundary-anchored token match.** Replace `strings.Contains(readme, "mcp-name: "+name)` with the shared `containsMCPNameToken`, so a README declaring a longer name (`…/widget-pro`) no longer satisfies an ownership claim for a shorter prefix (`…/widget`). (Cargo already got this in #1330; NPM is unaffected — it compares an exact metadata field.) - **Matcher: treat the HTML comment close as a boundary.** `<!-- mcp-name: NAME-->` / `<!--mcp-name: NAME-->` (any spacing) validate again — the documented hidden-comment form for PyPI/NuGet — while a genuine longer name (`…/widget--pro`) still does not. - **Cargo hardening (review of #1330):** pin scheme+port (not just host) on the README fetch and redirects; a rate-limited/failed crate-version existence probe now reports *transient* instead of "not found"; a 403 with the crate present no longer flatly asserts "no README". - **Docs:** note the token must be followed by a boundary. ###⚠️ Behavior change for PyPI/NuGet — read before merging The match is **strictly stricter** (verified by fuzz, 2.3M execs: it can only flip pass→fail, never fail→pass). After the comment-close fix, the **only** forms that newly fail are unusual *inline* ones: the token ending a sentence (`…/my-mcp.`) or glued to a trailing `/`. The documented forms (own line, in `<!-- … -->`) are unaffected. **Correcting an earlier claim:** this re-validates on **edits too**, not just new versions — `edit.go → UpdateServer → ValidateUpdateRequest → validateRegistryOwnership` runs the token check on any edit of a live server (only delete-transitions skip it). So an existing PyPI/NuGet server whose README uses one of the breaking inline forms would fail on its next publish *or* edit. Given the v0.1 API-freeze posture, this should land as a deliberate, noted change. ### Testing `go build`, `vet`, `gofmt`, `golangci-lint` (prod files), `check-schema`, `validate-examples` all clean. Hermetic + **live** PyPI/NuGet/cargo suite passes. New: `TestCargoURLAllowed`, a combined-fixture probe-429→transient case, comment-form matcher cases, and `FuzzContainsMCPNameToken` (the strictly-stricter property, 2.3M execs). 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Follow-up to #1207, addressing items from the cargo validator review. Scoped to cargo + shared helpers — no behavior change for existing npm/pypi/nuget/oci/mcpb publishers. Safe to merge and promote.
Changes
SSRF hardening of the two-call README retrieval
static.crates.ioin prod; the base host under test) before fetching, so a metadata response can't steer the validator at an internal/attacker host.CheckRedirectpolicy pins every redirect hop to the same allowlist (the initial URL being pinned isn't enough if an upstream 3xx is followed).io.LimitReader(5 MiB).Clearer status handling (previously any non-5xx/non-200 collapsed to "not found")
mcp-nameand republish" message (mirrors the NuGet validator). This continues the 5xx-vs-403 diagnostic split @P4ST4S started in the feat: add cargo (crates.io) as a package registry type #1207 review — thanks!Shared
containsMCPNameTokenhelperBoundary-anchored ownership-token match (prevents prefix confusion, e.g. a README declaring
io.github.acme/widget-prosatisfying a claim forio.github.acme/widget), used here by the cargo validator only. Adopting it in PyPI/NuGet is a separate follow-up because it's a behavior change for those already-live registries.Tests & docs
Hermetic positive
ServerNameFormats; combined fixture gains a version-existence endpoint + cases for 403-missing vs 403-no-README, 429, and prefix-confusion rejection; new foreign-README-host (SSRF) test; internal test for the helper. Docs: listcrates.ioin the registry-requirements list + thePackagemodel doc.Testing
go build,go vet, full./internal/validators/...suite (incl. live cargo positive),make check-schema,validate-examplesall pass. No newgolangci-lintfindings.🤖 Generated with Claude Code