Skip to content

fix(config): detect overlapping-alternation ReDoS in the regex safety guard#1851

Draft
james-elicx wants to merge 3 commits into
mainfrom
fix/redos-overlapping-alternation
Draft

fix(config): detect overlapping-alternation ReDoS in the regex safety guard#1851
james-elicx wants to merge 3 commits into
mainfrom
fix/redos-overlapping-alternation

Conversation

@james-elicx

Copy link
Copy Markdown
Member

Summary

isSafeRegex / safeRegExp (config/config-matchers.ts) is the ReDoS guard for every regex compiled from config and route input: redirect/rewrite/header source, has/missing value, and the middleware matcher. All of these are matched against attacker-controlled pathnames, headers, cookies, and query values on every request.

The guard only detected nested quantifiers ((a+)+, (.*)*). It let through the complementary "overlapping alternation under an unbounded quantifier" class, which has no inner quantifier but still backtracks exponentially:

(a|a)*        rated SAFE by isSafeRegex, yet 9025 ms to fail-match a 29-char input (exponential)
(a|a|a)+      same class
(\d|\d\d){2,} same class

Each added input character roughly doubles the time. Impact:

  • Node production server — single event loop, so a slow match blocks all concurrent requests (DoS).
  • Cloudflare Workers — bounded by the per-request CPU limit (offending request killed; others unaffected), but still wasted budget.

A developer who relies on this guard (e.g. copies a regex into a has rule) is not actually protected against this textbook evil-regex shape.

Fix

Add a second, conservative check alongside the existing nested-quantifier pass: a group directly repeated by an unbounded quantifier (*, +, {n,}) whose top-level alternation has overlapping branches — a branch that is equal to, or a textual prefix of, another (a|a, a|ab, \d|\d\d) — is rejected.

Precision is preserved so real config keeps working:

  • Bounded repetition ({n,m}), the optional ?, and groups without alternation are untouched.
  • Disjoint-token alternations — (foo|bar)*, (GET|POST|PUT)+, (en|fr|de)* — do not overlap and stay safe (verified by the existing (foo|bar)* → safe test, which still passes).
  • Non-capturing (?:a|a)* is unwrapped and detected; lookarounds / named / inline-flag groups are skipped (the nested-quantifier pass still applies to them).
  • vinext's own generated patterns are unaffected: optional segments use (?:…)? (bounded), and locale captures like (en|sv|nl) carry no unbounded quantifier.

A rejected pattern is ignored with a console.warn — the same safe failure mode the guard already uses for nested quantifiers (the app still runs; that one rule just doesn't apply). The warning text now names the new class.

Testing

New isSafeRegex unit tests (in tests/shims.test.ts) cover the rejected class — (a|a)*, (a|a|a)+, (foo|foo)+, (a|ab)*, (\d|\d\d)+, (ab|abc)+, (a|a){2,}, (?:a|a)* — and assert the safe cases stay safe: disjoint alternations, and overlapping branches under bounded/optional/no repetition ((a|a){2,5}, (a|a)?, (a|a)).

✓ tests/shims.test.ts -t "isSafeRegex"   (23 tests)
✓ tests/shims.test.ts -t "matchPattern"  (9 tests)
✓ tests/routing.test.ts tests/route-sorting.test.ts tests/middleware-runtime.test.ts tests/app-router-external-rewrite.test.ts   (164 tests)
✓ tests/nextjs-compat/basepath.test.ts   (25 tests)
✓ vp check (format + lint + types) on all changed files

Notes

The heuristic stays best-effort (full regex-ambiguity analysis is out of scope); this closes the concrete, measured gap without false positives on common config alternations. A follow-up could consider a vetted linear-time matcher or a per-match timeout for defense-in-depth.

🤖 Generated with opencode

… guard

`isSafeRegex` is the ReDoS guard for every regex compiled from config and
route input (redirect/rewrite/header `source`, `has`/`missing` `value`, and
middleware `matcher`), all of which are matched against attacker-controlled
pathnames, headers, cookies, and query values. It only detected *nested*
quantifiers (`(a+)+`), so it let through the complementary "overlapping
alternation under an unbounded quantifier" class — e.g. `(a|a)*`,
`(a|ab)+`, `(\d|\d\d){2,}`. Those contain no inner quantifier but still
backtrack exponentially: matching `(a|a)*` against `"aaaa…!"` already takes
~9s at 29 characters and roughly doubles per added character. On the Node
production server this blocks the event loop (DoS); on Workers it burns the
request's CPU budget.

Add a second, conservative check: a group directly repeated by an unbounded
quantifier (`*`, `+`, `{n,}`) whose top-level alternation has overlapping
branches — a branch equal to, or a textual prefix of, another — is rejected.
Bounded repetition (`{n,m}`), the optional `?`, and disjoint-token
alternations (`(foo|bar)*`, `(GET|POST)+`) are unaffected, so legitimate
config patterns keep working; a rejected pattern is ignored with a warning,
the same safe failure mode as the existing nested-quantifier check.

Adds focused `isSafeRegex` unit tests for the new class and confirms the
existing safe cases (`(foo|bar)*`, `(ab)+`, `(?:/.*)?`, bounded/optional
repetition) still pass.
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor
Example Preview Production Original
app-router-cloudflare preview production
pages-router-cloudflare preview production
app-router-playground preview production original
realworld-api-rest preview production
nextra-docs-template preview production
benchmarks preview production
hackernews preview production original
workers-cache preview production
web preview production

@james-elicx

Copy link
Copy Markdown
Member Author

/bigbonk review for issues

@ask-bonk ask-bonk Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: ReDoS overlapping-alternation guard

The change is well-scoped, the brace-quantifier parsing (isUnboundedQuantifierAt), the ?: unwrapping, the character-class/escape handling in the scanners, and the conservative prefix-overlap heuristic all look correct, and the precision claims hold up (disjoint alternations like (foo|bar)*, bounded (a|a){2,5}, and (a|a)? stay safe). Nice tests.

I did find one substantive gap that lets the exact exponential class this PR sets out to close slip through.

Bypass: overlapping alternation wrapped in an extra group

hasAmbiguousUnboundedAlternation only inspects the top-level alternation of the body of the group that the unbounded quantifier is directly attached to. If the ambiguous alternation sits one grouping level deeper, splitTopLevelAlternatives never sees the overlapping branches as siblings, so the pattern is rated SAFE — yet it backtracks exponentially exactly like (a|a)*:

((a|a))*      isSafeRegex => true   (~6.6 s on "a"×26 + "!")
((a|a)|x)*    isSafeRegex => true   (~7.2 s)
(x|(a|a))*    isSafeRegex => true   (~6.9 s)

Measured locally with new RegExp("^" + p + "$").test("a".repeat(26) + "!"). Each added char roughly doubles the time, same as the cases the PR already rejects.

Interestingly (?:(a|a))* is caught, only because the ?: is unwrapped and the inner (a|a) then... actually it's caught for an unrelated reason (the inner group's own )*-less close), which makes the inconsistency more confusing: (?:(a|a))* is rejected but the equivalent ((a|a))* is not.

Root cause: the scan treats the directly-quantified group's body as a flat string and only splits its depth-0 |. A nested group whose own body is an overlapping alternation isn't examined. A targeted fix would be: when a group is repeated by an unbounded quantifier, also recurse into any single child group that spans (effectively) the whole body (i.e. ( <innerGroup> )* where the inner group is the sole content, ignoring a trailing top-level |x branch is harder — at minimum handle the sole-child-group case ((a|a))*).

I'd consider this worth addressing before merge, since the PR's stated goal is to close this exponential class and ((a|a))* is a trivial transformation of the headline (a|a)* example. If it's deliberately deferred, the PR/commit body should call out the wrapped-group variant as still-unguarded (the "best-effort" note currently only mentions full ambiguity analysis being out of scope).

Minor / acceptable as best-effort (no change requested)

  • (.|a)* is rated safe even though . overlaps everything. This is an inherent limitation of textual prefix-matching (a metacharacter . is treated as the literal token .), and is consistent with the "best-effort, no full ambiguity analysis" framing. Fine to leave.
  • branches.length >= 2 is computed on the unfiltered split while hasOverlappingAlternatives filters empties — I checked (a||a)*, (|a|a)*, (a|a|)* and all are correctly rejected, so there's no empty-branch hole here.

Everything else LGTM. The 23 isSafeRegex tests pass locally.

@ask-bonk

ask-bonk Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Posted the review to PR #1851.

Summary

The PR is well-constructed — the brace-quantifier parsing, ?: unwrapping, character-class/escape handling, and the conservative prefix-overlap heuristic are all correct, and the precision claims hold (disjoint and bounded alternations stay safe). The 23 isSafeRegex tests pass.

One substantive issue found: the guard can be bypassed by wrapping the overlapping alternation in an extra group. hasAmbiguousUnboundedAlternation only inspects the top-level alternation of the directly-quantified group's body, so a nested variant slips through while remaining exponential:

  • ((a|a))* → rated SAFE, ~6.6 s on a 26-char input
  • ((a|a)|x)* → SAFE, ~7.2 s
  • (x|(a|a))* → SAFE, ~6.9 s

These are trivial transformations of the PR's headline (a|a)* example and belong to the exact class the PR aims to close. Confusingly, (?:(a|a))* is caught (for an incidental reason), making the gap inconsistent. I recommended either handling the sole-child-group case or explicitly documenting the wrapped-group variant as still-unguarded.

I also noted two non-blocking items: (.|a)* is unguarded (acceptable best-effort limitation since . is treated as a literal token), and confirmed the empty-branch interplay ((a||a)* etc.) is correctly handled.

github run

@james-elicx james-elicx marked this pull request as draft June 8, 2026 18:52
The overlapping-alternation detector only split the top-level alternation
of a directly-quantified group's body, so wrapping the ambiguous group in
a redundant capturing or non-capturing group bypassed it while staying
exponential:

  - ((a|a))*   — body is the sole child group (a|a)
  - ((a|a)|x)* — branch (a|a) is a nested overlapping group
  - (x|(a|a))* — branch (a|a) is a nested overlapping group

Analyse the quantified body recursively: split into top-level alternatives,
then peel any branch that is wholly a single group (unwrapping the parens and
normalising a ?: prefix) and recurse. A depth cap bounds work on adversarial
nesting. Disjoint/bounded alternations ((ab|cd)*, ((ab|cd))*) recurse to
non-overlapping branches and stay safe — no new false positives.
@james-elicx

Copy link
Copy Markdown
Member Author

/bigbonk review for issues

@pkg-pr-new

pkg-pr-new Bot commented Jun 8, 2026

Copy link
Copy Markdown

Open in StackBlitz

npm i https://pkg.pr.new/@vinext/cloudflare@1851
npm i https://pkg.pr.new/vinext@1851

commit: 76830c1

@ask-bonk

ask-bonk Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

@james-elicx Bonk workflow was cancelled.

View workflow run · To retry, trigger Bonk again.

@james-elicx

Copy link
Copy Markdown
Member Author

/bigbonk review for issues

2 similar comments
@james-elicx

Copy link
Copy Markdown
Member Author

/bigbonk review for issues

@james-elicx

Copy link
Copy Markdown
Member Author

/bigbonk review for issues

@ask-bonk ask-bonk Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: wrapped/nested overlapping-alternation follow-up

The follow-up commit (812e3ce) correctly closes the gap from the prior review. unwrapSoleGroup + bodyHasOverlappingAlternation now reject the sole-wrapper and nested-branch variants, with a depth cap that prevents scanner self-DoS:

  • ((a|a))*, (((a|a)))*, (?:(?:a|a))* → rejected ✅
  • ((a|a)|x)*, (x|(a|a))*, (x|(a|ab)|y)+ → rejected ✅
  • disjoint wrapped/nested (((ab|cd))*, (x|(ab|cd))*, ((en|fr|de))*) → stay safe ✅ (no false positives)

I verified the depth-32 cap returns false in ~16 ms on a 5000-deep (((…a|a…)))*, so the scanner itself can't be DoS'd.

One remaining gap in the same class: group concatenated inside the quantified body

bodyHasOverlappingAlternation only recurses into a branch when unwrapSoleGroup says the branch is wholly a single parenthesised group. If the overlapping group is concatenated with any other token inside the quantified body, the recursion never sees it, and the pattern is rated SAFE — yet it's still exponential, exactly like the cases this commit set out to close:

((a|a)b)*     isSafeRegex => true   — exponential
(b(a|a))*     isSafeRegex => true   — exponential
(b(a|a)c)*    isSafeRegex => true   — exponential
(((a|a)b))*   isSafeRegex => true   — exponential

Measured locally with new RegExp("^"+p+"$").test("ab".repeat(n)+"!") for ((a|a)b)*:

n=20  103 ms
n=25  413 ms
n=30  12790 ms   (each +5 chars ≈ ×30)

Root cause: for body "(a|a)b", splitTopLevelAlternatives yields the single branch (a|a)b; hasOverlappingAlternatives sees one branch (no overlap); and unwrapSoleGroup("(a|a)b") returns null because the trailing b means the body isn't wholly one group. So the inner (a|a) is never examined. The scanner descends into a sole-child group but not into a group that is one factor of a concatenation.

This is a trivial transform of the headline (a|a)* (append/prepend a literal), so it arguably belongs to the same "close the concrete, measured gap" framing as the wrapped-group fix. Two options:

  1. Address it: when scanning a quantified body, also recurse into any nested group within the concatenation (not just a sole-child group), e.g. walk the body for (...) sub-groups and check each for an overlapping alternation. The depth cap already bounds the work.
  2. Document it: if deferred, call out the "overlapping group concatenated with other tokens" variant as still-unguarded, the same way the PR body notes full ambiguity analysis is out of scope. Right now the commit message implies the wrapped/nested class is fully closed, which isn't quite accurate.

Confirmed acceptable best-effort (no change requested)

  • (.|a)* stays allowed — . is treated as a literal token by the textual prefix check. Consistent with the documented best-effort framing, same as the prior review.
  • Depth-32 recursion cap means (((…>32 wrappers…(a|a)…)))* would be allowed, but such a pattern is not plausible in real config; the cap is the right trade-off to keep the scanner linear.

Everything else LGTM — the brace-quantifier parsing, ?: unwrapping, character-class/escape handling, and overlap heuristic are all correct, and the 26 isSafeRegex tests pass locally.

@ask-bonk

ask-bonk Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Posted the review to PR #1851.

Summary

The follow-up commit (812e3ce) correctly closes the wrapped/nested gap that the prior review flagged. I verified:

  • Fixed & tested: ((a|a))*, (((a|a)))*, (?:(?:a|a))*, ((a|a)|x)*, (x|(a|a))* are now rejected; disjoint wrapped/nested alternations (((ab|cd))*, ((en|fr|de))*) stay safe (no false positives).
  • No scanner self-DoS: the depth-32 cap returns false in ~16 ms on a 5000-deep nested wrapper.
  • 26 isSafeRegex tests pass.

One remaining gap in the same exponential class (the substantive finding): when the overlapping group is concatenated with another token inside the quantified body, it slips through and is still exponential:

  • ((a|a)b)*, (b(a|a))*, (b(a|a)c)*, (((a|a)b))* → all rated SAFE, all exponential (((a|a)b)* measured 103 ms → 413 ms → 12.8 s at n=20/25/30).

Root cause: unwrapSoleGroup only recurses when a branch is wholly a single group, so (a|a)b (group + trailing literal) is never descended into. I recommended either recursing into any nested sub-group within the quantified body's concatenation, or explicitly documenting this "concatenated group" variant as still-unguarded (the commit currently implies the wrapped/nested class is fully closed).

I confirmed (.|a)* and the depth-32 cap are acceptable best-effort limitations, consistent with the PR's framing.

github run

@james-elicx

Copy link
Copy Markdown
Member Author

/bigbonk review for issues

The wrapped/nested ReDoS detector only recursed into branches that were
wholly a single group, so an overlapping alternation concatenated with
another token inside the quantified body slipped through while staying
exponential: ((a|a)b)*, (b(a|a))*, (b(a|a)c)* all backtrack
exponentially because the inner overlapping group is still repeated by
the outer unbounded quantifier.

Generalise the recursion: descend into every top-level sub-group of each
branch (not just sole groups). topLevelGroupBodies replaces
unwrapSoleGroup and returns a sole-group branch's body just as readily as
a concatenated one. Precision is preserved — only an overlapping inner
alternation trips the guard, so disjoint config shapes like ((en|fr|de)x)*
and (/(en|fr)/foo)* stay safe.
@james-elicx

Copy link
Copy Markdown
Member Author

/bigbonk review for issues

2 similar comments
@james-elicx

Copy link
Copy Markdown
Member Author

/bigbonk review for issues

@james-elicx

Copy link
Copy Markdown
Member Author

/bigbonk review for issues

@ask-bonk ask-bonk Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: concatenated-group overlapping-alternation follow-up

The follow-up commit (76830c1) correctly closes the concatenated-group gap from the prior review. The new topLevelGroupBodies + recursive bodyHasOverlappingAlternation now descend into any top-level sub-group of each quantified-body branch, not just a sole-child group. Verified locally:

  • Fixed & tested: ((a|a)b)*, (b(a|a))*, (b(a|a)c)*, (((a|a)b))*, ((a|a)b|x)*, (x|(a|a)b)*, (b(c(a|a)d)e)* are now rejected.
  • No false positives: disjoint concatenated/nested groups stay safe — ((en|fr|de)x)*, (/(en|fr)/foo)*, (x(ab|cd)y)*, ((foo|bar)/(baz|qux))+, ((ab|cd))*, (x|(ab|cd))*. I diffed isSafeRegex between ee8e6d9 and 76830c1: every pattern that flipped to false (((ab|abc))*, ((a|ab)x)+, (/(a|a)/)*, (foo(en|fr|en))*, (x|(a|ab)|y)+) contains a genuinely overlapping inner alternation — no disjoint pattern was over-rejected.
  • No scanner self-DoS: the depth-32 cap returns in ~17 ms on a 5000-deep (((…a|a…)))*, and a 2000-group wide concatenation scans in ~1.5 ms.
  • 28 isSafeRegex tests pass (vp test run tests/shims.test.ts -t "isSafeRegex"), vp check is clean on both changed files, and matchPattern + routing tests pass.

The two prior bypasses (wrapped/nested and concatenated-group) are both resolved. Nice work — the recursion is the right generalization and it stays precise.

One remaining gap in the same exponential class: character-class / metachar branch overlapping a literal

hasOverlappingAlternatives compares branches textually (one branch must be a string prefix of another). A branch that is a character class or metacharacter matching the same input as a literal branch is not a textual prefix, so it's rated SAFE — yet it's just as exponential as (a|a)*:

(\w|a)*     isSafeRegex => true   — 26925 ms on "a"×28 + "!"   (exponential)
(a|[a])*    isSafeRegex => true   — 20571 ms on "a"×28 + "!"
([a-z]|a)*  isSafeRegex => true   — 20941 ms on "a"×28 + "!"
(\d|0)*     isSafeRegex => true   — 20087 ms on "0"×28 + "x"

This is the same root cause as the (.|a)* case the prior reviews already noted as an acceptable best-effort limitation — a metacharacter/class token (., \w, \d, [a], [a-z]) overlaps a literal at match time but not at text time. So this finding doesn't break the PR's framing; it just broadens the documented limitation from (.|a)* to "any character-class or metacharacter branch that overlaps a literal branch." Two options, matching the precedent set in the prior reviews:

  1. Address it (broader scope): expand the overlap check to recognize when one branch is a single-token class/metachar that subsumes another branch's leading token (\wa, \d0, [a-z]a, .⊇anything). This is a meaningful expansion of the heuristic and arguably belongs in a follow-up rather than this PR.
  2. Document it (consistent with current best-effort framing): update the limitation note to say character-class/metachar-vs-literal overlap ((.|a)*, (\w|a)*, (a|[a])*) is still unguarded, not just the (.|a)* example. The PR body's "best-effort, full ambiguity analysis out of scope" note already covers this in spirit; making the class explicit avoids implying it's fully closed.

I'd lean toward (2) for this PR — the concatenated-group fix is the stated goal here and it's solid; the class/metachar overlap is a distinct, broader heuristic improvement that's fine to defer as long as it's named explicitly so a developer copying (\w|a)* into a has rule knows the guard won't catch it.

Confirmed acceptable best-effort (no change requested)

  • Depth-32 recursion cap means a pathologically deep wrapper (>32 levels) would be allowed, but that's not plausible in real config and the cap is the right trade-off to keep the scanner linear.
  • The prefix-overlap class ((a|ab)+, (ab|abc)+) is rejected but, on V8/Irregexp, isn't actually exponential for these — only exact-duplicate ((a|a)*) is. That's a conservative over-rejection inherited from the original PR (unchanged by this commit) and is the safe direction.

Everything else LGTM.

@ask-bonk

ask-bonk Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review posted to PR #1851.

Summary

The follow-up commit (76830c1) correctly closes the concatenated-group ReDoS gap that the prior review flagged. I verified the fix end-to-end:

  • Fixed: ((a|a)b)*, (b(a|a))*, (b(a|a)c)*, (((a|a)b))*, and deeper variants are now rejected via the new topLevelGroupBodies + recursive bodyHasOverlappingAlternation.
  • No false positives: I diffed isSafeRegex between the first and latest PR commits — every pattern that flipped to unsafe contains a genuinely overlapping inner alternation; no disjoint config pattern was over-rejected.
  • No scanner self-DoS: depth-32 cap returns in ~17ms even on 5000-deep nesting.
  • Tests/checks pass: 28 isSafeRegex tests, matchPattern, routing, and vp check all green.

One substantive finding (same exponential class, distinct root cause): the overlap check is purely textual (string-prefix), so a character-class or metacharacter branch overlapping a literal is rated SAFE yet is exponential:

  • (\w|a)* → 26.9s, (a|[a])* → 20.6s, ([a-z]|a)* → 20.9s, (\d|0)* → 20.1s

This is the same root cause as the (.|a)* case the prior reviews already accepted as a best-effort limitation — it just broadens that class. I recommended either expanding the heuristic to recognize class/metachar subsumption (better as a follow-up) or, preferably for this PR, explicitly documenting the broadened limitation so it's not implied to be fully closed.

I also confirmed two non-issues: the depth-32 cap is an acceptable trade-off, and the prefix-overlap rejection ((a|ab)+) is a conservative over-rejection inherited unchanged from the original PR (V8 doesn't actually backtrack exponentially on it).

github run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant