fix(http): preserve raw URL bytes via opt-in url_raw flag #1533

Open
juandiego-bmu wants to merge 2 commits into OWASP:master from juandiego-bmu:fix/url-raw-traversal

Conversation

@juandiego-bmu
Contributor

What this changes

Adds an opt-in url_raw: true step-level flag in HTTP modules. When set, the URL is wrapped with yarl.URL(url, encoded=True) before reaching aiohttp, so percent-encoded and literal dot-segments survive intact onto the wire.

Closes #1532 (URL normalization breaks path-traversal modules).

Why

aiohttp passes string URLs through yarl.URL(str), which decodes %2e to . and then collapses .. segments. Modules that put traversal in the URL path therefore send a flattened path to the target, and the bypass condition the module is checking for never triggers. Full reproduction and affected-module table in the linked issue.
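
The effect described above can be approximated with a standard-library sketch. It does not call yarl: `normalize_like_default` is a hypothetical helper that mimics the two steps named in the description (decode `%2e`, then collapse dot-segments), with `posixpath.normpath` standing in for dot-segment removal.

```python
import posixpath
import re


def normalize_like_default(path: str) -> str:
    """Hypothetical helper: mimic the decode-then-collapse behavior.

    This is an illustration only, not yarl's implementation.
    """
    # Step 1: decode percent-encoded dots (%2e / %2E) to literal dots.
    decoded = re.sub(r"%2e", ".", path, flags=re.IGNORECASE)
    # Step 2: collapse "." and ".." segments. normpath is a stand-in for
    # RFC 3986 dot-segment removal; it also squeezes repeated slashes.
    return posixpath.normpath(decoded)


raw_path = "/cgi-bin/.%2e/%2e%2e/%2e%2e/%2e%2e/etc/passwd"
flattened = normalize_like_default(raw_path)
print(flattened)
```

Under these assumptions the traversal payload is reduced to the plain target path, which is why the vulnerable code path on the server is never exercised.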

The change

Two-line patch in nettacker/core/lib/http.py::send_request:

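import aiohttp
from yarl import URL


async def send_request(request_options, method):
    # Opt-in: a pre-encoded yarl.URL skips decoding and dot-segment collapsing.
    if request_options.pop("url_raw", False):
        request_options["url"] = URL(request_options["url"], encoded=True)
    async with aiohttp.ClientSession() as session:
        ...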

Default behavior is unchanged: a module without url_raw follows the existing normalization path. The flag is the opt-in.

apache_cve_2021_41773.yaml is updated to set url_raw: true as a regression case. With the flag in place, a request like:

cgi-bin/.%2e/%2e%2e/%2e%2e/%2e%2e/etc/passwd

leaves the client as GET /cgi-bin/.%2e/%2e%2e/%2e%2e/%2e%2e/etc/passwd HTTP/1.1 instead of GET /etc/passwd HTTP/1.1.

Verification

End-to-end test against a local TCP echo server, on Nettacker's pinned aiohttp 3.13.5 / yarl 1.23.0:

| Scenario | On-wire request line |
|---|---|
| url_raw not set (default) | GET /etc/passwd HTTP/1.1 |
| url_raw: true | GET /cgi-bin/.%2e/%2e%2e/%2e%2e/%2e%2e/etc/passwd HTTP/1.1 |
| Patched apache module replayed | GET /cgi-bin/.%2e/%2e%2e/%2e%2e/%2e%2e/etc/passwd HTTP/1.1 |
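
A capture harness along these lines could be used to observe the request line. This is a minimal sketch, not the script used for the table above: `observe_request_line` is a hypothetical name, and a raw-socket client stands in for Nettacker's aiohttp call so that the example needs only the standard library.

```python
import socket
import threading


def _capture_first_line(server: socket.socket, captured: list) -> None:
    """Accept one connection and record the first line it sends."""
    conn, _ = server.accept()
    with conn:
        data = b""
        while b"\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        captured.append(data.split(b"\r\n", 1)[0].decode("ascii", "replace"))
        # Reply so that the client does not block waiting for a response.
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")


def observe_request_line(path: str) -> str:
    """Send GET <path> to a throwaway local server and return what arrived."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(5)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    captured: list = []
    worker = threading.Thread(target=_capture_first_line, args=(server, captured))
    worker.start()

    # Stand-in client: in a real check this would be the HTTP library under test.
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(f"GET {path} HTTP/1.1\r\nHost: 127.0.0.1\r\n\r\n".encode("ascii"))
        client.recv(4096)

    worker.join()
    server.close()
    return captured[0]


line = observe_request_line("/cgi-bin/.%2e/%2e%2e/etc/passwd")
print(line)
```

To reproduce the table rows, the stand-in client would be replaced by the actual request path through `send_request`, once with and once without `url_raw`.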

What this does not do

  • Does not change behavior for any module that doesn't opt in.
  • Does not patch the other affected modules. Each one will need its own follow-up PR after individual review (different CVEs, different reviewers, smaller diffs to read). I have a TeamCity CVE-2024-27199 module ready that depends on this flag and will follow as a separate PR once this lands.
  • Does not add yarl to pyproject.toml as a direct dep. It is already pulled transitively by aiohttp, so the import works today; happy to add it explicitly if reviewers prefer.

aiohttp passes string URLs through yarl.URL(str), which decodes percent-
encoded dots and collapses dot-segments before the bytes hit the wire.
Modules that rely on traversal in the URL path therefore send a flattened
path to the target and never trigger the bypass condition they describe.

Adds an opt-in step-level flag url_raw: true that wraps the URL with
yarl.URL(url, encoded=True). Default behavior is unchanged. Updates
apache_cve_2021_41773.yaml with the flag as a regression case.

See linked issue for the full reproduction and the list of affected
modules.
Copilot AI review requested due to automatic review settings April 26, 2026 12:42
@coderabbitai
Contributor

coderabbitai Bot commented Apr 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7617adca-629d-4925-9673-5a3c8fa8fcef

📥 Commits

Reviewing files that changed from the base of the PR and between 6015314 and 2cc2b2c.

📒 Files selected for processing (1)
  • tests/test_yaml_schema_and_regex.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_yaml_schema_and_regex.py

Summary by CodeRabbit

  • Enhancements

    • Optional raw URL encoding support for HTTP requests to better handle complex or pre-encoded target URLs.
    • Apache CVE-2021-41773 detection updated to use raw URL handling for more accurate testing.
  • Tests

    • Validation updated so HTTP payloads can include the new url_raw option and pass schema checks.

Walkthrough

send_request now consumes an optional url_raw flag and, when present, replaces the outgoing url value with a yarl.URL(..., encoded=True) instance to preserve encoded path segments. A traversal module YAML is updated to set url_raw: true, and the HTTP schema test is extended to allow the new field.

Changes

| Cohort / File(s) | Summary |
|---|---|
| HTTP request URL handling (nettacker/core/lib/http.py) | send_request reads and consumes an optional url_raw flag; if set, it replaces request_options["url"] with yarl.URL(..., encoded=True) to avoid yarl/aiohttp path normalization before sending. |
| Module payload update (nettacker/modules/vuln/apache_cve_2021_41773.yaml) | Enabled url_raw: true in the module's HTTP payload to preserve encoded traversal segments on the wire. |
| YAML schema test (tests/test_yaml_schema_and_regex.py) | Extended the HTTP step schema in the test to accept an optional boolean url_raw field so payloads with that key validate. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly describes the main change: adding an opt-in url_raw flag to preserve raw URL bytes in HTTP requests. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, explaining the problem, solution, and verification. |
| Linked Issues check | ✅ Passed | The PR implements the exact solution proposed in #1532: a two-line conditional in send_request wrapping URLs with yarl.URL(url, encoded=True), applies url_raw: true to apache_cve_2021_41773.yaml as a regression case, and updates the YAML schema validation. |
| Out of Scope Changes check | ✅ Passed | All changes (http.py conditional, YAML schema update, and apache_cve_2021_41773.yaml regression case) directly address the requirements in #1532 and are within scope. |



Copilot AI left a comment


Pull request overview

Adds an opt-in url_raw: true flag for HTTP module steps to preserve percent-encoded and dot-segment URL bytes on the wire (avoiding aiohttp/yarl normalization), and updates the Apache CVE-2021-41773 module to use it as a regression case.

Changes:

  • Import yarl.URL and wrap request URLs with URL(..., encoded=True) when url_raw is enabled.
  • Add url_raw: true to apache_cve_2021_41773.yaml step configuration.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| nettacker/core/lib/http.py | Adds url_raw handling to optionally preserve raw URL bytes by constructing an encoded yarl.URL before calling aiohttp. |
| nettacker/modules/vuln/apache_cve_2021_41773.yaml | Opts the module into the new raw-URL behavior to prevent traversal payload normalization. |


Comment on lines +38 to 40
    if request_options.pop("url_raw", False):
        request_options["url"] = URL(request_options["url"], encoded=True)
    async with aiohttp.ClientSession() as session:

Copilot AI Apr 26, 2026


url_raw is popped from request_options before the URL(..., encoded=True) conversion runs. If the conversion raises (e.g., due to invalid percent-escapes or other characters that are only accepted in the non-encoded constructor), the caller’s retry loop will run again with url_raw already removed and will silently fall back to the default URL normalization path, defeating the opt-in behavior. Consider reading the flag with get() and only pop()-ing it after a successful conversion (or build a shallow copy of request_options for the aiohttp call and leave the original dict untouched).
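
The shallow-copy variant suggested in this comment might look like the sketch below. `build_call_options` and its `wrap_url` parameter are hypothetical names introduced for illustration; in the actual patch the wrapper would be `yarl.URL(url, encoded=True)`, replaced here by a plain callable so the example runs without third-party packages.

```python
from typing import Any, Callable, Dict


def build_call_options(
    request_options: Dict[str, Any],
    wrap_url: Callable[[str], Any],
) -> Dict[str, Any]:
    """Return the options for one HTTP call without mutating the caller's dict."""
    call_options = dict(request_options)  # shallow copy; original stays intact
    if call_options.pop("url_raw", False):
        # If wrap_url raises, only the copy is affected, so the flag is
        # still present in request_options when the caller retries.
        call_options["url"] = wrap_url(call_options["url"])
    return call_options


def failing_wrap(url: str) -> Any:
    """Stand-in for a URL conversion that rejects its input."""
    raise ValueError(f"cannot wrap {url!r}")


options = {"url": "http://127.0.0.1/%zz", "url_raw": True}
try:
    build_call_options(options, failing_wrap)
except ValueError:
    pass

# The opt-in survives the failed attempt.
print("url_raw" in options)
```

With this shape a retry sees the same `request_options` it started with, so it cannot silently fall back to the normalizing path.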

The schema validator in test_yaml_schema_and_regex.py rejects unknown
keys. Add url_raw as Optional(bool) so modules that opt in to raw URL
preservation pass schema validation.
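
As a rough illustration of what this commit message describes, the sketch below hand-rolls a validator that rejects unknown step keys and accepts an optional boolean `url_raw`. It is not the code in test_yaml_schema_and_regex.py: `ALLOWED_STEP_KEYS` is a hypothetical subset of fields and `validate_step` is a hypothetical helper.

```python
# Hypothetical subset of step-level keys; the real schema has more fields.
ALLOWED_STEP_KEYS = {"method", "url", "headers", "timeout", "url_raw"}


def validate_step(step: dict) -> None:
    """Raise if the step has unknown keys or a non-boolean url_raw."""
    unknown = set(step) - ALLOWED_STEP_KEYS
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    # url_raw is optional, but when present it must be a real boolean.
    if "url_raw" in step and not isinstance(step["url_raw"], bool):
        raise TypeError("url_raw must be a boolean")


# A step that opts in passes validation.
validate_step({"method": "get", "url": "http://127.0.0.1/", "url_raw": True})
```

Without `url_raw` in the allowed set, the same step would be rejected as containing an unknown key, which is the failure this commit addresses.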


Development

Successfully merging this pull request may close these issues.

HTTP modules with .. in the URL path are silently broken (aiohttp/yarl normalize before sending)

2 participants