fix(appsec): Implement limits for analyzed downstream requests#8655
fix(appsec): Implement limits for analyzed downstream requests#8655uurien wants to merge 9 commits into
Conversation
Overall package sizeSelf size: 6.07 MB Dependency sizes| name | version | self size | total size | |------|---------|-----------|------------| | import-in-the-middle | 3.0.1 | 82.56 kB | 817.39 kB | | opentracing | 0.14.7 | 194.81 kB | 194.81 kB | | dc-polyfill | 0.1.11 | 25.74 kB | 25.74 kB |🤖 This report was automatically generated by heaviest-objects-in-the-universe |
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: 1437d20 | Docs | Datadog PR Page | Give us feedback! |
BenchmarksBenchmark execution time: 2026-06-02 08:19:58 Comparing candidate commit 1437d20 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 1504 metrics, 89 unstable metrics. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 37afb8fceb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Always run handleRedirectResponse before applying body guard outcomes so sampled redirect chains still store the Location decision. Skip response_body_ignored metrics on redirect hops where body analysis is deferred to the follow-up request. Co-authored-by: Cursor <cursoragent@cursor.com>
| "implementation": "A", | ||
| "type": "int", | ||
| "internalPropertyName": "appsec.apiSecurity.maxDownstreamBodyBytes", | ||
| "default": "10485760" |
There was a problem hiding this comment.
Do we have to make this configurable? Would a default not be fine?
There was a problem hiding this comment.
I think is good to have this configurable (I know that 99% of users is not going to modify it), but if some user has a memory sensitive application or they complain that we are ignoring data, we can say them to modify the max size.
| */ | ||
| function getResponseBodyCollectionConfig () { | ||
| return { | ||
| maxBytes: config?.appsec?.apiSecurity?.maxDownstreamBodyBytes ?? DEFAULT_MAX_DOWNSTREAM_BODY_BYTES, |
There was a problem hiding this comment.
| maxBytes: config?.appsec?.apiSecurity?.maxDownstreamBodyBytes ?? DEFAULT_MAX_DOWNSTREAM_BODY_BYTES, | |
| maxBytes: config.appsec.apiSecurity.maxDownstreamBodyBytes |
If we access the config, it is guaranteed to already have done all of that
| return { collect: false, reason: 'content_length_too_big' } | ||
| } | ||
|
|
||
| return { collect: true } |
There was a problem hiding this comment.
I am not sure but I believe V8 would be able to detect the shape better if we define an object at the top where we just change the value before returning. I would add a undefined reason so the shape stays constant.
|
|
||
| // Track body analysis count if we're sampling the response body | ||
| if (shouldCollectBody) { | ||
| incrementBodyAnalysisCount(req) |
There was a problem hiding this comment.
Should this counter be incremented if response body is finally ignored (not analyzed)?
| getMethod, | ||
| storeRedirectBodyCollectionDecision, | ||
| SUPPORTED_RESPONSE_BODY_MIME_TYPES, | ||
| RESPONSE_BODY_IGNORED_METRIC_SUFFIX, |
There was a problem hiding this comment.
RESPONSE_BODY_IGNORED_METRIC_SUFFIX exported but never used...
There was a problem hiding this comment.
if it is not used elsewhere, only here to filter out unknown reasons, is there really a reason for it to be an object? Could it not be a Set and check it with VALID_REASONS.has(xxxx)?
There was a problem hiding this comment.
I'm going to remove the export, you're right it is not used (I guess it was used in a initial version)
Is very small object (3 keys), why do you thing that set would be better than object? I'd be slower (I know, not much), isn't it?
| function handleResponseFinish ({ ctx, res, body, responseBodyIgnoredReason }) { | ||
| // downstream response object | ||
| if (!res) return | ||
|
|
||
| const originatingRequest = getActiveRequest() | ||
| if (!originatingRequest) return | ||
|
|
||
| // Skip body analysis for redirect responses | ||
| const evaluateBody = ctx.shouldCollectBody && !downstream.handleRedirectResponse(originatingRequest, res) | ||
| const shouldCollectBodyAfterRedirect = ctx.shouldCollectBody && | ||
| downstream.handleRedirectResponse(originatingRequest, res) | ||
|
|
||
| if (!shouldCollectBodyAfterRedirect && responseBodyIgnoredReason) { | ||
| downstream.recordResponseBodyIgnored(originatingRequest, responseBodyIgnoredReason) | ||
| } | ||
|
|
||
| const evaluateBody = ctx.shouldCollectBody && !shouldCollectBodyAfterRedirect && !responseBodyIgnoredReason | ||
| const responseBody = evaluateBody ? body : null | ||
| runResponseEvaluation(res, originatingRequest, responseBody) | ||
| } |
There was a problem hiding this comment.
I think these logical paths could be reworked (within the naming) to make them clear.
if !ctx.shouldCollectBody we could return an early runResponseEvaluation with a null responseBody
the same applies to shouldCollectBodyAfterRedirect being true (which in this case, we can rename to isRedirect to make it clear).
then, if there is a responseBodyIgnoredReason we can record the metric and return early again.
and finally, if none of this conditions are met, we runResponseEvaluation with the body.
WDYT?
| }) | ||
|
|
||
| it('ignores body when content-length exceeds maxBytes', (done) => { | ||
| responseBodyCollection.maxBytes = 10 |
There was a problem hiding this comment.
responseBodyCollection is declared in a before block and then mutated in this test. would worth to move the responseBodyCollection declaration to beforeEach to be sure it is not mutated in each test.
| sinon.assert.calledTwice(span.setTag) | ||
| sinon.assert.calledWith(span.setTag, tag, 1) | ||
| sinon.assert.calledWith(span.setTag, tag, 2) | ||
| webRootStub.restore() |
There was a problem hiding this comment.
Is not sinon.restore() from afterEach doing the same?
What does this PR do?
Adds limits for downstream HTTP response body collection in the AppSec SSRF path:
DD_API_SECURITY_MAX_DOWNSTREAM_BODY_BYTES(default 10 MB).http/client.jsbefore buffering the response body:Content-Typeonly (application/json,text/json,application/x-www-form-urlencoded).Content-Lengthrequired and non-zero.Content-Lengthmust not exceedmaxBytes.responseBodyIgnoredReasononapm:http:client:response:finishand skips body collection.Motivation
Without limits, a sampled downstream response could buffer unbounded data in memory. This change validates headers before collection and caps the maximum analyzable body size via configuration.
Additional Notes
Content-Length.content_type_invalid,content_length_missing,content_length_too_big.http.spec.jsuse a local HTTP server for deterministic headers and body; generic finish-channel tests still hit datadoghq.System tests
APPSEC-62792