Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/core (RequestProvider.addRequestsBatched / maxNewRequests)
Issue description
addRequestsBatched(..., { maxNewRequests }) continues consuming async iterables after the budget is already exhausted.
I verified this locally on a current master checkout after #3531.
The behavior seems to come from draining the remaining source iterator into requestsOverLimit, but with async/generator inputs that turns a bounded enqueue operation into a potentially blocking one.
Code sample
import { RequestQueue } from 'crawlee';

const queue = await RequestQueue.open();

let consumed = 0;
let releaseBlockedRequest = () => {};
const blockedRequest = new Promise<void>((resolve) => {
    releaseBlockedRequest = resolve;
});

async function* requests() {
    consumed += 1;
    yield { url: 'https://example.com/1' };
    consumed += 1;
    await blockedRequest;
    yield { url: 'https://example.com/2' };
}

const pendingResult = queue.addRequestsBatched(requests(), {
    maxNewRequests: 1,
    waitBetweenBatchesMillis: 0,
});

const raced = await Promise.race([
    pendingResult.then(() => 'resolved'),
    new Promise<'timeout'>((resolve) => setTimeout(() => resolve('timeout'), 100)),
]);

console.log({ raced, consumed });

releaseBlockedRequest();
await pendingResult;
Observed behavior
The call does not resolve within the timeout window, and the source iterator is consumed beyond the exhausted budget.
In my local repro, the output was:
{
"raced": "timeout",
"consumed": 2
}
Expected behavior
Once maxNewRequests has already been satisfied, the operation should be able to return without continuing to pull a stalled async iterator.
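To illustrate the expected behavior in isolation, here is a minimal sketch of bounded consumption of an async iterable. `takeAtMost` is a hypothetical helper written for this report, not a Crawlee API, and it does not reflect how `addRequestsBatched` is implemented internally.

```typescript
// Hypothetical helper, not part of Crawlee: pull at most `limit` items from an
// async iterable and stop pulling as soon as the limit is reached.
async function takeAtMost<T>(source: AsyncIterable<T>, limit: number): Promise<T[]> {
    const taken: T[] = [];
    // With no budget at all, never touch the source.
    if (limit <= 0) return taken;

    for await (const item of source) {
        taken.push(item);
        // Breaking out of `for await` invokes the iterator's return() method,
        // so an async generator is closed instead of being asked for another item.
        if (taken.length >= limit) break;
    }
    return taken;
}
```

With the `requests()` generator from the code sample above and a limit of 1, a loop like this would be expected to finish after the first `yield`, without ever reaching `await blockedRequest`.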
Package version
- not present in latest stable v3.16.0
- reproduced on a current master checkout on 2026-04-16
Node.js version
v20.19.6
Operating system
macOS (Apple Silicon)
Apify platform
I have tested this on the next release
Yes — reproduced on a current master checkout.
Other context
This appears to be a regression after #3531.
I opened #3579 with a narrow fix proposal that keeps exact requestsOverLimit reporting for materialized inputs (and for callers that explicitly opt into waitForAllRequestsToBeAdded), while letting the default async-iterable path return once the budget is exhausted.
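The split described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in #3579: `splitByBudget`, `SplitResult`, `overLimitIsExact`, and the `drainAll` flag are hypothetical names, with `drainAll` standing in for an explicit opt-in such as waitForAllRequestsToBeAdded.

```typescript
// Hypothetical result shape for this sketch (not a Crawlee type).
interface SplitResult<T> {
    accepted: T[];
    overLimit: T[];
    // False when the source was abandoned early, so `overLimit` may be incomplete.
    overLimitIsExact: boolean;
}

// Hypothetical helper: arrays are split exactly; async iterables stop being
// pulled once the budget is used up, unless the caller opts into draining.
async function splitByBudget<T>(
    source: T[] | AsyncIterable<T>,
    budget: number,
    drainAll = false,
): Promise<SplitResult<T>> {
    const limit = Math.max(0, budget);

    // Materialized input: everything is already in memory, so exact reporting is free.
    if (Array.isArray(source)) {
        return {
            accepted: source.slice(0, limit),
            overLimit: source.slice(limit),
            overLimitIsExact: true,
        };
    }

    const accepted: T[] = [];
    const overLimit: T[] = [];

    // No budget and no opt-in: return without pulling from the source at all.
    if (limit === 0 && !drainAll) {
        return { accepted, overLimit, overLimitIsExact: false };
    }

    for await (const item of source) {
        if (accepted.length < limit) {
            accepted.push(item);
            // Budget exhausted: stop here rather than waiting on the next item.
            if (accepted.length >= limit && !drainAll) {
                return { accepted, overLimit, overLimitIsExact: false };
            }
        } else {
            // Only reached when the caller explicitly asked to drain the source.
            overLimit.push(item);
        }
    }

    // The source ended on its own, so the over-limit list is complete.
    return { accepted, overLimit, overLimitIsExact: true };
}
```

The trade-off in this sketch is that callers who pass an async iterable without opting in give up an exact over-limit list in exchange for a call that cannot be stalled by a slow producer.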