🤖 perf: bulk launch workflow agent tasks #3494
Implement two-phase bulk task reservation and launch for parallel workflow agents, including starting-state persistence, scheduler coalescing, workflow bulk creation, task tool schema updates, and regression coverage.

---

_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `719969{MUX_COSTS_USD:-unknown}`_

<!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=105.86 -->
@codex review
Codex Review: Didn't find any major issues. 🚀

ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review Pushed test-only hardening for the latest merge-queue unit timeout:
Validation:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2d1dbeb03f
Addressed Codex finding: Preserve legacy queued-task resume path. Response:
Validation:
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9d659ba48c
Addressed Codex finding: Avoid replaying accepted starting prompts on restart. Response:
Validation:
@codex review
Codex Review: Didn't find any major issues. Nice work!
Summary
Implements bulk reservation and hardened parallel launch handling for workflow-spawned agent tasks so `parallelAgents([...])` can reserve child tasks in one batch, start admitted children outside the global task mutex, and keep queued/starting lifecycle state visible across backend and UI consumers.

Background
Workflow fan-out previously went through per-child task creation semantics that serialized expensive workspace startup under coarse task locking. The first implementation introduced `starting` reservations and workflow bulk creation; this follow-up hardens the transient lifecycle against launch failures, restarts, termination races, scheduler starvation, and browser status gaps.

Implementation
- `interrupted` with `taskLaunchError` while separately cleaning materialized workspace/session state.
- `starting` is treated as an active pre-stream state and generic sends remain disabled while startup is in progress.

Validation
- `bun test src/node/services/taskService.test.ts --filter "createMany|queued|starting|launch"`
- `bun test src/node/services/workflows/WorkflowRunner.test.ts --filter "bulk"`
- `bun test src/node/services/workflows/WorkflowTaskServiceAdapter.test.ts`
- `bun test src/browser/utils/ui/workspaceFiltering.test.ts src/browser/components/ProjectSidebar/ProjectSidebar.test.tsx src/node/services/tools/task.test.ts src/node/services/tools/task_await.test.ts`
- `bun run typecheck`
- `make static-check`

Note: one broader `WorkflowRunner.test.ts --filter "parallelAgents|bulk|validation"` run hit a Bun 1.2.15 segmentation fault after passing many tests; rerunning the focused bulk workflow-runner test passed.

Risks
This touches task scheduling, workflow replay, startup cleanup, and sidebar/chat status rendering. The main risk is subtle lifecycle ordering around starting-task interruption or restart; added coverage and defensive status rechecks target those paths.
📋 Implementation Plan
Plan: bulk/parallel task launch for workflow `parallelAgents`

Goal
Make workflow fan-out behave like real fan-out at launch time. A workflow that calls `parallelAgents([...20 specs])` should reserve/create the 20 child task records in one backend batch, start as many as the configured global parallel limit allows, and launch those starts concurrently with bounded safety rather than serializing each child through the full `TaskService.create()` workspace-fork/init/stream-start path.

Verified context
- `parallelAgents(...)` already exists and `WorkflowRunner.runAgentStepsInParallel(...)` starts pending steps with `currentPending.map(...)`, then consumes completions with `Promise.race(...)` while preserving input-order results (`src/node/services/workflows/WorkflowRunner.ts:1208-1368`).
- `WorkflowTaskServiceAdapter.runAgent(...)` once per spec, and each call delegates to `taskService.create(...)` before waiting for the report (`src/node/services/workflows/WorkflowTaskServiceAdapter.ts:183-219`).
- `TaskService.create(...)` acquires a single global `AsyncMutex` at `src/node/services/taskService.ts:1503` and holds it through validation, capacity accounting, workspace fork/create, base-SHA capture, config persistence, background init kickoff, and `workspaceService.sendMessage(...)` (`src/node/services/taskService.ts:1503-1884`).
- (`src/node/services/taskService.ts:1900,1982,3117`).
- `ca2367a24` / “feat: sub-workspaces as subagents (#1219)”, whose plan emphasized restart-safe orchestration, max parallel task limits, max nesting depth, durable queued/running state, report delivery, and auto-resume. This explains the correctness-first coarse critical section.
- `WorktreeManager.createWorkspace(...)` performs expensive launch work: stale lock cleanup, branch checks, best-effort fetch, `git worktree add`, `.muxignore` sync, submodule sync, and branch mapping persistence (`src/node/worktree/WorktreeManager.ts:69-190`).
- `runBackgroundInit(...)` is fire-and-forget, but child stream startup waits for init in `AIService.streamMessage(...)` via `initStateManager.waitForInit(...)` (`src/node/runtime/runtimeFactory.ts:54-66`, `src/node/services/aiService.ts:1111-1116`). Holding `TaskService.create()`'s mutex until `sendMessage(...)` returns therefore serializes most child startup latency.

Approach options and LoC estimates
Option A (small patch): workflow calls a `createMany` wrapper around existing `create()`

Net product LoC: ~120-180.

- `TaskService.create(...)` and returns an array.
- `parallelAgents(...)` to call the bulk method once.

Why not enough: this improves API shape but not launch throughput because every item still enters the same coarse mutex and performs the same expensive fork/init/start work serially. This is useful only as a stepping stone, not the final fix.
Option B (recommended): two-phase reservation + bounded concurrent start + workflow bulk adapter

Net product LoC: ~650-900.

- `TaskService.createMany(...)` and make `TaskService.create(...)` delegate to it for the singleton case.
- `parallelAgents([...])` can reserve all children in one backend invocation and then wait for them concurrently.

Why recommended: it directly addresses the serialized launch bottleneck while preserving the existing durable queue, restart, max-parallelism, interrupt, and report-delivery invariants.
Option C: larger scheduler rewrite with runtime-specific pools
Net product LoC: ~1,100-1,600.
Why defer: this is attractive long-term, especially for SSH/Docker/Coder runtimes, but it is more architecture than needed to solve workflow fan-out. Option B should leave seams for this later.
Recommended implementation plan (Option B)
Phase 1: Add launch-state model, UI surface, and timing evidence

Files/symbols:
- `src/common/orpc/schemas/workspace.ts`
- `src/node/services/taskService.ts`
- `src/node/services/tools/task_list.ts`
- `taskStatus`
- `src/common/types/tasks.ts` / relevant schema tests if status helpers exist there

Changes:
- `AgentTaskStatus` value: `"starting"` and optional launch-failure metadata.
- `starting` means: a slot is reserved and startup is in progress outside the global task mutex, but the stream has not necessarily registered yet.
- `queued|running|awaiting_report|interrupted|reported` to include `starting`.
- `taskLaunchError?: string` (and timestamp if useful) for startup failures that occur after reservation but before a reportable child stream. Represent those as `taskStatus: "interrupted"` plus `taskLaunchError` rather than adding a broader terminal status in this slice.
- `countActiveAgentTasks(...)` must count `starting` as active so batch reservation cannot exceed `maxParallelAgentTasks`.
- `starting` alongside `queued`, `running`, and `awaiting_report`.
- `task_list` / status filtering should include `starting` in active/default views where appropriate.
- `starting` as an active pre-running state (for example “Starting…”) rather than a terminal or idle state.
- `starting` continue to parse exactly as before; this is an additive status value.
- `waitForAgentReport(...)` should treat both `queued` and `starting` as “not running yet”; execution timeout should begin only after transition to `running` or after a deterministic launch-failure signal is persisted.

Use existing `taskQueueDebug(...)` / backend `log` patterns, not user-visible UI changes.

- `starting` tasks have `taskPrompt` persisted until stream send accepts the message;
- `running`, back to `queued`, `interrupted`, or is removed/failed with waiters rejected.

Quality gate: targeted unit tests for `starting` status accounting before changing concurrency behavior.

Phase 2: Extract reservation and launch plan from `TaskService.create()`

Files/symbols:
- `src/node/services/taskService.ts`
- `TaskCreateArgs` / `TaskCreateResult`

Changes:
Exact names can vary; keep the types private unless tests need them.
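As a minimal sketch of the kind of private reservation types and single-pass admission step this phase is about: the names `TaskReservation`, `ReserveInput`, and `reserveBatch` are hypothetical and not taken from the codebase, and a real implementation would also persist config and run while holding `this.mutex`. The sketch only illustrates how one locked pass could split a batch into `starting` and `queued` entries against `maxParallelAgentTasks`.

```typescript
// Hypothetical internal shapes; real field names in TaskService may differ.
type ReservedStatus = "starting" | "queued";

interface ReserveInput {
  taskId: string;
  prompt: string;
}

interface TaskReservation {
  taskId: string;
  status: ReservedStatus;
  // Kept on both starting and queued entries until the send is accepted.
  taskPrompt: string;
}

// Pure admission step: decides which inputs receive an active slot.
// Assumption: the caller invokes this once while holding the global mutex
// and does no fork/init/send work inside that critical section.
function reserveBatch(
  inputs: readonly ReserveInput[],
  activeCount: number,
  maxParallelAgentTasks: number,
): TaskReservation[] {
  let reservedActiveCount = activeCount;
  return inputs.map((input): TaskReservation => {
    const admitted = reservedActiveCount < maxParallelAgentTasks;
    if (admitted) {
      // Each admitted task consumes a slot immediately, so later items
      // in the same batch see the updated count.
      reservedActiveCount += 1;
    }
    return {
      taskId: input.taskId,
      status: admitted ? "starting" : "queued",
      taskPrompt: input.prompt,
    };
  });
}
```

With 20 inputs, no other active tasks, and a cap of 16, this returns 16 `starting` and 4 `queued` reservations in input order.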
- `TaskService.createMany(argsList)`.
- `this.mutex` once.
- `reservedActiveCount` for each `starting` task;
- `taskStatus: "starting"` for tasks admitted to capacity or `taskStatus: "queued"` for overflow tasks;
- `taskPrompt` for both `starting` and `queued` until the message is accepted by `sendMessage(...)`.
- `createMany(...)` owns/schedules the unlocked startup work for admitted `starting` tasks before it returns; callers receive ordered reservation results only, not launch-plan internals.
- `TaskService.create(args)` delegate to the same reservation/start machinery but preserve current singleton compatibility:
- `{ status: "queued" }` as today;
- `starting` outside the coarse lock, but the public singleton `create()` should still return only after the child send is accepted and the observable result can be reported as `running`, or after a concrete startup error is available.
- `starting` because they intentionally want fast reservation and separate waiting.
- `orchestrateFork(...)`, no `readTaskBaseCommitShaByProjectPath(...)`, no secrets resolution, no `runBackgroundInit(...)`, and no `workspaceService.sendMessage(...)` while holding `this.mutex`.
- `void this.maybeStartQueuedTasks()` sites with a small `scheduleMaybeStartQueuedTasks()` helper that catches/logs failures and coalesces concurrent scheduler requests.

Quality gate: tests that reserve 20 tasks with `maxParallelAgentTasks = 16` and assert exactly 16 `starting` + 4 `queued`, without invoking runtime fork/send in the mutex-protected phase.

Phase 3: Start reserved tasks outside the global mutex with bounded concurrency
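One way to picture bounded-concurrency startup is a small worker-lane helper. This is a hedged sketch, not the PR's implementation: `runWithConcurrencyLimit` and `LaunchOutcome` are hypothetical names, and the `worker` callback stands in for whatever per-task startup the service performs outside the mutex. Failures are captured per item so that one failed launch does not abort the others.

```typescript
// Hypothetical outcome shape; a real service would likely persist status instead.
type LaunchOutcome<R> =
  | { ok: true; value: R }
  | { ok: false; error: unknown };

// Runs `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in input order regardless of completion order.
async function runWithConcurrencyLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<LaunchOutcome<R>[]> {
  const results = new Array<LaunchOutcome<R>>(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until none remain.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await worker(items[index]) };
      } catch (error) {
        // Record the failure and keep draining the remaining items.
        results[index] = { ok: false, error };
      }
    }
  }

  // Never start more lanes than items, and always start at least one.
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

The cap in this sketch only bounds how many launches are in flight at once; it is separate from the `maxParallelAgentTasks` active-task limit.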
Files/symbols:
- `src/node/services/taskService.ts`
- `src/node/services/utils/forkOrchestrator.ts` only if needed for clearer launch-plan inputs
- `src/node/runtime/*` only if tests reveal runtime-specific issues

Changes:
- `startReservedAgentTask(plan)` method that performs the current expensive path outside `this.mutex`:
- `startWorkspaceInit(...)`;
- `orchestrateFork(...)`;
- `readTaskBaseCommitShaByProjectPath(...)`;
- `runBackgroundInit(...)`;
- `workspaceService.sendMessage(...)` with the renamed/expanded internal `allowAgentTaskLaunch` flag that permits scheduler-owned queued/starting launches;
- `starting -> running` only after send accepts the message.
- `maxParallelAgentTasks` as the hard active-task limit; the startup-concurrency cap only limits fork/init pressure.
- `git worktree add` is safe against one repo's metadata. Non-git or already-isolated runtime work can later opt into higher per-runtime concurrency after evidence.
- `src/constants/` if shared; avoid adding user-facing configuration until the behavior is proven.
- `starting` task gets an internal launch token or monotonically checked status expectation.
- `startReservedAgentTask(...)` must re-read the workspace entry under a short lock before expensive side effects and again before `sendMessage(...)`.
- `starting` (for example parent hard-interrupt changed it to `interrupted`), do not call `sendMessage(...)`; clean up or preserve any just-created workspace according to the interrupt semantics, reject waiters, free the slot, and schedule queued work.
- `starting -> running`, `starting -> queued`, `starting -> interrupted`, source runtime config updates, workspace path/runtime/base-SHA persistence, and waiter rejection bookkeeping should reacquire `this.mutex` or a narrower per-task lock.
- `maybeStartQueuedTasks()` as two-phase:
- `starting`, collect launch plans, release lock;
- `startReservedAgentTask(...)` with bounded concurrency;
- `running`, `queued` retry, `interrupted`, or removed/failed, and reject waiters on unrecoverable failure.
- `starting` slot.
- `sendMessage(...)` fails after workspace creation, reuse existing `rollbackFailedTaskCreate(...)` where safe, then reject waiters with the concrete error.
- `taskStatus: "interrupted"` plus `taskLaunchError` for startup failures that should not be retried blindly; `waitForAgentReport(...)` must surface that error after restart instead of hanging or looping.
- `WorkspaceService.sendMessage(...)` guards:
- `queued` and `starting` task workspaces;
- `allowQueuedAgentTask` with a clearer internal flag such as `allowAgentTaskLaunch` so the scheduler can launch `starting` tasks without allowing arbitrary user sends.
- `TaskService.initialize()`, detect stale `starting` tasks.
- `running`.
- `taskPrompt` and no stream, safely demote to `queued` and schedule startup.
- `interrupted` and reject/record enough context for inspection.

Quality gate: crash/restart-style tests for stale `starting` tasks and launch failure cleanup.

Phase 4: Add workflow bulk task API and adapt `parallelAgents(...)`

Files/symbols:
- `src/node/services/workflows/WorkflowRunner.ts`
- `src/node/services/workflows/WorkflowTaskServiceAdapter.ts`
- `src/node/services/workflows/WorkflowTaskServiceAdapter.test.ts`
- `src/node/services/workflows/WorkflowRunner.test.ts`

Changes:
- `WorkflowTaskAdapter` with an optional create-only bulk method, for example:
- `WorkflowTaskServiceAdapter` by calling `taskService.createMany(...)` once.
- `runAgentStepsInParallel(...)` to use the bulk path when available:
- `taskId` steps vs. new task steps;
- `recordStepStarted(...)` and task-start events as each task ID is returned;
- `waitForAgentTask(...)` concurrently;
- `runOrResumeAgentStep(...)` restart semantics instead of bypassing them. Either feed pre-created task IDs back through that helper or factor out the wait/restart branch so `Task not found` / `Task interrupted` still restart when appropriate.
- `interruptRun()`.
- `agent(spec)` / single-step behavior compatible by having `runAgent(...)` continue to work, possibly implemented via the same singleton bulk path.
- `parallelAgents(...)` call (or assert earlier if already guaranteed by replay IDs).

Quality gate: workflow tests proving a 20-spec `parallelAgents(...)` invokes `createMany` once, records task-start events for all children, returns ordered results, and starts waits concurrently.

Phase 5: Tests and regression coverage
Targeted tests:
- `src/node/services/taskService.test.ts`
  - `createMany reserves capacity in one locked pass.`
  - `create delegates to createMany singleton and preserves result shape.`
  - `starting tasks count against maxParallelAgentTasks.`
  - `maybeStartQueuedTasks marks queued tasks starting before unlocked launch.`
  - `taskLaunchError`, rejects waiters, and schedules queued work.
  - `starting` descendants like active descendants.
  - `starting`, block fork/create, hard-interrupt parent, unblock fork/create, assert no `sendMessage(...)`, any created workspace is cleaned/preserved according to interrupt policy, waiters are rejected, and the slot is freed.
  - `starting` to `queued` or `running` based on stream state.
- `src/node/services/workflows/WorkflowTaskServiceAdapter.test.ts`
  - `taskService.createMany` once with workflow metadata on every item.
- `src/node/services/workflows/WorkflowRunner.test.ts`
  - `parallelAgents` uses bulk creation when adapter supports it.
- `taskStatus: "starting"` and `taskLaunchError`, including task-list active filters and display labels where applicable.

Validation commands:

Use `run_and_report` if batching validation commands in one shell invocation.

Acceptance criteria
- `parallelAgents([...20 specs])` performs one workflow-adapter bulk create call for the new child tasks, not 20 independent `TaskService.create(...)` calls.
- `createMany(...)` reservation is all-or-none for pre-launch validation/config failures: invalid input persists no partial batch state.
- `TaskService.create(...)` preserves existing public behavior: returns `queued` immediately when capacity is unavailable, or returns `running` only after startup send is accepted when capacity is available.
- `maxParallelAgentTasks = 16` and no other active tasks, a 20-child workflow reserves exactly 16 active starts and 4 queued tasks.
- `starting` tasks after restart, no ghost active slots after launch failure, and persisted `taskLaunchError` is surfaced to awaiters after restart.
- `task` tool behavior remains compatible: if a task is queued or starting, foreground waits block until the child reports or fails; execution timeout starts only once the child is actually running.
- `Task not found` / `Task interrupted` restart behavior remains intact.

Dogfooding plan
Use the requested repo skills:
- `dev-server-sandbox` for an isolated Mux dev server with its own temporary `MUX_ROOT` and free backend/Vite ports.
- `agent-browser` for UI interaction, screenshots, console/error checks, and video recording.
- `dogfood` discipline: capture evidence as the workflow is exercised, not afterward.

Dogfood setup
`KEEP_SANDBOX=1 make dev-server-sandbox DEV_SERVER_SANDBOX_ARGS="--clean-projects"`

If API/provider config is needed for real child agents, omit `--clean-providers` so the sandbox copies provider config from the seed Mux root. Do not copy secrets manually.

- `explore` children:
- `agent-browser` using the Vite URL printed by `make dev-server-sandbox`:

Dogfood execution and evidence
In the UI:
Capture evidence at key points:
Backend/log checks:
- `maxParallelAgentTasks`.

Attach the video and key screenshots to the implementation summary so reviewers can verify the dogfood path.
Risks and mitigations
- `TaskService.create(...)`: callers may currently expect synchronous fork/start failures. Preserve singleton `create()` semantics; expose fast `starting` reservation only to bulk/workflow paths that explicitly use it.
- `starting` introduces a new transient persisted state. Mitigate with explicit startup self-healing tests.
- `starting` tasks before `sendMessage(...)` accepts the prompt. Keep `taskPrompt` until accepted and ensure waiters are rejected after persisted interruption.

Reviewer focus
- `maxParallelAgentTasks` invariant under concurrent bulk and non-bulk creates?
- `starting` the right persisted state, or should reservation use an existing state plus extra metadata?
- `WorkflowTaskAdapter.createAgentTasks(...)` shape the smallest interface that gives `parallelAgents(...)` one backend create invocation while preserving workflow replay/retry semantics?
- `TaskService.create(...)` compatibility and `createMany(...)` all-or-none reservation semantics clear enough for implementation?

Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `923094{MUX_COSTS_USD:-unknown}`