🤖 perf: bulk launch workflow agent tasks#3494

Open
ThomasK33 wants to merge 5 commits into main from workflow-tasks-j0f2

Conversation

@ThomasK33

Member

Summary

Implements bulk reservation and hardened parallel launch handling for workflow-spawned agent tasks so parallelAgents([...]) can reserve child tasks in one batch, start admitted children outside the global task mutex, and keep queued/starting lifecycle state visible across backend and UI consumers.

Background

Workflow fan-out previously went through per-child task creation semantics that serialized expensive workspace startup under coarse task locking. The first implementation introduced starting reservations and workflow bulk creation; this follow-up hardens the transient lifecycle against launch failures, restarts, termination races, scheduler starvation, and browser status gaps.

Implementation

  • Added pre-persist task reservation callbacks for workflow bulk creation so workflow step checkpoints are written before task launches can be scheduled.
  • Preserved returned task metadata on reserved launch failure by marking tasks interrupted with taskLaunchError while separately cleaning materialized workspace/session state.
  • Made reserved launch recovery idempotent by reusing already-materialized checkout paths, serializing only the git fork phase per project, and rechecking task status before send.
  • Hardened queued scheduling with rerun coalescing, dynamic queue scanning, per-plan launch failure handling, and prompt-less queued task failure instead of queue starvation.
  • Updated browser/task-tool consumers so starting is treated as an active pre-stream state and generic sends remain disabled while startup is in progress.
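
The rerun coalescing mentioned in the list above could look roughly like the following. This is a minimal sketch, not the code in this PR: `createCoalescedScheduler`, `pass`, and `onError` are hypothetical names, and the real scheduler would run its queue scan where `pass` is called.

```typescript
// Hypothetical helper, not the PR's code: coalesces overlapping scheduler
// requests into the active pass plus at most one rerun.
type SchedulerPass = () => Promise<void>;

function createCoalescedScheduler(
  pass: SchedulerPass,
  onError: (error: unknown) => void
): () => Promise<void> {
  let active = false;
  let rerunRequested = false;
  let current: Promise<void> = Promise.resolve();

  async function drain(): Promise<void> {
    try {
      do {
        rerunRequested = false;
        try {
          await pass();
        } catch (error) {
          // Report the failure instead of leaking an unhandled rejection.
          onError(error);
        }
      } while (rerunRequested);
    } finally {
      active = false;
    }
  }

  return function schedule(): Promise<void> {
    if (active) {
      // A pass is in flight: ask it to run once more when it finishes.
      rerunRequested = true;
      return current;
    }
    active = true;
    current = drain();
    return current;
  };
}
```

Requests that arrive while a pass is running collapse into a single follow-up pass, so a burst of completions does not trigger a burst of queue scans, and state changes made during a pass are still picked up by the rerun.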

Validation

  • bun test src/node/services/taskService.test.ts --filter "createMany|queued|starting|launch"
  • bun test src/node/services/workflows/WorkflowRunner.test.ts --filter "bulk"
  • bun test src/node/services/workflows/WorkflowTaskServiceAdapter.test.ts
  • bun test src/browser/utils/ui/workspaceFiltering.test.ts src/browser/components/ProjectSidebar/ProjectSidebar.test.tsx src/node/services/tools/task.test.ts src/node/services/tools/task_await.test.ts
  • bun run typecheck
  • make static-check

Note: one broader WorkflowRunner.test.ts --filter "parallelAgents|bulk|validation" run hit a Bun 1.2.15 segmentation fault after passing many tests; rerunning the focused bulk workflow-runner test passed.

Risks

This touches task scheduling, workflow replay, startup cleanup, and sidebar/chat status rendering. The main risk is subtle lifecycle ordering around starting-task interruption or restart; added coverage and defensive status rechecks target those paths.


📋 Implementation Plan

Plan: bulk/parallel task launch for workflow parallelAgents

Goal

Make workflow fan-out behave like real fan-out at launch time. A workflow that calls parallelAgents([...20 specs]) should reserve/create the 20 child task records in one backend batch, start as many as the configured global parallel limit allows, and launch those starts concurrently with bounded safety rather than serializing each child through the full TaskService.create() workspace-fork/init/stream-start path.

Verified context

  • parallelAgents(...) already exists and WorkflowRunner.runAgentStepsInParallel(...) starts pending steps with currentPending.map(...), then consumes completions with Promise.race(...) while preserving input-order results (src/node/services/workflows/WorkflowRunner.ts:1208-1368).
  • Workflow-owned task launch currently still calls WorkflowTaskServiceAdapter.runAgent(...) once per spec, and each call delegates to taskService.create(...) before waiting for the report (src/node/services/workflows/WorkflowTaskServiceAdapter.ts:183-219).
  • TaskService.create(...) acquires a single global AsyncMutex at src/node/services/taskService.ts:1503 and holds it through validation, capacity accounting, workspace fork/create, base-SHA capture, config persistence, background init kickoff, and workspaceService.sendMessage(...) (src/node/services/taskService.ts:1503-1884).
  • The same global mutex is also used by descendant termination and queue scheduling (src/node/services/taskService.ts:1900, 1982, 3117).
  • Historical evidence: the coarse mutex and queueing design originated with ca2367a24, “🤖 feat: sub-workspaces as subagents” (#1219), whose plan emphasized restart-safe orchestration, max parallel task limits, max nesting depth, durable queued/running state, report delivery, and auto-resume. This explains the correctness-first coarse critical section.
  • WorktreeManager.createWorkspace(...) performs expensive launch work: stale lock cleanup, branch checks, best-effort fetch, git worktree add, .muxignore sync, submodule sync, and branch mapping persistence (src/node/worktree/WorktreeManager.ts:69-190).
  • runBackgroundInit(...) is fire-and-forget, but child stream startup waits for init in AIService.streamMessage(...) via initStateManager.waitForInit(...) (src/node/runtime/runtimeFactory.ts:54-66, src/node/services/aiService.ts:1111-1116). Holding TaskService.create()'s mutex until sendMessage(...) returns therefore serializes most child startup latency.

Approach options and LoC estimates

Option A — small patch: workflow calls a createMany wrapper around existing create()

Net product LoC: ~120-180.

  • Add a bulk method that loops over current TaskService.create(...) and returns an array.
  • Wire parallelAgents(...) to call the bulk method once.

Why not enough: this improves API shape but not launch throughput because every item still enters the same coarse mutex and performs the same expensive fork/init/start work serially. This is useful only as a stepping stone, not the final fix.

Option B — recommended: two-phase reservation + bounded concurrent start + workflow bulk adapter

Net product LoC: ~650-900.

  • Split task launch into:
    1. a short locked reservation/config phase;
    2. an unlocked bounded-concurrency workspace fork/init/stream-start phase;
    3. a short locked state-finalization phase.
  • Add TaskService.createMany(...) and make TaskService.create(...) delegate to it for the singleton case.
  • Add a workflow adapter bulk-start path so parallelAgents([...]) can reserve all children in one backend invocation and then wait for them concurrently.

Why recommended: it directly addresses the serialized launch bottleneck while preserving the existing durable queue, restart, max-parallelism, interrupt, and report-delivery invariants.
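
To make the three phases of Option B concrete, here is a minimal sketch under stated assumptions. `SketchMutex` is a stand-in promise-chain mutex (the real `AsyncMutex` API may differ), `SketchTask` replaces the persisted workspace config with an in-memory map, and `startWorkspace` stands for the whole fork/init/send path. Bounded concurrency is left out here to keep the phase boundaries visible.

```typescript
// Minimal stand-in for a promise-chain mutex; the real AsyncMutex API may differ.
class SketchMutex {
  private tail: Promise<void> = Promise.resolve();

  async runExclusive<T>(fn: () => T | Promise<T>): Promise<T> {
    const previous = this.tail;
    let release: () => void = () => undefined;
    this.tail = new Promise<void>((resolve) => {
      release = resolve;
    });
    await previous;
    try {
      return await fn();
    } finally {
      release();
    }
  }
}

type SketchStatus = "queued" | "starting" | "running" | "interrupted";
interface SketchTask {
  id: string;
  status: SketchStatus;
  launchError?: string;
}

async function launchInPhases(
  mutex: SketchMutex,
  tasks: Map<string, SketchTask>,
  ids: string[],
  maxActive: number,
  startWorkspace: (id: string) => Promise<void>
): Promise<void> {
  // Phase 1 (locked, short): reserve slots and record starting/queued.
  const admitted = await mutex.runExclusive(() => {
    let active = 0;
    tasks.forEach((task) => {
      if (task.status === "starting" || task.status === "running") active += 1;
    });
    const reserved: string[] = [];
    for (const id of ids) {
      const status: SketchStatus = active < maxActive ? "starting" : "queued";
      tasks.set(id, { id, status });
      if (status === "starting") {
        active += 1;
        reserved.push(id);
      }
    }
    return reserved;
  });

  // Phase 2 (unlocked): slow startup work overlaps across siblings.
  const outcomes = await Promise.all(
    admitted.map((id) =>
      startWorkspace(id).then(
        () => ({ id, error: undefined as string | undefined }),
        (reason: unknown) => ({
          id,
          error: reason instanceof Error ? reason.message : String(reason),
        })
      )
    )
  );

  // Phase 3 (locked, short): finalize only tasks that are still starting.
  await mutex.runExclusive(() => {
    for (const { id, error } of outcomes) {
      const task = tasks.get(id);
      if (!task || task.status !== "starting") continue; // e.g. interrupted meanwhile
      if (error === undefined) {
        task.status = "running";
      } else {
        task.status = "interrupted";
        task.launchError = error;
      }
    }
  });
}
```

The lock is held only while counters and status fields change, so one slow fork no longer delays reservation or finalization of its siblings.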

Option C — larger scheduler rewrite with runtime-specific pools

Net product LoC: ~1,100-1,600.

  • Replace the current queue scheduler with a dedicated task scheduler service, per-project/per-runtime pools, telemetry, and pluggable launch strategies.

Why defer: this is attractive long-term, especially for SSH/Docker/Coder runtimes, but it is more architecture than needed to solve workflow fan-out. Option B should leave seams for this later.

Recommended implementation plan (Option B)

Phase 1 — Add launch-state model, UI surface, and timing evidence

Files/symbols:

  • src/common/orpc/schemas/workspace.ts
  • src/node/services/taskService.ts
  • src/node/services/tools/task_list.ts
  • browser task/sidebar status components if they switch on taskStatus
  • task-status helpers and tests under src/common/types/tasks.ts / relevant schema tests if status helpers exist there

Changes:

  1. Add a new AgentTaskStatus value: "starting" and optional launch-failure metadata.
    • starting means: a slot is reserved and startup is in progress outside the global task mutex, but the stream has not necessarily registered yet.
    • Update schema description from queued|running|awaiting_report|interrupted|reported to include starting.
    • Add optional taskLaunchError?: string (and timestamp if useful) for startup failures that occur after reservation but before a reportable child stream. Represent those as taskStatus: "interrupted" plus taskLaunchError rather than adding a broader terminal status in this slice.
  2. Update task lifecycle predicates:
    • countActiveAgentTasks(...) must count starting as active so batch reservation cannot exceed maxParallelAgentTasks.
    • descendant-active checks should include starting alongside queued, running, and awaiting_report.
    • task_list / status filtering should include starting in active/default views where appropriate.
    • UI labels/filters should present starting as an active pre-running state (for example “Starting…”) rather than a terminal or idle state.
    • Existing/older configs without starting continue to parse exactly as before; this is an additive status value.
    • waitForAgentReport(...) should treat both queued and starting as “not running yet”; execution timeout should begin only after transition to running or after a deterministic launch-failure signal is persisted.
  3. Add low-noise timing logs around task launch phases:
    • reservation time;
    • fork/create time;
    • init wait time;
    • stream-start time;
    • total time-to-running.
      Use existing taskQueueDebug(...) / backend log patterns, not user-visible UI changes.
  4. Defensive programming:
    • assert non-empty task IDs, parent IDs, workspace names, and prompt strings at phase boundaries;
    • assert that starting tasks have taskPrompt persisted until stream send accepts the message;
    • assert that every reservation either transitions to running, returns to queued, becomes interrupted, or is removed/failed with waiters rejected.

Quality gate: targeted unit tests for starting status accounting before changing concurrency behavior.
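
The lifecycle predicates in Phase 1 might be expressed as below. This is a sketch, not the repository's helpers: the function names and set constants are hypothetical, and whether awaiting_report keeps holding a parallel slot is an assumption (the plan only states that starting must count as active).

```typescript
type AgentTaskStatus =
  | "queued"
  | "starting"
  | "running"
  | "awaiting_report"
  | "interrupted"
  | "reported";

// Assumption: awaiting_report still holds a slot. The plan only requires
// that starting counts toward maxParallelAgentTasks.
const SLOT_HOLDING = new Set<AgentTaskStatus>(["starting", "running", "awaiting_report"]);

// From the plan: starting joins queued, running, and awaiting_report.
const DESCENDANT_ACTIVE = new Set<AgentTaskStatus>([
  "queued",
  "starting",
  "running",
  "awaiting_report",
]);

function holdsParallelSlot(status: AgentTaskStatus): boolean {
  return SLOT_HOLDING.has(status);
}

function isDescendantActive(status: AgentTaskStatus): boolean {
  return DESCENDANT_ACTIVE.has(status);
}

// "Not running yet": execution timeouts should not tick in these states.
function isPreRunning(status: AgentTaskStatus): boolean {
  return status === "queued" || status === "starting";
}

function countSlotHolders(statuses: readonly AgentTaskStatus[]): number {
  return statuses.filter(holdsParallelSlot).length;
}
```

Keeping the three questions (holds a slot, blocks parent completion, has not started running) as separate predicates avoids overloading a single "active" flag, which is where a new transient status tends to get miscounted.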

Phase 2 — Extract reservation and launch plan from TaskService.create()

Files/symbols:

  • src/node/services/taskService.ts
  • possible small internal types near TaskCreateArgs / TaskCreateResult

Changes:

  1. Introduce internal types, roughly:
interface TaskLaunchReservation {
  taskId: string;
  workspaceName: string;
  parentWorkspaceId: string;
  status: "queued" | "starting";
  prompt: string;
  agentId: string;
  model: string;
  thinkingLevel?: ThinkingLevel;
  workflowTask?: WorkflowTaskMetadata;
}

interface TaskLaunchPlan extends TaskLaunchReservation {
  parentMeta: WorkspaceMetadata;
  taskRuntimeConfig: RuntimeConfig;
  skipInitHook: boolean;
  createdAt: string;
}

Exact names can vary; keep the types private unless tests need them.

  2. Add TaskService.createMany(argsList).
    • Validate basic per-arg shape before lock when possible.
    • Atomicity rule: reservation is all-or-none for pre-launch validation/config errors. If any spec is invalid, unrunnable, over depth, trust-blocked, or otherwise fails before workspace/runtime side effects, persist nothing for the entire batch and return/throw a batch error. Runtime launch failures after successful reservation are per-task failures.
    • Acquire this.mutex once.
    • Load config once, compute active count once, then walk args in input order.
    • For each valid task:
      • allocate ID/name;
      • enforce trust, nesting, agent-runnable checks;
      • reserve capacity by incrementing an in-memory reservedActiveCount for each starting task;
      • persist either taskStatus: "starting" for tasks admitted to capacity or taskStatus: "queued" for overflow tasks;
      • persist taskPrompt for both starting and queued until the message is accepted by sendMessage(...).
    • createMany(...) owns/schedules the unlocked startup work for admitted starting tasks before it returns; callers receive ordered reservation results only, not launch-plan internals.
    • Keep launch plans private to the scheduler/startup helper.
  3. Make TaskService.create(args) delegate to the same reservation/start machinery but preserve current singleton compatibility:
    • if capacity is unavailable, it returns { status: "queued" } as today;
    • if capacity is available, it may internally reserve starting outside the coarse lock, but the public singleton create() should still return only after the child send is accepted and the observable result can be reported as running, or after a concrete startup error is available.
    • workflow/bulk paths may observe starting because they intentionally want fast reservation and separate waiting.
  4. Keep the locked reservation phase short: no orchestrateFork(...), no readTaskBaseCommitShaByProjectPath(...), no secrets resolution, no runBackgroundInit(...), and no workspaceService.sendMessage(...) while holding this.mutex.
  5. Avoid hidden fire-and-forget races: replace raw void this.maybeStartQueuedTasks() sites with a small scheduleMaybeStartQueuedTasks() helper that catches/logs failures and coalesces concurrent scheduler requests.

Quality gate: tests that reserve 20 tasks with maxParallelAgentTasks = 16 and assert exactly 16 starting + 4 queued, without invoking runtime fork/send in the mutex-protected phase.
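
The reservation rule behind this quality gate can be sketched as a pure function. `reserveBatch`, `SketchSpec`, and `SketchReservation` are hypothetical names and the validation is reduced to non-empty strings; in the plan this logic would run once while holding the task mutex, with trust, nesting, and agent-runnable checks in the validation pass.

```typescript
interface SketchSpec {
  prompt: string;
  agentId: string;
}

interface SketchReservation {
  taskId: string;
  status: "queued" | "starting";
  prompt: string;
}

function reserveBatch(
  specs: SketchSpec[],
  activeCount: number,
  maxParallel: number,
  nextId: () => string
): SketchReservation[] {
  // All-or-none: validate every spec before allocating or persisting anything.
  specs.forEach((spec, index) => {
    if (spec.prompt.trim() === "" || spec.agentId.trim() === "") {
      throw new Error(`invalid task spec at index ${index}`);
    }
  });

  // Walk in input order; admitted tasks consume capacity, the rest queue.
  let reservedActiveCount = activeCount;
  return specs.map((spec): SketchReservation => {
    const admitted = reservedActiveCount < maxParallel;
    if (admitted) reservedActiveCount += 1;
    return {
      taskId: nextId(),
      status: admitted ? "starting" : "queued",
      prompt: spec.prompt, // kept until sendMessage(...) accepts it
    };
  });
}
```

With 20 specs, no active tasks, and a limit of 16, the first 16 reservations come back as starting and the last 4 as queued, which matches the split the quality gate asserts.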

Phase 3 — Start reserved tasks outside the global mutex with bounded concurrency

Files/symbols:

  • src/node/services/taskService.ts
  • src/node/services/utils/forkOrchestrator.ts only if needed for clearer launch-plan inputs
  • src/node/runtime/* only if tests reveal runtime-specific issues

Changes:

  1. Add an internal startReservedAgentTask(plan) method that performs the current expensive path outside this.mutex:
    • startWorkspaceInit(...);
    • orchestrateFork(...);
    • source runtime config update, if returned;
    • readTaskBaseCommitShaByProjectPath(...);
    • update persisted workspace entry with actual path/runtime/base SHA/projects;
    • emit metadata;
    • runBackgroundInit(...);
    • workspaceService.sendMessage(...) with the renamed/expanded internal allowAgentTaskLaunch flag that permits scheduler-owned queued/starting launches;
    • transition starting -> running only after send accepts the message.
  2. Bound concurrency for startup work.
    • Keep the configured maxParallelAgentTasks as the hard active-task limit; the startup-concurrency cap only limits fork/init pressure.
    • Start conservatively with global bounded concurrency plus per-project/per-repository single-flight for git worktree fork/create operations. This avoids assuming parallel git worktree add is safe against one repo's metadata. Non-git or already-isolated runtime work can later opt into higher per-runtime concurrency after evidence.
    • Put the concurrency constants near task scheduling code or under src/constants/ if shared; avoid adding user-facing configuration until the behavior is proven.
  3. Add an interrupt-safe launch token/recheck invariant.
    • Each reserved starting task gets an internal launch token or monotonically checked status expectation.
    • startReservedAgentTask(...) must re-read the workspace entry under a short lock before expensive side effects and again before sendMessage(...).
    • If the task is no longer starting (for example parent hard-interrupt changed it to interrupted), do not call sendMessage(...); cleanup or preserve any just-created workspace according to the interrupt semantics, reject waiters, free the slot, and schedule queued work.
  4. Keep config state transitions under short locks.
    • starting -> running, starting -> queued, starting -> interrupted, source runtime config updates, workspace path/runtime/base-SHA persistence, and waiter rejection bookkeeping should reacquire this.mutex or a narrower per-task lock.
    • Expensive runtime and filesystem work stays outside the lock.
  5. Redesign maybeStartQueuedTasks() as two-phase:
    • locked phase: pick FIFO queued tasks up to available capacity, mark them starting, collect launch plans, release lock;
    • unlocked phase: run startReservedAgentTask(...) with bounded concurrency;
    • finalization: transition each task to running, queued retry, interrupted, or removed/failed, and reject waiters on unrecoverable failure.
  6. Failure semantics:
    • If fork/create fails before any child workspace is usable, roll back the reserved config entry or mark it interrupted/failed in a way existing awaiters can observe. Prefer a visible failure over a ghost starting slot.
    • If sendMessage(...) fails after workspace creation, reuse existing rollbackFailedTaskCreate(...) where safe, then reject waiters with the concrete error.
    • Persist taskStatus: "interrupted" plus taskLaunchError for startup failures that should not be retried blindly; waitForAgentReport(...) must surface that error after restart instead of hanging or looping.
    • Always free the active slot and schedule the queue after terminal launch failure.
  7. Update WorkspaceService.sendMessage(...) guards:
    • non-internal sends should be blocked for both queued and starting task workspaces;
    • replace/expand allowQueuedAgentTask with a clearer internal flag such as allowAgentTaskLaunch so the scheduler can launch starting tasks without allowing arbitrary user sends.
  8. Restart/self-healing:
    • During TaskService.initialize(), detect stale starting tasks.
    • If the task is already streaming, mark running.
    • If it has a persisted taskPrompt and no stream, safely demote to queued and schedule startup.
    • If the workspace entry is incomplete and cannot be repaired, mark interrupted and reject/record enough context for inspection.
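
The restart self-healing rules in item 8 reduce to a small decision function. This is only a sketch: `reconcileStaleStartingTask` and its input shape are hypothetical, and it treats "no stream and no persisted prompt" as the unrepairable case, which is a simplifying assumption.

```typescript
interface StaleStartingTask {
  taskId: string;
  isStreaming: boolean;
  taskPrompt?: string;
}

type Reconciliation =
  | { next: "running" }
  | { next: "queued" }
  | { next: "interrupted"; reason: string };

function reconcileStaleStartingTask(task: StaleStartingTask): Reconciliation {
  // The stream registered before the restart: the launch actually succeeded.
  if (task.isStreaming) return { next: "running" };

  // The prompt survived, so the scheduler can safely retry the launch.
  if (task.taskPrompt !== undefined && task.taskPrompt.trim() !== "") {
    return { next: "queued" };
  }

  // Assumption: nothing left to relaunch from, so surface it for inspection.
  return {
    next: "interrupted",
    reason: `task ${task.taskId} was starting at restart with no stream or prompt`,
  };
}
```

Every stale starting task maps to exactly one of the three outcomes, so no starting entry can survive initialization.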

Quality gate: crash/restart-style tests for stale starting tasks and launch failure cleanup.
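
Item 2 of this phase combines a global startup cap with per-project single-flight for the git fork step. One way to sketch both primitives is shown below; `createLimiter` and `createKeyedSerializer` are hypothetical helpers, not names from the PR, and the concurrency limit is whatever constant the implementation settles on.

```typescript
// Global cap: at most `limit` callbacks run at once. A finished callback hands
// its slot directly to the next waiter so a late caller cannot overshoot.
function createLimiter(limit: number) {
  let active = 0;
  const waiters: Array<() => void> = [];

  const acquire = (): Promise<void> => {
    if (active < limit) {
      active += 1;
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => waiters.push(resolve));
  };

  const release = (): void => {
    const next = waiters.shift();
    if (next) next();
    else active -= 1;
  };

  return async function run<T>(fn: () => Promise<T>): Promise<T> {
    await acquire();
    try {
      return await fn();
    } finally {
      release();
    }
  };
}

// Per-key single-flight: callbacks sharing a key (for example a project path)
// run one after another; different keys do not block each other.
function createKeyedSerializer() {
  const tails = new Map<string, Promise<void>>();

  return function runSerialized<T>(key: string, fn: () => Promise<T>): Promise<T> {
    const previous = tails.get(key) ?? Promise.resolve();
    const result = previous.then(fn);
    // The stored tail never rejects, so one failure does not poison the chain.
    tails.set(
      key,
      result.then(
        () => undefined,
        () => undefined
      )
    );
    return result;
  };
}
```

A launch helper could wrap the whole startup in the limiter and only the fork step in the serializer keyed by project path, so init and stream start for different children of the same repository still overlap. The sketch never prunes the key map, which a long-lived service would want to do.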

Phase 4 — Add workflow bulk task API and adapt parallelAgents(...)

Files/symbols:

  • src/node/services/workflows/WorkflowRunner.ts
  • src/node/services/workflows/WorkflowTaskServiceAdapter.ts
  • src/node/services/workflows/WorkflowTaskServiceAdapter.test.ts
  • src/node/services/workflows/WorkflowRunner.test.ts

Changes:

  1. Extend WorkflowTaskAdapter with an optional create-only bulk method, for example:
createAgentTasks?(
  specs: WorkflowAgentSpec[],
  lifecycle?: {
    onTaskCreated?: (index: number, taskId: string) => Promise<void> | void;
  }
): Promise<Array<{ taskId: string; status: "queued" | "starting" | "running" }>>;
  2. Implement this in WorkflowTaskServiceAdapter by calling taskService.createMany(...) once.
  3. Update runAgentStepsInParallel(...) to use the bulk path when available:
    • partition pending work into existing taskId steps vs. new task steps;
    • bulk-create all new steps in input order;
    • record recordStepStarted(...) and task-start events as each task ID is returned;
    • wait for every created/existing task through waitForAgentTask(...) concurrently;
    • preserve existing retry behavior for structured-output validation failures;
    • preserve runOrResumeAgentStep(...) restart semantics instead of bypassing them. Either feed pre-created task IDs back through that helper or factor out the wait/restart branch so Task not found / Task interrupted still restart when appropriate.
    • if one create/wait fails, preserve current sibling-interrupt behavior via interruptRun().
  4. Keep agent(spec) / single-step behavior compatible by having runAgent(...) continue to work, possibly implemented via the same singleton bulk path.
  5. Add assertions:
    • bulk create result length equals input spec length;
    • each returned taskId is non-empty;
    • no duplicate workflow step IDs within one parallelAgents(...) call (or assert earlier if already guaranteed by replay IDs).

Quality gate: workflow tests proving a 20-spec parallelAgents(...) invokes createMany once, records task-start events for all children, returns ordered results, and starts waits concurrently.
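
The bulk path for runAgentStepsInParallel(...) could be sketched as follows. The interfaces are simplified stand-ins (`SketchStep`, `SketchBulkAdapter`, and `runStepsInBulk` are hypothetical, and the adapter method drops the lifecycle argument), and the sketch omits validation retries, restart handling, and sibling interruption.

```typescript
interface SketchCreatedTask {
  taskId: string;
  status: "queued" | "starting" | "running";
}

interface SketchStep {
  stepId: string;
  prompt: string;
  existingTaskId?: string; // set when replay already recorded a task for this step
}

interface SketchBulkAdapter {
  createAgentTasks(specs: SketchStep[]): Promise<SketchCreatedTask[]>;
  waitForAgentTask(taskId: string): Promise<string>;
}

async function runStepsInBulk(
  steps: SketchStep[],
  adapter: SketchBulkAdapter,
  recordStepStarted: (stepId: string, taskId: string) => void
): Promise<string[]> {
  const uniqueIds = new Set(steps.map((step) => step.stepId));
  if (uniqueIds.size !== steps.length) throw new Error("duplicate workflow step IDs");

  // One backend call for every step that does not have a task yet.
  const fresh = steps.filter((step) => step.existingTaskId === undefined);
  const created = fresh.length > 0 ? await adapter.createAgentTasks(fresh) : [];
  if (created.length !== fresh.length) {
    throw new Error(`bulk create returned ${created.length} tasks for ${fresh.length} specs`);
  }

  // Record task starts before any wait begins, in input order.
  let nextCreated = 0;
  const taskIds = steps.map((step) => {
    if (step.existingTaskId !== undefined) return step.existingTaskId;
    const taskId = created[nextCreated]?.taskId ?? "";
    nextCreated += 1;
    if (taskId === "") throw new Error(`empty task ID for step ${step.stepId}`);
    recordStepStarted(step.stepId, taskId);
    return taskId;
  });

  // Waits run concurrently; Promise.all keeps results in input order.
  return Promise.all(taskIds.map((taskId) => adapter.waitForAgentTask(taskId)));
}
```

Steps that already carry a task ID skip creation and go straight to waiting, which is the property replay depends on.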

Phase 5 — Tests and regression coverage

Targeted tests:

  • src/node/services/taskService.test.ts
    • createMany reserves capacity in one locked pass.
    • create delegates to createMany singleton and preserves result shape.
    • starting tasks count against maxParallelAgentTasks.
    • maybeStartQueuedTasks marks queued tasks starting before unlocked launch.
    • launch failure frees slot, persists taskLaunchError, rejects waiters, and schedules queued work.
    • hard interrupt handles starting descendants like active descendants.
    • concurrent interrupt during startup: reserve starting, block fork/create, hard-interrupt parent, unblock fork/create, assert no sendMessage(...), any created workspace is cleaned/preserved according to interrupt policy, waiters are rejected, and the slot is freed.
    • restart demotes stale starting to queued or running based on stream state.
    • launch failure is observable after restart and does not retry forever.
  • src/node/services/workflows/WorkflowTaskServiceAdapter.test.ts
    • adapter bulk method calls taskService.createMany once with workflow metadata on every item.
    • per-item failures include the workflow step ID in the thrown error.
  • src/node/services/workflows/WorkflowRunner.test.ts
    • parallelAgents uses bulk creation when adapter supports it.
    • preserves existing ordered results, incremental completion recording, sibling interrupt behavior, foreground-background behavior, and validation retry behavior.
  • Schema/type/UI-status tests for taskStatus: "starting" and taskLaunchError, including task-list active filters and display labels where applicable.

Validation commands:

bun test src/node/services/taskService.test.ts --filter "createMany|starting|queued|parallel"
bun test src/node/services/workflows/WorkflowTaskServiceAdapter.test.ts
bun test src/node/services/workflows/WorkflowRunner.test.ts --filter "parallelAgents|bulk|validation"
bun test src/common/orpc/schemas/workspace.test.ts src/common/types/tasks.test.ts
bun run typecheck
make static-check

Use run_and_report if batching validation commands in one shell invocation.

Acceptance criteria

  1. parallelAgents([...20 specs]) performs one workflow-adapter bulk create call for the new child tasks, not 20 independent TaskService.create(...) calls.
  2. createMany(...) reservation is all-or-none for pre-launch validation/config failures: invalid input persists no partial batch state.
  3. Singleton TaskService.create(...) preserves existing public behavior: returns queued immediately when capacity is unavailable, or returns running only after startup send is accepted when capacity is available.
  4. The global task mutex is not held while child workspaces are forked/created, init waits, or child streams are started.
  5. With maxParallelAgentTasks = 16 and no other active tasks, a 20-child workflow reserves exactly 16 active starts and 4 queued tasks.
  6. The admitted children begin startup with bounded concurrency and do not wait for previous siblings to finish full fork/init/stream-start before the next sibling is reserved.
  7. Git worktree fork/create is protected by per-project/per-repo single-flight unless tests prove a safer higher-concurrency path.
  8. Config/restart invariants hold: no duplicate task IDs, no orphan starting tasks after restart, no ghost active slots after launch failure, and persisted taskLaunchError is surfaced to awaiters after restart.
  9. Interrupt/termination semantics still handle queued, starting, running, awaiting-report, interrupted, and reported descendants correctly, including interrupt-during-startup.
  10. Existing foreground task tool behavior remains compatible: if a task is queued or starting, foreground waits block until the child reports or fails; execution timeout starts only once the child is actually running.
  11. Existing workflow replay semantics are preserved: completed steps are reused, changed specs get new work, ordered results remain ordered, validation retries only retry failed steps, and Task not found / Task interrupted restart behavior remains intact.
  12. Non-internal user sends are blocked for both queued and starting task workspaces; only the internal scheduler can launch them.
  13. All targeted tests, typecheck, and static checks pass.
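
Criterion 12 amounts to a small guard in front of message sends. A minimal sketch, assuming a hypothetical `canSendToTaskWorkspace` helper and using the allowAgentTaskLaunch flag name proposed in Phase 3; any other send restrictions the service applies are out of scope here.

```typescript
type GuardedTaskStatus =
  | "queued"
  | "starting"
  | "running"
  | "awaiting_report"
  | "interrupted"
  | "reported";

interface SketchSendOptions {
  // Internal-only flag: set by the scheduler when it launches a reserved task.
  allowAgentTaskLaunch?: boolean;
}

function canSendToTaskWorkspace(
  taskStatus: GuardedTaskStatus | undefined,
  options: SketchSendOptions = {}
): boolean {
  // Workspaces that are not pre-running tasks are not restricted by this guard.
  if (taskStatus !== "queued" && taskStatus !== "starting") return true;
  // Pre-running task workspaces accept only the scheduler-owned launch send.
  return options.allowAgentTaskLaunch === true;
}
```

Treating queued and starting identically here keeps a user message from racing the scheduler's initial prompt during the startup window.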

Dogfooding plan

Use the requested repo skills:

  • dev-server-sandbox for an isolated Mux dev server with its own temporary MUX_ROOT and free backend/Vite ports.
  • agent-browser for UI interaction, screenshots, console/error checks, and video recording.
  • dogfood discipline: capture evidence as the workflow is exercised, not afterward.

Dogfood setup

  1. Start an isolated dev server:
KEEP_SANDBOX=1 make dev-server-sandbox DEV_SERVER_SANDBOX_ARGS="--clean-projects"

If API/provider config is needed for real child agents, omit --clean-providers so the sandbox copies provider config from the seed Mux root. Do not copy secrets manually.

  2. Create a temporary Git repo for the sandbox project:
repo=$(mktemp -d)
cd "$repo"
git init
git config user.email dogfood@example.com
git config user.name "Mux Dogfood"
echo "# fanout dogfood" > README.md
git add README.md
git commit -m "initial"
  3. Add a scratch workflow in that temp repo (not the tracked Mux source tree) that fans out 20 lightweight explore children:
// description: Fan out 20 lightweight explore agents for task-launch dogfooding
export default function workflow({ parallelAgents }) {
  const results = parallelAgents(Array.from({ length: 20 }, (_, index) => ({
    id: `fanout-${index + 1}`,
    title: `Fanout ${index + 1}`,
    agentId: "explore",
    prompt: "Inspect README.md and report the first heading only. Keep the report under 20 words.",
  })));
  return { reportMarkdown: `Completed ${results.length} fanout tasks.` };
}
  4. Open the app with agent-browser using the Vite URL printed by make dev-server-sandbox:
mkdir -p dogfood-output/screenshots dogfood-output/videos
agent-browser --session workflow-fanout open "$VITE_URL"
agent-browser --session workflow-fanout wait --load networkidle
agent-browser --session workflow-fanout screenshot --annotate dogfood-output/screenshots/initial.png
agent-browser --session workflow-fanout snapshot -i

Dogfood execution and evidence

  1. Start recording before launching the workflow:
agent-browser --session workflow-fanout record start dogfood-output/videos/fanout-20-launch.webm
  2. In the UI:

    • add/open the temp repo project;
    • trust it if prompted;
    • open a parent workspace;
    • run the scratch workflow;
    • watch the child task list/sidebar while the 20 children are reserved and started.
  3. Capture evidence at key points:

agent-browser --session workflow-fanout screenshot --annotate dogfood-output/screenshots/before-run.png
# launch workflow in UI
sleep 2
agent-browser --session workflow-fanout screenshot --annotate dogfood-output/screenshots/after-reservation.png
sleep 5
agent-browser --session workflow-fanout screenshot --annotate dogfood-output/screenshots/running-and-queued.png
agent-browser --session workflow-fanout errors > dogfood-output/browser-errors.txt
agent-browser --session workflow-fanout console > dogfood-output/browser-console.txt
  4. Stop recording after the workflow completes or after enough launch behavior is visible:
agent-browser --session workflow-fanout record stop
agent-browser --session workflow-fanout screenshot --annotate dogfood-output/screenshots/final.png
  5. Backend/log checks:

    • verify timing logs show a single bulk reservation for 20 specs;
    • verify the mutex-held reservation phase is short;
    • verify fork/init/start work overlaps up to the configured bounded concurrency;
    • verify max active tasks never exceeds maxParallelAgentTasks.
  6. Attach the video and key screenshots to the implementation summary so reviewers can verify the dogfood path.

Risks and mitigations

  • Git worktree contention: starting many worktree forks at once may contend on Git metadata. Mitigate with global bounded concurrency plus per-project/per-repo single-flight for git worktree fork/create in the first implementation.
  • Behavioral change in TaskService.create(...): callers may currently expect synchronous fork/start failures. Preserve singleton create() semantics; expose fast starting reservation only to bulk/workflow paths that explicitly use it.
  • Restart complexity: starting introduces a new transient persisted state. Mitigate with explicit startup self-healing tests.
  • Interrupt races: hard interrupts must handle starting tasks before sendMessage(...) accepts the prompt. Keep taskPrompt until accepted and ensure waiters are rejected after persisted interruption.
  • Workflow replay: bulk creation must not bypass step records. Record task-start events immediately after reservation returns task IDs, before waits begin.

Reviewer focus

  1. Does the plan keep the global maxParallelAgentTasks invariant under concurrent bulk and non-bulk creates?
  2. Is starting the right persisted state, or should reservation use an existing state plus extra metadata?
  3. Is per-project/per-repo single-flight for git worktree operations conservative enough for the first implementation?
  4. Is the WorkflowTaskAdapter.createAgentTasks(...) shape the smallest interface that gives parallelAgents(...) one backend create invocation while preserving workflow replay/retry semantics?
  5. Are singleton TaskService.create(...) compatibility and createMany(...) all-or-none reservation semantics clear enough for implementation?

Generated with mux • Model: openai:gpt-5.5 • Thinking: xhigh • Cost: 923094{MUX_COSTS_USD:-unknown}

ThomasK33 added 2 commits June 8, 2026 16:24
Implement two-phase bulk task reservation and launch for parallel workflow agents, including starting-state persistence, scheduler coalescing, workflow bulk creation, task tool schema updates, and regression coverage.

---

_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `xhigh` • Cost: `719969{MUX_COSTS_USD:-unknown}`_

<!-- mux-attribution: model=openai:gpt-5.5 thinking=xhigh costs=105.86 -->
@ThomasK33

Member Author

@codex review

@mintlify

mintlify Bot commented Jun 8, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
Mux 🟢 Ready View Preview Jun 8, 2026, 4:51 PM

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🚀

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ThomasK33 ThomasK33 added this pull request to the merge queue Jun 8, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 8, 2026
@ThomasK33 ThomasK33 added this pull request to the merge queue Jun 8, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 8, 2026
@ThomasK33 ThomasK33 added this pull request to the merge queue Jun 8, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 8, 2026
@ThomasK33 ThomasK33 added this pull request to the merge queue Jun 8, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 8, 2026
@ThomasK33

Member Author

@codex review

Pushed test-only hardening for the latest merge-queue unit timeout:

  • replaced 10ms deep-review workflow run-store lock leases with the existing test lease constant because parallel workflow callbacks legitimately contend under coverage load
  • moved the prior WorkflowRunner abort-signal assertion to the handoff point instead of after workflow cleanup

Validation:

  • bun test src/node/services/workflows/builtInWorkflowDefinitions.test.ts
  • bun test src/node/services/workflows/WorkflowRunner.test.ts --filter "reuses a recorded started task id instead of respawning on resume"
  • make static-check

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2d1dbeb03f

Comment thread src/node/services/taskService.ts Outdated
@ThomasK33

Member Author

Addressed Codex finding: Preserve legacy queued-task resume path.

Response:

  • Restored the legacy no-taskPrompt queued-task path as an explicit resumeStream launch mode.
  • Modern queued/new tasks still use sendMessage with a required prompt.
  • Added a regression test covering a queued task record with no persisted taskPrompt, asserting it launches via resumeStream and transitions to running.
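The launch-mode split described above can be sketched roughly as follows. This is a minimal illustration, not the code in taskService.ts: the `QueuedTaskRecord` and `LaunchMode` types and the `chooseLaunchMode` helper are hypothetical names, and only `taskPrompt`, `sendMessage`, and `resumeStream` come from the response itself.

```typescript
// Hypothetical record shape for illustration; the real task record may differ.
interface QueuedTaskRecord {
  id: string;
  status: "queued" | "starting" | "running" | "interrupted";
  taskPrompt?: string; // absent on legacy queued records
}

// Hypothetical discriminated union naming the two launch paths.
type LaunchMode =
  | { kind: "sendMessage"; prompt: string }
  | { kind: "resumeStream" };

// Modern queued tasks carry a prompt and launch via sendMessage;
// legacy records without a persisted prompt fall back to resumeStream.
function chooseLaunchMode(task: QueuedTaskRecord): LaunchMode {
  if (task.taskPrompt !== undefined && task.taskPrompt.length > 0) {
    return { kind: "sendMessage", prompt: task.taskPrompt };
  }
  return { kind: "resumeStream" };
}
```

Modeling the mode as an explicit union, rather than inferring it at the call site, keeps the legacy path visible to the type checker, so a caller cannot reach `sendMessage` without a prompt.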

Validation:

  • bun test src/node/services/taskService.test.ts -t "resumes legacy queued tasks"
  • bun test src/node/services/taskService.test.ts src/node/services/workflows/WorkflowRunner.test.ts src/node/services/workflows/builtInWorkflowDefinitions.test.ts
  • make static-check

@ThomasK33

Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9d659ba48c

Comment thread src/node/services/taskService.ts Outdated
@ThomasK33

Member Author

Addressed Codex finding: Avoid replaying accepted starting prompts on restart.

Response:

  • Startup recovery now inspects durable task history for stale starting tasks.
  • If a user prompt is already present, recovery clears taskPrompt before demoting the task to queued, so the queued launcher uses resumeStream instead of appending a duplicate prompt via sendMessage.
  • If no user history is present, recovery leaves taskPrompt intact so not-yet-sent starts still send normally.
  • Extended the regression test to cover both legacy no-taskPrompt queued records and stale starting records with an already-accepted prompt.
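The recovery rule above can be sketched as a pure function over a task record and its durable history. This is an assumption-laden illustration: `TaskRecord`, `HistoryMessage`, and `recoverStaleStartingTask` are hypothetical names, and the check for "a user prompt is already present" is simplified to "any message with role user exists".

```typescript
// Hypothetical shapes for illustration; the real records may differ.
interface HistoryMessage {
  role: "user" | "assistant" | "system";
}

interface TaskRecord {
  id: string;
  status: "queued" | "starting" | "running" | "interrupted";
  taskPrompt?: string | undefined;
}

// Demote a stale starting task to queued on startup. If the prompt was
// already accepted into history, drop taskPrompt so a queued launcher would
// resume the stream instead of sending the same prompt a second time.
function recoverStaleStartingTask(
  task: TaskRecord,
  history: readonly HistoryMessage[],
): TaskRecord {
  if (task.status !== "starting") {
    return task; // only stale starting tasks are touched
  }
  const promptAlreadyAccepted = history.some((m) => m.role === "user");
  if (promptAlreadyAccepted) {
    return { ...task, status: "queued", taskPrompt: undefined };
  }
  // Not yet sent: keep the prompt so the normal send path still runs.
  return { ...task, status: "queued" };
}
```

Returning a new record rather than mutating in place makes both branches easy to assert on in a regression test.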

Validation:

  • bun test src/node/services/taskService.test.ts -t "resumes accepted queued starts"
  • bun test src/node/services/taskService.test.ts src/node/services/workflows/WorkflowRunner.test.ts src/node/services/workflows/builtInWorkflowDefinitions.test.ts
  • make static-check

@ThomasK33

Member Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Nice work!
