Active entrypoint note: this is the current active spec entrypoint and the main product/specification reference for public contributors.
Status: Normative / hard enforcement gates must follow this document
Last Updated: 2026-02-06
Scope: this repository (monorepo), including CortexPilot Orchestrator and the multi-agent collaboration runtime
Non-goal: this document does not carry vision, story, or dialectical history; for those materials, see `10_VISION.md` / `90_HISTORY.md`
- This file is the repository's single authoritative specification (SSOT). Any implementation, tool, or agent behavior must satisfy the constraints defined here.
- Other documents (vision, architecture whitepapers, dialectical notes, and similar materials) are explanation and background only; they must not override this specification.
- Field-level authority:
  - `schemas/*.json` are the authoritative source for field shape and required-field constraints (schema-first).
  - This file is the authoritative source for semantics, gate rules, on-disk layout, and state-machine behavior.
  - If this file conflicts with `schemas/`, the schema wins for field constraints and this file wins for semantics and gates. The conflict must be removed in the next change set.
- Auditability: the full evidence chain must be attributable, reviewable, and replayable.
- Reproducibility: the system must support both rehydration and re-execution comparison.
- Strong Constraints: enforcement must come from engineering gates, not from prompt persuasion.
- Any gate failure must fail closed. The system must not continue past a failed gate.
- Any out-of-scope or out-of-policy behavior must record a `policy_violation` or `gate_failed` event and emit an explainable structured report.
- Execution plane (side effects allowed): any role or step that modifies files, runs commands, or produces patches must go through Codex MCP (or an equivalent execution plane) and remain constrained by `sandbox`, `diff gate`, and `worktree` isolation.
- Orchestration plane (no side effects): roles that only generate structured plans, reviews, or test reports may use Agents SDK structured outputs, but they must not modify files or execute commands.
- Boundary rule: if a step requires side effects, it must use the execution plane. If it does not require side effects, prefer the orchestration plane.
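The fail-closed rule above can be illustrated with a minimal sketch. It assumes each gate is a zero-argument callable and that events are collected in an in-memory list; `run_gates` and the payload layout are hypothetical names for illustration, not part of the specification.

```python
from typing import Callable


def run_gates(gates: list[tuple[str, Callable[[], bool]]], events: list[dict]) -> bool:
    """Run gates in order and stop at the first failure (fail closed)."""
    for name, check in gates:
        try:
            passed = bool(check())
            reason = "" if passed else "check returned false"
        except Exception as exc:  # a gate that errors counts as a failed gate
            passed = False
            reason = f"gate raised {type(exc).__name__}"
        if not passed:
            # Hypothetical event shape; the real fields come from the event schema.
            events.append({"event_type": "gate_failed",
                           "payload": {"gate": name, "reason": reason}})
            return False  # later gates never run
    return True
```

The point of the sketch is that a failing or crashing gate produces a recorded event and halts the sequence, so nothing downstream can observe a partially validated run.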
Terms are the shared language of contracts and evidence chains. Inconsistent naming directly damages auditability and replayability.
- Contract: a structured task contract; the only valid instruction carrier. Natural-language handoff is forbidden.
- Result / Report: structured outputs produced by execution, review, or testing, primarily in JSON.
- Run: a complete execution instance and its evidence chain; the directory root is `.runtime-cache/cortexpilot/runs/<run_id>`.
- Run Bundle: the `.runtime-cache/cortexpilot/runs/<run_id>` directory and everything inside it.
- Run Store: the root directory `.runtime-cache/cortexpilot/runs/`.
- Gate: a hard gate such as diff / tool / reviewer / tests / network / MCP / integrated / sampling.
- Worktree:
  - P1 default (parallel L2 / multi-worker): `.runtime-cache/cortexpilot/worktrees/<run_id>/<task_id>`, one worktree per task with physical isolation.
  - P0 compatibility: `.runtime-cache/cortexpilot/worktrees/<run_id>` is allowed only for a single task running serially.
  - Concurrency hard rule: if L2 parallelism or multiple workers must write concurrently, P1 is mandatory. Concurrent writes inside a single P0 worktree are forbidden.
- Event Stream: `events.jsonl` (append-only).
- Minimum `events.jsonl` fields: `ts` / `level` / `event_type` / `run_id` / `task_id` / `attempt` / `payload`.
- Task Contract: `schemas/task_contract.v1.json` (field-level single authority).
- allowed_paths: the whitelist of writable paths. Only exact paths or directory prefixes are allowed. Wildcards such as `*` or `**` are forbidden.
- sandbox: at the contract layer only `read-only | workspace-write` are allowed (mapped to Codex `--sandbox` at runtime).
- approval_policy / approval-policy:
  - `approval_policy`: the default permission semantic in Agent Registry (for example `untrusted`, `on-request`, `never`)
  - `approval-policy`: the Codex payload field name (mapped from `tool_permissions.shell`; see 6.1.2)
- thread_id / codex_thread_id: the Codex MCP session anchor for continuation and handoff (`codex_thread_id` in the contract, `thread_id` in the manifest)
- `schemas/`: all JSON Schemas (field-level authority)
- `apps/orchestrator/`: the single trusted control plane
- `contracts/`: contract examples and notes (examples, plans, and similar materials)
- `docs/`: human-readable documents only; they must not conflict with the specification
- `.runtime-cache/cortexpilot/`: runtime artifact root (must be gitignored; may be overridden with `CORTEXPILOT_RUNTIME_ROOT`)
- `CORTEXPILOT_CODEX_BASE_HOME`: the Codex MCP home root (must include full `mcp_servers.*` and the Equilibrium provider). Role-specific homes must not carry `mcp_servers.*`
- `requirements-dev.txt`
- `./scripts/bootstrap.sh`
- `./scripts/test.sh`
This is the smallest dependency-ordered closure path. If these items are complete, Day-1 E2E can run.
- Terminology
  - The canonical terminology lives in the "Glossary And Naming" section of this file.
- Execution order (dependency-sorted)
  - Freeze the schemas: `task_contract.v1.json` / `run_manifest.v1.json` / `task_result.v1.json` / `work_report.v1.json` / `review_report.v1.json` / `test_report.v1.json` / `evidence_report.v1.json` / `evidence_bundle.v1.json` / `reexec_report.v1.json` / `agent_registry.v1.json` / `orchestrator_event.v1.json`
  - Run Store: create `.runtime-cache/cortexpilot/runs/<run_id>` and initialize `contract.json` / `manifest.json` / `events.jsonl`
  - Worktree: create `.runtime-cache/cortexpilot/worktrees/<run_id>/<task_id>` (P1 default), run `git worktree prune` first, and `git worktree remove --force` at the end
  - Locks: `.runtime-cache/cortexpilot/locks/<sha256>.lock`, with all-or-nothing acquisition and release
  - Gates: schema / diff / tool / reviewer / tests + network / MCP / integrated / sampling (scope violations fail immediately)
  - Runner: Codex / Agents Runner must inject `sandbox` / `approval-policy` / `cwd` / `codex_thread_id`
    - Production execution must be MCP-only; `codex exec --json` is reserved for diagnostics and regression sampling
    - 6.1 Output schema binding: any step that requires structured output must bind an output schema (see 6.2.1) and enforce it at the execution layer; missing schema binding must fail closed
    - 6.2 Role prompt discipline: when structured output is required, the role prompt must not request natural-language delivery; all content must remain inside JSON fields
  - Evidence: write `patch.diff` / `diff_name_only.txt` / `reports/*.json`, then generate `manifest.json` + `evidence_hashes`
  - Replay: support both Rehydration (no LLM call) and Re-execution (rerun)
  - CLI and acceptance: `init` / `doctor` / `run` / `serve` + Day-1 E2E
- Archival rule
  - Runtime artifacts may land only inside `.runtime-cache/cortexpilot/runs/<run_id>`. Scattered output paths are forbidden.
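The "all-or-nothing acquisition and release" requirement for locks in the execution order above can be sketched as follows. This is a minimal illustration under stated assumptions: the specification does not say what the `<sha256>` in the lock file name is computed from, so hashing the resource path is an assumption, and `lock_path`, `acquire_all`, and `release_all` are hypothetical helper names.

```python
import hashlib
import os


def lock_path(lock_dir: str, resource: str) -> str:
    # Assumption: the lock name is the SHA-256 of the resource path.
    digest = hashlib.sha256(resource.encode("utf-8")).hexdigest()
    return os.path.join(lock_dir, f"{digest}.lock")


def release_all(held: list[str]) -> None:
    for path in held:
        try:
            os.remove(path)
        except FileNotFoundError:
            pass


def acquire_all(lock_dir: str, resources: list[str]) -> list[str]:
    """Acquire every lock or none; return the lock files that were created."""
    os.makedirs(lock_dir, exist_ok=True)
    held: list[str] = []
    try:
        for resource in sorted(set(resources)):  # stable order across callers
            path = lock_path(lock_dir, resource)
            # O_EXCL makes creation fail if another holder owns the lock.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            held.append(path)
    except FileExistsError:
        release_all(held)  # roll back the partial acquisition
        raise
    return held
```

Creating the lock file with `O_CREAT | O_EXCL` is one common way to get an atomic "create if absent" on a local filesystem; whether the real implementation uses lock files this way is not stated in this document.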
- `schemas/task_contract.v1.json`
- `schemas/task_result.v1.json`
- `schemas/work_report.v1.json`
- `schemas/review_report.v1.json`
- `schemas/test_report.v1.json`
- `schemas/orchestrator_event.v1.json`
- `schemas/reexec_report.v1.json`
- `schemas/run_manifest.v1.json`
- `schemas/evidence_bundle.v1.json`
- `schemas/evidence_report.v1.json`
- `schemas/agent_registry.v1.json`
- Schema drift detection: any schema change must be detectable and traceable through commit, version, and change history.
- Parsers may use a tolerant strategy for unknown fields and preserve the raw event payload.
- Missing required fields must still fail closed.
- The actual on-disk root is `.runtime-cache/cortexpilot/runs/<run_id>`.
- `events.jsonl` must remain append-only, and writes must use both `flush` and `fsync`.
The list below is the minimum allowed set. Extensions are allowed; removals are not.
- General rule
  - Every task execution artifact must be written into this structure.
  - It must support:
    - Replay by Rehydration
    - Audit
- On-disk rule
  - The real on-disk location must be `.runtime-cache/cortexpilot/runs/<run_id>`.
- Directory structure (aligned to the current implementation)

      .runtime-cache/cortexpilot/runs/
      ├── <run_id>/                  # unique directory for each run
      │   ├── contract.json          # initial Task Contract
      │   ├── manifest.json          # Run Manifest
      │   ├── events.jsonl           # Orchestrator event stream (evidence-hash baseline)
      │   ├── patch.diff             # root-level patch
      │   ├── diff_name_only.txt     # root-level diff file list
      │   ├── meta.json              # runtime metadata
      │   ├── worktree_ref.txt       # linked worktree path
      │   ├── reports/               # structured reports (task_result.json / review_report.json / test_report.json / evidence_bundle.json / evidence_report.json)
      │   ├── artifacts/             # artifacts
      │   ├── tasks/                 # sub-task contracts
      │   ├── results/               # mirrored task results (results/<task_id>/result.json + patch)
      │   ├── reviews/               # reviewer outputs (task-level)
      │   ├── ci/                    # CI / test runner outputs
      │   ├── patches/               # task-scoped diffs
      │   ├── codex/                 # Codex execution-layer data
      │   │   └── <task_id>/
      │   │       ├── events.jsonl   # raw Codex event stream (supporting evidence)
      │   │       ├── transcript.md  # human-readable session record
      │   │       └── thread_id.txt  # Codex thread ID
      │   ├── git/                   # Git-related evidence
      │   │   ├── baseline_commit.txt  # pre-run baseline commit
      │   │   ├── patch.diff           # Git-level patch
      │   │   └── diff_name_only.txt   # Git-level diff list
      │   ├── tests/                 # acceptance test evidence
      │   │   ├── command.txt        # executed test command
      │   │   ├── stdout.log
      │   │   └── stderr.log
      │   ├── trace/                 # tracing evidence
      │   │   └── trace_id.txt       # linked OpenTelemetry / Langfuse trace ID
      │   └── meta.json              # environment metadata (model version, params, env hash)

- `reports/task_result.json`
- `reports/review_report.json`
- `reports/test_report.json`
- `reports/evidence_bundle.json`
- `reports/evidence_report.json`
- Only valid instruction carrier: the task contract is the only legal basis for agent collaboration.
- The Orchestrator must validate the schema during handoff and reject invalid contracts immediately.
- Hard constraints (excerpt):
  - `allowed_paths` must not be empty and must not use `*` or `**`
  - `assigned_agent` must exist so the execution owner is explicit
  - `tool_permissions` and `acceptance_tests` are hard input constraints
  - `tool_permissions.filesystem` may only be `read-only | workspace-write`
  - `tool_permissions.shell` must map to Codex `approval-policy` using the table in 6.1.2
  - `danger-full-access` may exist only as a platform capability enum; the contract must always reject it unless God Mode grants a temporary, fully evidenced override
- Schema path: `schemas/task_contract.v1.json`
- For readability, Appendix A keeps a human-readable schema copy. The implementation still follows `schemas/`.
The goal is to make the mapping from `tool_permissions.shell` to Codex `approval-policy` deterministic and testable.
| `tool_permissions.shell` | Codex `approval-policy` | Execution semantics (fail-closed) |
|---|---|---|
| `deny` | unset / shell tool forbidden | any shell request is rejected immediately and recorded as `policy_violation` |
| `never` | `never` | automatically reject all commands that require approval |
| `on-request` | `on-request` | enter approval flow; reject if approval is missing |
| `untrusted` | `untrusted` | every command enters approval flow |
- Default: if the field is omitted, treat it as `deny` (least privilege).
- Execution boundary: the Orchestrator must apply allowlist/denylist checks before dispatching or executing commands.
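The mapping above is small enough to express as a lookup table, which is one way to keep it deterministic and testable. The sketch below is illustrative only: `resolve_approval_policy` is a hypothetical helper, and representing "unset / shell tool forbidden" as `None` is an assumption about how a runner might encode it.

```python
from typing import Optional

# Mirrors the mapping table; None means approval-policy stays unset
# and the shell tool is forbidden.
SHELL_TO_APPROVAL_POLICY: dict[str, Optional[str]] = {
    "deny": None,
    "never": "never",
    "on-request": "on-request",
    "untrusted": "untrusted",
}


def resolve_approval_policy(tool_permissions: dict) -> tuple[bool, Optional[str]]:
    """Return (shell_allowed, approval_policy) for a contract's tool_permissions."""
    shell = tool_permissions.get("shell", "deny")  # omitted field -> least privilege
    if shell not in SHELL_TO_APPROVAL_POLICY:
        # Unknown values fail closed instead of falling back to a default.
        raise ValueError(f"unknown tool_permissions.shell value: {shell!r}")
    policy = SHELL_TO_APPROVAL_POLICY[shell]
    return (policy is not None, policy)
```

For example, `resolve_approval_policy({})` yields `(False, None)`, matching the rule that an omitted field is treated as `deny`.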
This turns "structured output required" from a prompt wish into an engineering hard constraint.
- Mandatory binding: when a task requires structured output, `inputs.artifacts` must include one output schema artifact (JSON Schema).
  - Naming rule:
    - `name` must be `output_schema` or `output_schema.<role>` (for example `output_schema.pm`, `output_schema.worker`)
    - `uri` must point to a repository path such as `schemas/*.json`, and `sha256` is required
- Execution-layer enforcement:
  - Codex CLI / MCP must use `--output-schema`
  - Agents SDK must use Structured Outputs (`output_type` or equivalent)
- Fail-closed rule: missing output-schema binding must reject execution immediately and emit `policy_violation` or `gate_failed`
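A pre-dispatch check for the binding rules above might look like the following sketch. Several details are assumptions made for illustration and are not stated in this document: the allowed characters in `<role>`, the requirement that `uri` sits under `schemas/` and ends in `.json`, the lowercase-hex form of `sha256`, and the names `PolicyViolation` and `require_output_schema`.

```python
import re


class PolicyViolation(Exception):
    """Hypothetical error type standing in for a policy_violation event."""


NAME_RE = re.compile(r"^output_schema(\.[A-Za-z0-9_]+)?$")  # assumed role charset
SHA256_RE = re.compile(r"^[0-9a-f]{64}$")                   # assumed hex encoding


def require_output_schema(contract: dict) -> list[dict]:
    """Return the bound output-schema artifacts or raise (fail closed)."""
    artifacts = contract.get("inputs", {}).get("artifacts", [])
    bound = [a for a in artifacts if NAME_RE.match(str(a.get("name", "")))]
    if not bound:
        raise PolicyViolation("no output_schema artifact bound in inputs.artifacts")
    for artifact in bound:
        uri = str(artifact.get("uri", ""))
        if not (uri.startswith("schemas/") and uri.endswith(".json")):
            raise PolicyViolation(f"uri is not a repository schema path: {uri!r}")
        if not SHA256_RE.match(str(artifact.get("sha256", ""))):
            raise PolicyViolation(f"missing or malformed sha256 for {artifact['name']}")
    return bound
```

Running this before the Runner is invoked keeps a missing binding from ever reaching the execution layer.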
- The compiled task contract may carry a resolved `role_contract` object.
- `role_contract` does not replace top-level contract fields; it is the compiled, read-friendly view of:
  - assigned role identity
  - role purpose
  - prompt ref / skills ref / MCP bundle ref
  - runtime binding (`runner` / `provider` / `model`, when known)
  - tool permissions
  - handoff posture
  - fail-closed conditions
- The Orchestrator must keep `role_contract` consistent with the top-level `assigned_agent`, `tool_permissions`, `mcp_tool_set`, `runtime_options`, and `handoff_chain` fields. Drift between the resolved role view and the authoritative top-level contract must fail closed.
- Intake preview should expose a `role_contract_summary` when available so the preview surface and the final execution contract describe the same resolved role.
- When available, the Orchestrator may also emit a contract-derived `role_binding_summary` read model in PM intake responses and run manifests so bundle/runtime state stays inspectable after execution without becoming a second execution authority source.
- Read-only run surfaces may project that same contract-derived binding view as `role_binding_read_model`, but those projections remain read models layered on top of the task contract rather than replacement execution authority.
- Workflow/control-plane reads may project a `workflow_case_read_model` derived from the latest linked run's persisted `role_binding_summary`, but that projection must remain explicitly read-only and must keep `execution_authority = task_contract`.
- Dashboard and desktop Workflow Case detail views may render that same `workflow_case_read_model` for operator inspection, but they must present it as a read-only case summary instead of an execution-authority switch.
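The drift rule above (the resolved `role_contract` must stay consistent with the top-level fields) can be sketched as a simple comparison. This assumes, purely for illustration, that `role_contract` mirrors each field under the same key name; the actual shape of `role_contract` is defined by the schema, and `role_contract_drift` is a hypothetical helper.

```python
MIRRORED_FIELDS = (
    "assigned_agent",
    "tool_permissions",
    "mcp_tool_set",
    "runtime_options",
    "handoff_chain",
)


def role_contract_drift(contract: dict) -> list[str]:
    """Return mirrored fields whose role_contract value differs from the top level."""
    role_contract = contract.get("role_contract")
    if role_contract is None:
        return []  # role_contract is optional ("may carry")
    return [
        field
        for field in MIRRORED_FIELDS
        # Assumption: the resolved view repeats the field under the same key.
        if field in role_contract and role_contract[field] != contract.get(field)
    ]
```

A non-empty result would be treated as a gate failure: the top-level contract stays authoritative and execution does not proceed on the drifted view.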
The Orchestrator may advance the state machine using structured outputs only. Natural-language parsing is forbidden.
- TaskResult required fields: `run_id`, `task_id`, `producer`, `status`, `started_at`, `finished_at`, `summary`, `artifacts`, `git`, `gates`, `next_steps`
- WorkReport uses lowercase status enums (`success` / `fail` / `aborted`) for quick aggregation and does not replace TaskResult
- ReviewReport required fields: `run_id`, `reviewer`, `reviewed_at`, `verdict`, `summary`, `scope_check`, `evidence`; `produced_diff` must always be `false`
- TestReport required fields: `run_id`, `task_id`, `runner`, `started_at`, `finished_at`, `status`, `commands`, `artifacts`
- Status enums remain uppercase across the main report layer (`SUCCESS` / `FAILED` / `BLOCKED` / `SKIPPED`, `PASS` / `FAIL` / `ERROR` / `SKIPPED`)
- Hard rule: when a task requires structured output, the role prompt must not include a natural-language delivery checklist. All delivery content must stay inside JSON fields.
- Output gate: any non-JSON output, or any JSON output that fails the schema, must fail closed and must not reach the next state.
- Structured fallback: if explanatory text is required, it must be written into `summary` or another explicit JSON field. Text outside the JSON payload is forbidden.
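The output gate described above can be approximated with the standard library alone. The sketch below is a simplified stand-in: it checks only that the output is a single JSON object carrying the required keys, whereas the real gate validates against the bound JSON Schema. `output_gate` is a hypothetical name; the field list is the TestReport required set from this document.

```python
import json

TEST_REPORT_REQUIRED = (
    "run_id", "task_id", "runner", "started_at",
    "finished_at", "status", "commands", "artifacts",
)


def output_gate(raw: str, required: tuple[str, ...]) -> dict:
    """Accept only a single JSON object that carries every required field."""
    try:
        data = json.loads(raw)  # text before or after the JSON payload is an error
    except json.JSONDecodeError as exc:
        raise ValueError(f"non-JSON output: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data
```

Because `json.loads` rejects any non-whitespace text around the payload, an output such as `Done! {...}` fails the gate even when the embedded object is valid.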
- Handoff output may contain structured summary fields such as `summary` and `risks`, but it must not replace or rewrite the task contract instruction.
- The task contract remains the only legal instruction carrier across role transitions.
- If a handoff artifact is emitted, it is advisory evidence only; execution continues from the contract-authoritative instruction, not from a free-text rewritten instruction.
- General rule
  - Strict one-way state machine
  - No cross-level jumps
  - Every input and output is constrained by the contract
  - Gate condition: only a diff-gate pass plus test pass may advance the state
- S0: PM Agent (requirement definition)
  - Input
    - one-sentence user request (PRD summary)
  - Output
    - TaskContract (initial version, including spec, acceptance_tests, forbidden_actions)
  - Execution mode
    - orchestration plane (no side effects), Agents SDK structured output allowed
  - Audit point
    - must be pure JSON output
    - PM Agent must not search the network on its own
    - must include executable acceptance criteria
  - Rejection
    - reject immediately if `allowed_paths` is empty or uses wildcard `**`
- S1: Tech Lead Agent (Orchestrator / Plan)
  - Input
    - PM contract + current repo baseline (commit hash)
  - Output
    - N split sub-contracts (one per worker)
  - Execution mode
    - orchestration plane (no side effects), Agents SDK structured output allowed
  - Audit point
    - lock isolation
      - `allowed_paths` between sub-contracts must not overlap unless file locking is explicitly declared
    - least privilege
      - each sub-task must calculate the minimum required `tool_permissions`
  - Rejection
    - if a sub-contract requests `danger-full-access` or another dangerous permission, the Orchestrator must block it automatically
- S2: Worker Agent (execution)
  - Input
    - sub-contract + isolated Git worktree + optional Codex `thread_id`
  - Output
    - Git commit / patch
    - `reports/task_result.json` referencing diff, command output, and evidence links
  - Execution mode
    - execution plane (side effects allowed), and it must use Codex MCP (or an equivalent execution plane)
  - Audit point
    - Diff Gate
      - `git diff` must remain fully contained inside `allowed_paths`
    - Event Log
      - must produce a complete `events.jsonl`
      - must record all tool invocations
  - Rejection
    - any out-of-scope file modification triggers automatic rollback and marks the task failed
- S3: Reviewer Agent (audit)
  - Input
    - patch diff (or base-branch comparison) + worker evidence
  - Output
    - structured review report (blocking and non-blocking findings)
  - Execution mode
    - orchestration plane or read-only execution plane; either way it must remain read-only and produce structured JSON
  - Audit point
    - physical isolation
      - must run under `--sandbox read-only` or use `/review`
    - zero side effects
      - worktree status after review must match the initial status exactly
  - Rejection
    - if the Reviewer produces any diff, the task becomes a system-level fault
- S4: CI / Test Runner (validation)
  - Input
    - candidate branch before merge
  - Output
    - test logs + pass/fail verdict
  - Execution mode
    - execution plane or orchestration plane, but all command execution must be delegated by the Orchestrator and written as evidence
  - Audit point
    - it must execute exactly the `acceptance_tests` commands declared in the contract
  - Rejection
    - failing tests must generate a new contract and enter the fix loop
- S5: Fix Loop (correction)
  - Trigger
    - blocking review or failed tests
  - Behavior
    - generate a new contract (`Parent ID = original task`)
    - reference failed evidence inside `inputs.artifacts`
  - Limit
    - retries are constrained by `max_retries`
- S6: Done (release)
  - Input
    - commit that passes all checks
  - Behavior
    - merge to the main branch
    - archive the run bundle
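The S1 lock-isolation audit point (sub-contract `allowed_paths` must not overlap) can be checked mechanically before any worker starts. The following is a minimal sketch that compares paths component by component, so that `src/app` and `src/application` are not mistaken for an overlap; `paths_overlap` and `find_overlaps` are hypothetical helpers, and the explicit file-locking exception is not modelled here.

```python
from itertools import combinations


def _parts(path: str) -> tuple[str, ...]:
    # Split on "/" and drop empty or "." components ("src/" -> ("src",)).
    return tuple(p for p in path.split("/") if p not in ("", "."))


def paths_overlap(a: str, b: str) -> bool:
    """True if one path equals the other or is a directory prefix of it."""
    pa, pb = _parts(a), _parts(b)
    shared = min(len(pa), len(pb))
    return pa[:shared] == pb[:shared]


def find_overlaps(sub_contracts: dict[str, list[str]]) -> list[tuple[str, str, str, str]]:
    """Return (task_a, path_a, task_b, path_b) for every overlapping pair."""
    conflicts = []
    for (task_a, paths_a), (task_b, paths_b) in combinations(sub_contracts.items(), 2):
        for path_a in paths_a:
            for path_b in paths_b:
                if paths_overlap(path_a, path_b):
                    conflicts.append((task_a, path_a, task_b, path_b))
    return conflicts
```

An empty result means the sub-contracts can write in parallel without touching each other's scope; any conflict would be rejected at planning time rather than discovered by the Diff Gate afterwards.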
- Overall goal
  - the system must not merely suggest compliance; it must physically enforce compliance
  - strong constraints come from engineering gates + contracts/protocols + physical isolation, not prompt wording
- Four pillars
  - code contracts (JSON Schema)
  - version-control gate (Git Diff Gate)
  - environment isolation (Git Worktree / sandbox)
  - immutable evidence chain (append-only JSONL logs)
- Four closed-loop dimensions
  - instruction carrier: Task Contract
  - physical constraint: sandbox + Diff Gate
  - pipeline state machine: PM -> TL -> Worker -> Reviewer -> CI/Test -> Fix loop -> Done
  - audit and replay: event sourcing + replay / comparison
- Mechanism 0: Output Schema Gate (mandatory for P0)
  - Purpose
    - ensure every structured output strictly matches its JSON Schema
  - Enforcement point
    - the Orchestrator validates immediately after the Runner returns
  - Action
    - invalid JSON or schema mismatch -> immediate fail-closed
    - record `OUTPUT_SCHEMA_ENFORCED` or `gate_failed`
- Mechanism 1: Diff Gate (hard gate - primary defense line)
  - Enforcement point
    - the Orchestrator after worker completion and before review, or a Git hook
  - Execution logic

        # Assumed inputs: BASELINE_REF and the ALLOWED_PATHS array come from the Task Contract.
        # 1. collect all changed files
        CHANGED_FILES=$(git diff --name-only "$BASELINE_REF"..HEAD)
        # 2. compare against the allowed_paths whitelist in the Task Contract
        #    (exact path or directory prefix only)
        EXIT_CODE=0
        while IFS= read -r FILE; do
          [ -z "$FILE" ] && continue
          MATCHED=0
          for ALLOWED in "${ALLOWED_PATHS[@]}"; do
            ALLOWED="${ALLOWED%/}"
            if [ "$FILE" = "$ALLOWED" ] || [[ "$FILE" == "$ALLOWED"/* ]]; then
              MATCHED=1; break
            fi
          done
          if [ "$MATCHED" -eq 0 ]; then
            EXIT_CODE=1
            VIOLATION_FILE=$FILE
            break
          fi
        done <<< "$CHANGED_FILES"

  - Boundary handling (mandatory for P0)
    - rename/copy: both source and destination must match `allowed_paths`
    - submodule (gitlink `160000`) is rejected by default
    - symlink changes are rejected by default, with `realpath` boundary checks against escape
    - binary patches are rejected by default unless explicitly allowed by contract
    - `allowed_paths` in P0 supports exact paths and directory prefixes only; `**` is forbidden
  - Action
    - on violation:
      - immediately execute `git reset --hard <baseline_ref>` or remove the worktree
    - record:
      - write a `policy_violation` event to the log
      - do not enter the review stage
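For an Orchestrator written in Python, the path-matching core of the Diff Gate could be sketched as below. It covers only the lexical rules stated above (exact paths or directory prefixes, no wildcards, rename/copy checked on both sides) and treats `.` and `/` as forbidden entries per the breadth review; symlink `realpath` checks, submodule detection, and binary-patch handling are out of scope for this sketch. All function names are hypothetical.

```python
from pathlib import PurePosixPath

FORBIDDEN_ENTRIES = {"", ".", "/"}


def validate_allowed_paths(allowed_paths: list[str]) -> None:
    """Reject empty lists, wildcards, and whole-repository entries."""
    if not allowed_paths:
        raise ValueError("allowed_paths must not be empty")
    for entry in allowed_paths:
        if "*" in entry or entry in FORBIDDEN_ENTRIES:
            raise ValueError(f"forbidden allowed_paths entry: {entry!r}")


def is_allowed(changed: str, allowed_paths: list[str]) -> bool:
    """True if `changed` equals an entry or sits under an entry used as a prefix."""
    path = PurePosixPath(changed)
    if path.is_absolute() or ".." in path.parts:
        return False  # lexical escape attempts never match
    for entry in allowed_paths:
        entry_parts = PurePosixPath(entry).parts
        if path.parts[: len(entry_parts)] == entry_parts:
            return True
    return False


def diff_gate(changes: list[tuple[str, ...]], allowed_paths: list[str]) -> list[str]:
    """Each change lists the path(s) it touches (two for a rename or copy).

    Returns every path outside allowed_paths; an empty list means the gate passes.
    """
    validate_allowed_paths(allowed_paths)
    return [p for change in changes for p in change if not is_allowed(p, allowed_paths)]
```

Matching on path components rather than raw string prefixes avoids accepting `src/application/x.py` when only `src/app/` is allowed.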
- Mechanism 2: Reviewer Isolation
  - Purpose
    - ensure the reviewer has no ability to modify code
  - Implementation A (Codex CLI native)
    - use `/review`
    - by definition it reads diff only and reports findings without touching the worktree
  - Implementation B (sandbox enforcement)
    - MCP calls to `codex()` must inject: `{ "sandbox": "read-only", "approval-policy": "never" }`
  - Verification
    - compare file hashes after review
    - alert if anything changed
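The verification step above (compare file hashes after review) can be sketched with a before/after snapshot. This is an illustrative approach, not the documented implementation: it assumes regular files, skips any `.git` directory, and uses the hypothetical helper names `snapshot` and `changed_paths`.

```python
import hashlib
import os


def snapshot(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests: dict[str, str] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as fh:
                digests[os.path.relpath(full, root)] = hashlib.sha256(fh.read()).hexdigest()
    return digests


def changed_paths(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return paths that were added, removed, or modified between snapshots."""
    return sorted(p for p in before.keys() | after.keys() if before.get(p) != after.get(p))
```

Taking one snapshot before the Reviewer runs and one after, any non-empty `changed_paths` result would indicate the reviewer produced side effects and should raise the alert.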
- Mechanism 3: Tool And Command Gate
  - Goal
    - do not fully trust the agent to execute sensitive operations such as tests or network calls directly
  - Enforcement
    - Shell
      - the agent may only generate a proposed command
      - the Orchestrator validates it against the command allowlist (for example `pytest`, `npm test`) and then executes it on the agent's behalf
    - Network
      - Codex config must use `network: deny` or `network: on-request`
      - all external requests must go through approval or a controlled orchestrator environment
    - Safe execution details (P0)
      - `shell=True` is forbidden; argv only
      - if `acceptance_tests.cmd` is a string, it must be `shlex.split()`-parsed and shell metacharacters must be rejected
      - maintain `policies/command_allowlist.json` using argv prefix matching
      - every command must have a timeout; stdout and stderr must be written to artifacts
      - suggested allowlist prefixes: `pytest`, `python -m pytest`, `npm test`
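The safe-execution details above map closely onto Python's `shlex` and `subprocess` modules. The sketch below is a minimal illustration under assumptions: the allowlist is inlined instead of being loaded from `policies/command_allowlist.json`, the exact metacharacter set is a guess at a conservative default, and `parse_command` / `run_allowed` are hypothetical names.

```python
import shlex
import subprocess

# Inlined stand-in for policies/command_allowlist.json (argv prefixes).
ALLOWLIST_PREFIXES = [["pytest"], ["python", "-m", "pytest"], ["npm", "test"]]
# Assumed metacharacter set; rejects these even inside quotes (conservative).
SHELL_METACHARACTERS = set(";&|<>`$(){}\\\n")


def parse_command(cmd) -> list[str]:
    """Turn a string or argv list into a validated argv, or raise PermissionError."""
    if isinstance(cmd, str):
        if any(ch in SHELL_METACHARACTERS for ch in cmd):
            raise PermissionError("shell metacharacters are not allowed")
        argv = shlex.split(cmd)
    else:
        argv = list(cmd)
    if not argv:
        raise PermissionError("empty command")
    if not any(argv[: len(prefix)] == prefix for prefix in ALLOWLIST_PREFIXES):
        raise PermissionError(f"command not in allowlist: {argv[0]!r}")
    return argv


def run_allowed(cmd, timeout_s: float = 600.0) -> subprocess.CompletedProcess:
    """Execute an allowlisted command as argv with a timeout and captured output."""
    argv = parse_command(cmd)
    # subprocess.run defaults to shell=False; argv is passed through unchanged.
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
```

The captured `stdout` and `stderr` on the returned object are what the Orchestrator would then write into the run's artifacts; `subprocess.run` raises `subprocess.TimeoutExpired` when the timeout elapses.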
- Mechanism 4: Pre-Commit / Pre-Push Hooks
  - Purpose
    - prevent local human operations or orchestrator defects from creating out-of-scope commits
  - Implementation
    - install repository Git hooks
    - read the active task contract
    - rerun the Diff Gate logic
    - if the contract is missing or validation fails, reject commit/push
- Mechanism 5: Policy Gate (least-privilege adjudication)
  - Default role permission matrix (P0, from `policies/agent_registry.json`)
    - PM: filesystem=`read-only`, shell=`never`, network=`deny`
    - Tech Lead: filesystem=`read-only`, shell=`never`, network=`deny`
    - Worker: filesystem=`workspace-write`, shell=`never`, network=`deny`
    - Reviewer: filesystem=`read-only`, shell=`never`, network=`deny`
    - Test Runner / Orchestrator: filesystem=`workspace-write`, shell=`never`, network=`deny` (command execution still goes through the command gate + `acceptance_tests` allowlist)
    - Searcher / Researcher: filesystem=`workspace-write`, shell=`never`, network=`allow` (controlled retrieval tasks only)
  - Default `forbidden_actions` denylist
    - `rm -rf`, `sudo`, `ssh`, `curl`, `wget`, and editing `.env`
    - runtime directories such as `.runtime-cache/cortexpilot/` are protected
  - `allowed_paths` breadth review
    - forbid `**`, empty arrays, `.`, and `/`
    - overly broad directories require God Mode approval
  - `mcp_tools` must remain allowlisted; unknown tools are rejected immediately
- Threat model highlights (mandatory for P0)
  - prompt injection / tool-output injection
  - symlink escape / path traversal
  - Git tricks (rename / submodule / binary)
  - secrets leakage (logs / diff / trace)
  - DoS / unbounded retries
  - corresponding defense lines: EvidenceBundle + Diff / Command / Policy Gate
To keep the main document size under control, the following sections are split into companion volumes:
This main document keeps the authoritative index and the upstream normative rules; the split volumes carry execution details and appendix source text.