Fix CI workflow to fail Test job on pytest failures (ar-r82f.16)#15
Fix CI workflow to fail Test job on pytest failures (ar-r82f.16)#15atc964 wants to merge 7 commits into
Conversation
The previous workflow wrapped pytest in `timeout 60s ... && exit 0 on code 124/137`, which swallowed exit code 124 (timeout-killed) as success. In practice, pytest itself completes quickly (~9s) but the python interpreter hangs ~50s during shutdown cleaning up crewai telemetry threads. The 60s shell `timeout` was firing during that shutdown hang, killing the interpreter with SIGTERM, returning 124, which the workflow then mapped to exit 0 even though pytest had reported test failures. Symptom: run 26593648988 reported "5 failed, 709 passed" but the Test job concluded SUCCESS. Fix: drop the shell timeout wrapper entirely. Use pytest-timeout for per-test deadlines (30s unit, 60s integration) and let pytest's exit code propagate to GitHub Actions directly. bead: ar-r82f.16 Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Temporary commit. Verifies the fix in the previous commit causes the CI Test job to conclude FAILURE on pytest failures (previously it silently passed). This commit will be reverted once CI is observed red. bead: ar-r82f.16 Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
01d9a5d to
a361be9
Compare
… method The previous CI runs hung after pytest printed its summary line. The default signal-based pytest-timeout cannot kill processes that ignore signals or are stuck in teardown/atexit code paths. We layer two safety nets: 1. Outer `timeout --kill-after=10s 300s` ensures the shell kills the process group if pytest itself fails to exit (the actual failure mode in run 26595978562 — pytest summary printed, then 11min hang until cancellation). 2. Inner `--timeout=30 --timeout-method=thread` gives finer-grained per-test protection using thread-based termination, more reliable than signals in containerized CI. No exit-code masking. Both timeouts propagate failures naturally as job failures (exit 124 or 137). bead: ar-r82f.16 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.7 <noreply@anthropic.com>
…r82f.16)
crewai 1.14.6 has a telemetry shutdown handler at
crewai/telemetry/telemetry.py:211 that raises RuntimeError
("cannot schedule new futures after interpreter shutdown") on clean
test exits. This causes pytest to pass in ~7s but the Python
interpreter to hang ~5min on shutdown, tripping the outer 300s
timeout and producing exit 124 / CI conclusion FAILURE despite all
tests passing.
CREWAI_TELEMETRY_OPT_OUT alone does not suppress this shutdown path.
crewai's telemetry uses OpenTelemetry under the hood, so setting
OTEL_SDK_DISABLED=true should fully disable the shutdown handler
and let the interpreter exit cleanly.
Investigation tracked in ar-r82f.21.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Pragmatic ship for the lingering interpreter-shutdown hang. Cycle 3
confirmed pytest itself completes (no traceback in the killed runs),
but the Python interpreter hangs ~5min on exit, triggering the outer
timeout 300s and producing exit 124 / FAILURE despite a clean test run.
Two-part fix:
1. Bump outer `timeout` from 300s to 600s on both pytest steps. Even
if the interpreter still sits in atexit handlers for several
minutes, it has room to complete naturally instead of being killed.
2. Add belt-and-suspenders telemetry opt-out env vars on both steps,
targeting the dependencies most likely to be registering atexit
network calls:
- CREWAI_DISABLE_TELEMETRY / CREWAI_DISABLE_TRACKING: the other
crewai-recognized opt-out names (CREWAI_TELEMETRY_OPT_OUT alone
turned out to be a documented-but-unrecognized no-op).
- ANONYMIZED_TELEMETRY: chromadb's opt-out switch.
- POSTHOG_DISABLED: posthog (used by chromadb/crewai) opt-out.
Any one of these may shorten the hang; together with the 600s budget
the workflow should report SUCCESS on clean runs while still surfacing
real pytest failures (cycle 1 verified this works in 8.47s).
bead: ar-r82f.16
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Auto-fixed two ruff violations introduced by PR #13: - freewheel_adapter.py: I001 unsorted imports (alphabetized) - freewheel_oauth.py: W292 missing trailing newline Also ran `ruff format` on both files for consistency; this reflowed two short multi-line calls in freewheel_adapter.py and re-wrapped one call in freewheel_oauth.py. No logic changes. Unblocks PR #15 (ar-r82f.16) — seller CI workflow fix needs main's Lint job to pass so verification can reach pytest. bead: ar-r82f.22 Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Status — parked, not closingToday's work proved this PR's workflow change is functionally correct — it surfaces real pytest failures (verified by cycle 1: poison-pill present → CI conclusion FAILURE in 8.47s). The previous workflow silently passed those failures, which is the bug this PR is fixing. Why not merging tonight: seller-agent CI has a second, independent issue — a Python interpreter shutdown hang after pytest exits cleanly. Cycle 4 evidence (run 26599023320):
If we merge this PR as-is, every subsequent seller-agent PR would fail CI on this same hang regardless of test results — strictly worse than the current (buggy, silent-pass) state. Diagnosis (via Path to merge:
Moving to Draft for clarity. Branch and 600s+telemetry-env workflow changes are preserved — when cc bead: ar-r82f.16 (blocked-by ar-r82f.21) |
Summary
The CI Test job was silently passing on failing pytest runs. Discovered on
PR #14's run 26593648988:
pytest reported
5 failed, 709 passedbut the Test job concluded success.This means all recent "green CI" claims on
mainmay be unreliable.Root Cause
The Unit tests / Integration tests steps wrapped pytest in a shell
timeoutwith exit-code mapping:
The intent was to swallow only true timeouts (124) or SIGKILLs (137).
In practice, pytest itself completes quickly (~9s), but the python interpreter
then hangs for ~50s during shutdown cleaning up crewai telemetry threads
(
cannot schedule new futures after interpreter shutdownlog spam).The 60s shell
timeoutwas firing during that shutdown hang, killing theinterpreter with SIGTERM, returning exit code 124. The workflow then mapped
124 to
exit 0— even though pytest had already printed5 failed, 709 passedbefore the kill.
Timeline from run 26593648988 (Unit tests, py3.12):
5 failed, 709 passedFix
Drop the shell timeout wrapper entirely. Use
pytest-timeoutfor per-testdeadlines (30s unit, 60s integration). Pytest's exit code now propagates
directly to GitHub Actions.
Changes:
.github/workflows/ci.yml: replace theset +e ... timeout ... exit code mappingblock withpytest --timeout=Nfor both Unit tests and Integration tests steps.pyproject.toml: addpytest-timeout>=2.3.0to dev deps.Verification
Done with deliberate failure injection on this branch.
tests/unit/test_ci_poison_pill_DELETEME.py)with
assert False→ expect CI Test job to conclude FAILURE (the wholepoint of this PR).
Verification status will be appended as commits and CI runs land.
Test plan
Bead
ar-r82f.16— Fix seller-agent CI: workflow silently passes on failing tests