
Probe cycle discrimination + scheduler health snapshot#50

Merged
wpak-ai merged 7 commits into develop from feat/probe-cycle-health-snapshot on May 26, 2026

Conversation

@henry0816191
Collaborator

@henry0816191 henry0816191 commented May 25, 2026

Summary

  • run_cycle() returns CycleResult (success / empty / failed) instead of a list.
  • /health reads scheduler state from a locked, immutable snapshot (last_updated, last_cycle_status, etc.).

Why

Operators could not tell a real empty probe from a failed one, and the health thread read scheduler fields without synchronization.

Main changes

  • models.py: CycleStatus, CycleResult
  • sources.py: discriminated run_cycle(), new log lines
  • monitor.py: handle all cycle outcomes; do not advance staleness on FAILED; SchedulerSnapshot + health_snapshot()
  • __main__.py: health extras from scheduler.health_snapshot()
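The locked-snapshot idea in the list above can be sketched roughly as follows. This is a minimal illustration rather than the PR's code: only the names SchedulerSnapshot and health_snapshot() come from the description, while the field set, the Scheduler body, and the _publish() helper are assumptions.

```python
from __future__ import annotations

import threading
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class SchedulerSnapshot:
    # Hypothetical field set; the real snapshot carries more fields.
    last_updated: str | None = None
    poll_count: int = 0
    last_cycle_status: str | None = None
    probe_stats: dict = field(default_factory=dict)


class Scheduler:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._snapshot = SchedulerSnapshot()
        self._poll_count = 0

    def _publish(self, status: str, stats: dict) -> None:
        # Assumed to be called only from the scheduler thread.
        # Build the complete snapshot first, then swap the reference under
        # the lock, so a reader never sees fields from two different cycles.
        self._poll_count += 1
        snap = SchedulerSnapshot(
            last_updated=datetime.now(timezone.utc).isoformat(),
            poll_count=self._poll_count,
            last_cycle_status=status,
            probe_stats=dict(stats),
        )
        with self._lock:
            self._snapshot = snap

    def health_snapshot(self) -> dict:
        with self._lock:
            snap = self._snapshot
        # asdict() rebuilds nested dicts, so callers get an independent copy.
        return asdict(snap)
```

The health thread only ever touches health_snapshot(), so it reads one consistent set of fields per request instead of individual scheduler attributes.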

Tests

pytest tests/test_sources.py tests/test_monitor.py tests/test_health.py -q

138 passed (includes empty/failed cycle, FAILED staleness, concurrent health snapshot).

Related issues

close #46
close #49

Summary by CodeRabbit

  • New Features

    • /health now publishes an atomic, lock-protected scheduler snapshot (last-updated, poll count, probe stats, probe success rate, last cycle status/error) and merges allow-listed queue and DB metrics
    • Probe cycles now return discriminated outcomes (SUCCESS / EMPTY / FAILED) with clearer probe-cycle logging
  • Behavior

    • /health sanitizes extra fields so extras cannot overwrite core health keys
  • Documentation

    • Clarified concurrency and immutable health-snapshot semantics
  • Tests

    • Expanded tests for health merging, concurrency, probe-cycle outcomes, and queue health metrics
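The Behavior item above (extras cannot overwrite core health keys) could look roughly like this. It is a sketch under assumptions: build_health_response is a hypothetical name, and only the `{**base, **safe_extra}` shape and the base-only fallback on failure are taken from this PR's discussion.

```python
from __future__ import annotations

from typing import Any, Callable


def build_health_response(
    base: dict[str, Any],
    extra_fields_fn: Callable[[], dict[str, Any]] | None = None,
) -> dict[str, Any]:
    """Return base health fields plus any extras that do not collide with them."""
    extra: dict[str, Any] = {}
    if extra_fields_fn is not None:
        try:
            raw = extra_fields_fn()
        except Exception:
            # A failing extras hook degrades to base-only output.
            raw = {}
        if isinstance(raw, dict):
            extra = raw
    # Drop every extra key that would shadow a core health key.
    safe_extra = {k: v for k, v in extra.items() if k not in base}
    return {**base, **safe_extra}
```

Filtering before the merge means the order of the two unpackings no longer matters for correctness: a colliding key never reaches the response.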


@henry0816191 henry0816191 self-assigned this May 25, 2026
@henry0816191 henry0816191 requested a review from wpak-ai as a code owner May 25, 2026 16:35
@coderabbitai

coderabbitai Bot commented May 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Implements discriminated ISO probe cycle outcomes (CycleResult with SUCCESS/EMPTY/FAILED), publishes an immutable SchedulerSnapshot under a lock, records cycle status/error and probe stats, merges scheduler snapshot + allow-listed MQ + db_pool for /health extras (scheduler wins on conflicts), and expands tests/docs for safety and concurrency.

Changes

Discriminated Cycles and Health Snapshots

| Layer | File(s) | Summary |
| --- | --- | --- |
| Docs and changelog updates | CHANGELOG.md, docs/architecture.md | Document CycleResult/CycleStatus and SchedulerSnapshot + health_snapshot() behavior. |
| Cycle Result Types | src/paperscout/models.py | CycleStatus enum and CycleResult frozen dataclass define discriminated probe outcomes and conditional hits property. |
| ISOProber Cycle Outcome Refactor | src/paperscout/sources.py | run_cycle() now returns CycleResult, wraps cycle in try/except to return FAILED on cycle-level exceptions, runs concurrent probes, classifies SUCCESS/EMPTY, logs outcomes, and includes cycle_status in PROBE-CYCLE-SUMMARY. |
| Scheduler snapshot & health state | src/paperscout/monitor.py | Adds probe success-rate computation, frozen SchedulerSnapshot and module defaults, thread-safe health fields and helpers, and snapshot publish under a lock integrated into seed()/poll_once(). |
| Main /health merge helpers | src/paperscout/__main__.py | Adds _mq_health_fields() and _merge_extra_health_fields() to fetch MQ metrics (with fallbacks), allow-list MQ keys, merge with scheduler snapshot (scheduler-first), and append db_pool. |
| Health handler sanitization | src/paperscout/health.py | Sanitize extra_fields_fn() output to prevent overwriting base health fields; build response from {**base, **safe_extra}. |
| MessageQueue metrics & config | src/paperscout/scout.py, src/paperscout/config.py | Add MessageQueue.health_fields() reporting mq_depth, mq_max_size, mq_utilization, mq_circuit_state, and new MQ-related Settings fields used by health merging and tests. |
| ISOProber Unit Tests | tests/test_sources.py | Update tests to assert cycle.status and cycle.hits; add test_run_cycle_empty and test_run_cycle_failed. |
| Scheduler Tests and Helpers | tests/test_monitor.py | Add CycleResult helpers, update scheduler mocks to use CycleResult objects, assert FAILED cycles don't advance last_successful_poll, and validate health_snapshot() immutability and poll_count/last_updated. |
| Health Endpoint Tests and Concurrency | tests/test_health.py | Expand extra_fields_fn fixture to include snapshot fields; add safety tests preventing overwrite/failure propagation; add concurrent snapshot publisher test verifying /health consistency under concurrent updates. |
| Health Merge Unit Tests | tests/test_main_health_merge.py | Add tests for _merge_extra_health_fields and _mq_health_fields: scheduler-wins precedence, dropping unknown MQ keys with debug logs, fallback on exceptions, and allow-listed MQ key inclusion. |
| MessageQueue & CycleResult Tests | tests/test_message_queue.py, tests/test_models_cycle_result.py | Add unit tests for MessageQueue.health_fields() outputs and CycleResult invariants. |
| Test fixtures | tests/conftest.py | Add MQ-related test defaults used by the new tests. |

Sequence Diagram

sequenceDiagram
  participant Scheduler
  participant ISOProber
  participant HealthEndpoint
  participant MessageQueue
  ISOProber->>Scheduler: CycleResult = run_cycle()
  alt Probe Success
    ISOProber->>ISOProber: perform concurrent HEAD probes, collect hits
    ISOProber-->>Scheduler: CycleResult(SUCCESS, results=hits)
  else Probe Empty
    ISOProber->>ISOProber: probes return no hits
    ISOProber-->>Scheduler: CycleResult(EMPTY)
  else Probe Failed
    ISOProber->>ISOProber: exception during cycle
    ISOProber-->>Scheduler: CycleResult(FAILED, error=msg)
  end
  Scheduler->>Scheduler: _probe_hits_from_cycle() (record status/error)
  Scheduler->>Scheduler: _record_probe_cycle_completion() (store probe_stats)
  alt non-FAILED cycle
    Scheduler->>Scheduler: _mark_poll_successful_if_probe_ok()
  end
  Scheduler->>Scheduler: _publish_health_snapshot()
  HealthEndpoint->>Scheduler: health_snapshot()
  Scheduler-->>HealthEndpoint: SchedulerSnapshot dict
  HealthEndpoint->>MessageQueue: health_fields() / depth()
  MessageQueue-->>HealthEndpoint: MQ metrics dict
  HealthEndpoint->>HealthEndpoint: merge snapshot + MQ + db_pool (scheduler wins)
  HealthEndpoint-->>Client: /health response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • cppalliance/paperscout#43: Both PRs modify the ISOProber probing flow in src/paperscout/sources.py—changing run_cycle/_probe_one behavior and the PROBE-CYCLE-SUMMARY/cycle-metrics output.
  • cppalliance/paperscout#44: Overlaps on probe success rate computation and health metric plumbing.
  • cppalliance/paperscout#29: Related changes to /health extra fields and snapshot wiring.

Suggested labels

bug

Suggested reviewers

  • wpak-ai

Poem

🐰 I hopped through cycles, brave and spry,

Found success, empty, or why it died;
I locked a snapshot, stamped the time,
Merged queues and DB in tidy rhyme;
Now health reports a steady sign.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Probe cycle discrimination + scheduler health snapshot' clearly summarizes the two main changes: discriminating cycle outcomes and adding a health snapshot mechanism. |
| Description check | ✅ Passed | The description covers summary, rationale, main changes, and test results with clear references to related issues (#46, #49), though test plan checkboxes are incomplete. |
| Linked Issues check | ✅ Passed | The PR successfully implements both #46 (discriminated CycleResult with SUCCESS/EMPTY/FAILED) and #49 (synchronized health snapshots via SchedulerSnapshot), with comprehensive test coverage including concurrency scenarios. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to issues #46 and #49; MessageQueue config fields and health_fields() support the health snapshot feature; no unrelated changes detected. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/paperscout/monitor.py`:
- Around line 128-136: The health snapshot fallback uses
dict(_HEALTH_SNAPSHOT_DEFAULTS) which only performs a shallow copy and ends up
sharing the nested probe_stats dict; change the construction in the code that
creates the fallback (where dict(_HEALTH_SNAPSHOT_DEFAULTS) is used) to produce
a deep/independent copy of nested structures—either use copy.deepcopy on
_HEALTH_SNAPSHOT_DEFAULTS or rebuild the dict while explicitly creating a new
empty dict for "probe_stats" (e.g., {**_HEALTH_SNAPSHOT_DEFAULTS, "probe_stats":
{}}) so each consumer gets its own probe_stats instead of a shared mutable
object.
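The difference between the two copy strategies named in this comment can be seen in a small standalone sketch. The defaults dict below is a hypothetical stand-in with only two keys; the real _HEALTH_SNAPSHOT_DEFAULTS is not shown in this thread.

```python
import copy

# Hypothetical stand-in for the module-level defaults named in the comment.
_HEALTH_SNAPSHOT_DEFAULTS = {"poll_count": 0, "probe_stats": {}}

shallow = dict(_HEALTH_SNAPSHOT_DEFAULTS)
deep = copy.deepcopy(_HEALTH_SNAPSHOT_DEFAULTS)

# The shallow copy is a new outer dict, but the nested dict is the same object.
shallow_shares_nested = (
    shallow["probe_stats"] is _HEALTH_SNAPSHOT_DEFAULTS["probe_stats"]
)
deep_shares_nested = deep["probe_stats"] is _HEALTH_SNAPSHOT_DEFAULTS["probe_stats"]

# Mutating through the shallow copy leaks into the shared defaults.
shallow["probe_stats"]["probed"] = 7
leaked = _HEALTH_SNAPSHOT_DEFAULTS["probe_stats"].get("probed")
```

Here shallow_shares_nested is True and deep_shares_nested is False, which is the sharing hazard the comment describes.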

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f831e1c1-21d8-46f5-b3dd-da55b443e852

📥 Commits

Reviewing files that changed from the base of the PR and between f7b0bd5 and fc58957.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • docs/architecture.md
  • src/paperscout/__main__.py
  • src/paperscout/models.py
  • src/paperscout/monitor.py
  • src/paperscout/sources.py
  • tests/test_health.py
  • tests/test_monitor.py
  • tests/test_sources.py

Comment thread src/paperscout/monitor.py
@henry0816191
Collaborator Author

@coderabbitai don't skip. full review

@coderabbitai

coderabbitai Bot commented May 25, 2026

Tip

For best results, initiate chat on the files or code changes.

@henry0816191 Here is the full review of PR #50.


PR #50 — Full Review: Probe cycle discrimination + scheduler health snapshot

Summary

The implementation is well-structured, correctly addresses issues #46 and #49, and the test coverage is solid (happy path, empty, failed, staleness, concurrent snapshot reads). A few issues are worth addressing before merging.


src/paperscout/models.py

CycleResult lacks invariant validation

CycleResult is frozen but its fields are not validated. A FAILED result with error=None, or a SUCCESS result with an empty results tuple, is silently constructable.

@dataclass(frozen=True, slots=True)
class CycleResult:
    status: CycleStatus
    results: tuple[ProbeHit, ...] = ()
    error: str | None = None

    def __post_init__(self) -> None:
        if self.status == CycleStatus.FAILED and not self.error:
            raise ValueError("CycleResult FAILED must carry a non-empty error string")
        if self.status == CycleStatus.SUCCESS and not self.results:
            raise ValueError("CycleResult SUCCESS must carry at least one ProbeHit")
        if self.status != CycleStatus.FAILED and self.error is not None:
            raise ValueError("CycleResult error is only valid for FAILED status")
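To see how these invariants behave, here is a self-contained sketch that exercises the suggested __post_init__. ProbeHit is reduced to a hypothetical one-field class, the enum string values are assumed, try_build is an illustrative helper, and slots=True is left out so the sketch also runs on Python 3.9.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class CycleStatus(Enum):
    # String values are assumed for illustration.
    SUCCESS = "success"
    EMPTY = "empty"
    FAILED = "failed"


@dataclass(frozen=True)
class ProbeHit:
    # Hypothetical minimal shape; the real ProbeHit is not shown here.
    url: str


@dataclass(frozen=True)
class CycleResult:
    status: CycleStatus
    results: tuple[ProbeHit, ...] = ()
    error: str | None = None

    def __post_init__(self) -> None:
        if self.status == CycleStatus.FAILED and not self.error:
            raise ValueError("CycleResult FAILED must carry a non-empty error string")
        if self.status == CycleStatus.SUCCESS and not self.results:
            raise ValueError("CycleResult SUCCESS must carry at least one ProbeHit")
        if self.status != CycleStatus.FAILED and self.error is not None:
            raise ValueError("CycleResult error is only valid for FAILED status")


def try_build(**kwargs) -> str:
    """Report whether CycleResult accepts the given field combination."""
    try:
        CycleResult(**kwargs)
    except ValueError as exc:
        return f"rejected: {exc}"
    return "accepted"
```

With these checks, try_build(status=CycleStatus.FAILED) is rejected while try_build(status=CycleStatus.FAILED, error="timeout") is accepted, so an inconsistent result fails at construction time rather than surfacing later in /health.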

src/paperscout/sources.py

Dead code: unreachable return None in _probe_one

The async with sem: block always returns (a ProbeHit | None) or raises. The bare return None after it is unreachable and should be removed.

    async with sem:
        ...
        return ProbeHit(...)
    return None  # ← unreachable; remove this line

src/paperscout/monitor.py

1. Missing wildcard case in _probe_hits_from_cycle

The match statement covers all three current CycleStatus values, but there is no case _: wildcard. If a new status is added later the method falls through and implicitly returns None, silently breaking the -> list[ProbeHit] contract. Add a defensive fallback:

def _probe_hits_from_cycle(self, cycle: CycleResult) -> list[ProbeHit]:
    self._last_cycle_status = cycle.status
    self._last_cycle_error = cycle.error
    match cycle.status:
        case CycleStatus.SUCCESS:
            return cycle.hits
        case CycleStatus.EMPTY:
            log.info("POLL  probe cycle empty")
            return []
        case CycleStatus.FAILED:
            log.error("POLL  probe cycle failed: %s", cycle.error)
            return []
        case _:
            log.error("POLL  unknown cycle status: %s", cycle.status)
            return []

2. _record_probe_cycle_completion() is called twice in the seed paths

seed() already calls _record_probe_cycle_completion() when enable_iso_probe=True. Both early-return branches in poll_once() call it again unconditionally:

  • Cold-start path (not had_prior_state, enable_iso_probe=True): seed() → _record_probe_cycle_completion(), then poll_once() → _record_probe_cycle_completion() again.
  • Restart-with-prior-state path: same double call.

The second call is harmless (snapshots the same stats), but it's misleading. Consider removing the redundant calls from poll_once() for the seed sub-paths, or moving all stat recording out of seed().

3. match/case requires Python ≥ 3.10

_probe_hits_from_cycle uses structural pattern matching. Please verify python_requires in pyproject.toml is >=3.10, or replace with an if/elif/else chain if the project targets 3.8/3.9.
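If the project did need to support interpreters older than 3.10, the same dispatch can be written without match/case. The sketch below is a standalone function, not the scheduler method: hits_for_status is a hypothetical name, and the enum values and logger name are assumptions.

```python
from __future__ import annotations

import logging
from enum import Enum

log = logging.getLogger("paperscout.sketch")


class CycleStatus(Enum):
    SUCCESS = "success"
    EMPTY = "empty"
    FAILED = "failed"


def hits_for_status(status, hits, error=None) -> list:
    """if-chain equivalent of the match statement, with an explicit fallback."""
    if status is CycleStatus.SUCCESS:
        return list(hits)
    if status is CycleStatus.EMPTY:
        log.info("POLL  probe cycle empty")
        return []
    if status is CycleStatus.FAILED:
        log.error("POLL  probe cycle failed: %s", error)
        return []
    # Defensive fallback, mirroring the suggested `case _:` branch.
    log.error("POLL  unknown cycle status: %s", status)
    return []
```

Every branch returns a list, so the return contract holds even for a status value added later.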

4. SchedulerSnapshot.probe_stats — mutable dict in a frozen dataclass

frozen=True prevents reassigning the probe_stats field but doesn't prevent in-place mutation of the dict. Since _publish_health_snapshot copies stats (dict(self._last_probe_stats)) and health_snapshot calls dataclasses.asdict() (which deep-copies nested structures), this is safe in practice. However, for true immutability, consider wrapping the dict in types.MappingProxyType(stats), or document the invariant explicitly.
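The gap between a frozen dataclass and a truly read-only mapping can be shown in a few lines. The two snapshot classes and make_read_only below are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Mapping


@dataclass(frozen=True)
class MutableSnapshot:
    probe_stats: dict = field(default_factory=dict)


@dataclass(frozen=True)
class ReadOnlySnapshot:
    probe_stats: Mapping[str, Any]


def make_read_only(stats: dict) -> ReadOnlySnapshot:
    # Copy first so later changes to the caller's dict are not visible
    # through the proxy.
    return ReadOnlySnapshot(probe_stats=MappingProxyType(dict(stats)))


loose = MutableSnapshot(probe_stats={"probed": 1})
loose.probe_stats["probed"] = 2  # allowed: frozen only blocks attribute reassignment

strict = make_read_only({"probed": 1})
try:
    strict.probe_stats["probed"] = 2
    write_blocked = False
except TypeError:
    # A mapping proxy does not support item assignment.
    write_blocked = True
```

One thing to check before adopting this: a mapping proxy is not a dict, so any code path that converts the snapshot with dataclasses.asdict() would need to be tested against it.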


src/paperscout/__main__.py

Key merge order in _extra_health_fields

return {**scheduler.health_snapshot(), **mq_extra, "db_pool": _pool_status(pool)}

mq_extra can silently overwrite any key from scheduler.health_snapshot() (e.g., last_updated if health_fields() ever returns it). Consider an explicit allow-list or at least a debug-level log/assertion when conflicts occur.
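An allow-list merge along the lines suggested here might look like the sketch below. The function name, its signature (it takes the snapshot dict rather than the scheduler), and the logger name are assumptions; the allow-listed keys are the four MQ fields mentioned elsewhere in this PR.

```python
from __future__ import annotations

import logging
from typing import Any

log = logging.getLogger("paperscout.sketch")

# Assumed allow-list: the MQ keys that health_fields() reports in this PR.
MQ_ALLOWED_KEYS = frozenset(
    {"mq_depth", "mq_max_size", "mq_utilization", "mq_circuit_state"}
)


def merge_health_fields(
    snapshot: dict[str, Any],
    mq_extra: dict[str, Any],
    db_pool: dict[str, Any],
) -> dict[str, Any]:
    """Merge /health extras so scheduler keys always win over MQ keys."""
    merged = dict(snapshot)
    for key, value in mq_extra.items():
        if key not in MQ_ALLOWED_KEYS:
            log.debug("health: dropping unknown MQ key %r", key)
        elif key in merged:
            log.debug("health: MQ key %r conflicts with scheduler snapshot", key)
        else:
            merged[key] = value
    merged["db_pool"] = db_pool
    return merged
```

Compared with chained `**` unpacking, where the rightmost dict wins, this makes precedence explicit and leaves a debug trail for every dropped key.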


tests/test_monitor.py

Missing test: health_snapshot() defaults before any snapshot is published

There is a test for health_snapshot() after a poll, but no test verifies the default dict returned when _health_snapshot is None (i.e., before the first _publish_health_snapshot() call):

async def test_health_snapshot_defaults_before_first_poll(self, fake_pool):
    scheduler, _, _, _, _ = _make_scheduler(fake_pool)
    snap = scheduler.health_snapshot()
    assert snap["last_updated"] is None
    assert snap["poll_count"] == 0
    assert snap["last_cycle_status"] is None
    assert snap["last_successful_poll"] is None

tests/test_health.py

TestHealthSnapshotConcurrency test could assert field-type consistency more strictly

The loop asserts last_cycle_status in ("success", "empty", None) — good. Consider also asserting that poll_count is an int and last_updated (when not None) is a parseable ISO datetime string, to catch any serialization issues:

if data["last_updated"] is not None:
    datetime.fromisoformat(data["last_updated"])  # would raise if malformed
assert isinstance(data["poll_count"], int)

Minor nits

| File | Location | Note |
| --- | --- | --- |
| sources.py | run_cycle() | The PROBE-CYCLE-SUMMARY JSON log is emitted after the PROBE-SUCCESS / PROBE-EMPTY log. Reversing the order would make it easier to correlate the summary line with the outcome line when grepping logs. |
| monitor.py | health_snapshot() | copy.deepcopy(_HEALTH_SNAPSHOT_DEFAULTS) is called on every request when _health_snapshot is None. Since defaults are constant, a shallow dict(_HEALTH_SNAPSHOT_DEFAULTS) is sufficient (no nested mutable values that need deep-copying). |
| models.py | CycleResult.hits docstring | "otherwise empty": consider clarifying it also returns empty for EMPTY status, not just FAILED. |

Verdict

The approach is sound. The staleness-clock fix (not advancing _last_successful_poll on FAILED) and the lock-protected snapshot are the right patterns. The main actionable items are:

  1. Unreachable return None in _probe_one (trivial).
  2. Missing wildcard case in _probe_hits_from_cycle (low-risk but important for forward safety).
  3. Python version check for match/case (must verify).
  4. Redundant _record_probe_cycle_completion() calls in seed paths (cleanup).
  5. CycleResult.__post_init__ validation (defensive programming).


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (3)
src/paperscout/scout.py (1)

58-67: ⚡ Quick win

Keep mq_utilization normalized to [0.0, 1.0].

MessageQueue is unbounded today, so depth / mq_max_size can exceed 1.0 under load. Clamping keeps dashboard semantics stable.

Proposed patch
     def health_fields(self) -> dict[str, Any]:
         """Metrics for the ``/health`` endpoint (merged by ``__main__``)."""
         d = self.depth()
         m = settings.mq_max_size
+        util = (d / m) if m else 0.0
         return {
             "mq_depth": d,
             "mq_max_size": m,
-            "mq_utilization": round(d / m, 4) if m else 0.0,
+            "mq_utilization": round(min(max(util, 0.0), 1.0), 4),
             "mq_circuit_state": "closed",
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/paperscout/scout.py` around lines 58 - 67, The mq_utilization calculation
in health_fields can exceed 1.0 because MessageQueue is unbounded; change the
computation in scout.health_fields to compute utilization = (d / m) if m else
0.0 and then clamp it to [0.0, 1.0] (e.g., utilization = min(1.0, max(0.0,
utilization))) before rounding; keep the existing keys
("mq_depth","mq_max_size","mq_utilization","mq_circuit_state") and use
settings.mq_max_size and self.depth() as the sources.
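The clamping rule described above can be isolated into a small pure function for illustration. The name mq_utilization as a standalone function and its two-argument signature are assumptions; in the PR the computation lives inside health_fields().

```python
def mq_utilization(depth: int, max_size: int) -> float:
    """Queue fill ratio clamped to [0.0, 1.0]; 0.0 when max_size is zero."""
    if not max_size:
        return 0.0
    return round(min(max(depth / max_size, 0.0), 1.0), 4)
```

For example, a depth of 5 against a max size of 2 reports 1.0 rather than 2.5, which keeps a dashboard gauge within its expected range.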
tests/test_main_health_merge.py (1)

59-62: ⚡ Quick win

Assert all allow-listed MQ fields in the inclusion test.

You already pass mq_utilization and mq_circuit_state in mq_extra; asserting them here closes a regression gap in allow-list behavior.

Proposed patch
 def test_merge_includes_allowlisted_mq_fields():
@@
     out = _merge_extra_health_fields(scheduler, mq_extra, {})
     assert out["mq_depth"] == 2
     assert out["mq_max_size"] == 1000
+    assert out["mq_utilization"] == 0.002
+    assert out["mq_circuit_state"] == "closed"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_main_health_merge.py` around lines 59 - 62, The test currently
only asserts mq_depth and mq_max_size after calling
_merge_extra_health_fields(scheduler, mq_extra, {}); update the test to also
assert that the allow-listed MQ fields passed in mq_extra—specifically
mq_utilization and mq_circuit_state—are present and equal to the values provided
in mq_extra, so modify the assertions in the test (around the call to
_merge_extra_health_fields) to include checks for out["mq_utilization"] and
out["mq_circuit_state"] matching the input.
src/paperscout/__main__.py (1)

41-48: ⚡ Quick win

Guard MQ health extraction so one MQ failure doesn’t drop all extras.

If mq.health_fields() raises, _extra_health_fields() fails and /health falls back to base-only. Catching here keeps scheduler/db pool extras available.

Proposed patch
 def _mq_health_fields(mq: MessageQueue) -> dict:
     """MQ metrics for /health; from health_fields() when present, else depth only."""
     if hasattr(mq, "health_fields"):
-        raw = mq.health_fields()
+        try:
+            raw = mq.health_fields()
+        except Exception:
+            log.exception("health: mq.health_fields() failed, using mq_depth only")
+            return {"mq_depth": mq.depth()}
         if isinstance(raw, dict):
             return raw
         log.warning("health: mq.health_fields() returned non-dict, using mq_depth only")
     return {"mq_depth": mq.depth()}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/paperscout/__main__.py` around lines 41 - 48, Wrap the call to
mq.health_fields() in a try/except so a thrown exception from any MessageQueue
implementation doesn’t cancel gathering other extras; in _mq_health_fields, call
mq.health_fields() inside a try block, catch Exception, log a warning including
the exception details and the MQ identity, and then fall back to returning
{"mq_depth": mq.depth()} (preserving the existing non-dict check path). Ensure
you reference _mq_health_fields and use mq.health_fields and mq.depth in the
fix.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8da341de-1054-40e5-be6d-4857363b9673

📥 Commits

Reviewing files that changed from the base of the PR and between f637b98 and cf54eec.

📒 Files selected for processing (13)
  • src/paperscout/__main__.py
  • src/paperscout/config.py
  • src/paperscout/health.py
  • src/paperscout/models.py
  • src/paperscout/monitor.py
  • src/paperscout/scout.py
  • src/paperscout/sources.py
  • tests/conftest.py
  • tests/test_health.py
  • tests/test_main_health_merge.py
  • tests/test_message_queue.py
  • tests/test_models_cycle_result.py
  • tests/test_monitor.py
💤 Files with no reviewable changes (1)
  • src/paperscout/sources.py
✅ Files skipped from review due to trivial changes (1)
  • tests/conftest.py


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
tests/test_message_queue.py (1)

33-41: 💤 Low value

Consider asserting the patched mq_max_size value.

The test correctly verifies clamping behavior, but adding an assertion for fields["mq_max_size"] == 2 would confirm the patch was applied successfully and make the test's setup more explicit.

✨ Optional enhancement
     assert fields["mq_depth"] == 5
+    assert fields["mq_max_size"] == 2
     assert fields["mq_utilization"] == 1.0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_message_queue.py` around lines 33 - 41, In the
test_health_fields_clamps_utilization_when_depth_exceeds_max add an assertion
that the patched mq_max_size was applied by asserting fields["mq_max_size"] == 2
after calling mq.health_fields(); this confirms the patch on
paperscout.scout.settings.mq_max_size took effect and makes the test setup
explicit while keeping the existing assertions (fields["mq_depth"] and
fields["mq_utilization"]) intact.
src/paperscout/__main__.py (1)

46-54: ⚡ Quick win

Guard fallback mq.depth() reads so /health extras remain available.

If mq.depth() fails on Line 54, extra_fields_fn() fails and the handler drops all extras (including scheduler snapshot). Keep _mq_health_fields() exception-safe in both fallback paths.

Proposed defensive patch
 def _mq_health_fields(mq: MessageQueue) -> dict:
     """MQ metrics for /health; from health_fields() when present, else depth only."""
     if hasattr(mq, "health_fields"):
         try:
             raw = mq.health_fields()
         except Exception as exc:
             log.warning(
                 "health: mq.health_fields() failed for %s id=%s: %s",
                 type(mq).__name__,
                 id(mq),
                 exc,
                 exc_info=True,
             )
-            return {"mq_depth": mq.depth()}
+            try:
+                return {"mq_depth": mq.depth()}
+            except Exception:
+                log.warning("health: mq.depth() fallback failed; omitting MQ fields", exc_info=True)
+                return {}
         if isinstance(raw, dict):
             return raw
         log.warning("health: mq.health_fields() returned non-dict, using mq_depth only")
-    return {"mq_depth": mq.depth()}
+    try:
+        return {"mq_depth": mq.depth()}
+    except Exception:
+        log.warning("health: mq.depth() failed; omitting MQ fields", exc_info=True)
+        return {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/paperscout/__main__.py` around lines 46 - 54, The fallback call to
mq.depth() can itself raise and currently will propagate out of
extra_fields_fn()/ _mq_health_fields(), dropping all health "extras"; wrap any
call to mq.depth() (the one inside the exception handler and the one inside
_mq_health_fields()) in its own try/except and return a safe sentinel (e.g. None
or "unknown") or omit that field on error, while still logging the depth failure
(include exc_info=True) so the rest of the extras (scheduler snapshot, etc.) are
preserved; update both the except block shown and the _mq_health_fields()
implementation to be exception-safe around mq.depth().
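Since both fallback paths end in the same guarded mq.depth() read, one possible shape is to factor that read into a helper. This is a sketch of that design choice, not the merged code: _safe_depth_fields is a hypothetical helper, and the queue is typed as Any so the block stays self-contained.

```python
from __future__ import annotations

import logging
from typing import Any

log = logging.getLogger("paperscout.sketch")


def _safe_depth_fields(mq: Any) -> dict:
    """Return {"mq_depth": n}, or {} if even the depth read fails."""
    try:
        return {"mq_depth": mq.depth()}
    except Exception:
        log.warning("health: mq.depth() failed; omitting MQ fields", exc_info=True)
        return {}


def mq_health_fields(mq: Any) -> dict:
    """MQ metrics for /health that never raise into the caller."""
    if hasattr(mq, "health_fields"):
        try:
            raw = mq.health_fields()
        except Exception:
            log.warning("health: mq.health_fields() failed", exc_info=True)
            return _safe_depth_fields(mq)
        if isinstance(raw, dict):
            return raw
        log.warning("health: mq.health_fields() returned non-dict, using mq_depth only")
    return _safe_depth_fields(mq)
```

With a single guarded helper, a queue whose every method raises yields an empty dict, so the scheduler snapshot and db_pool extras would still reach /health.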

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a757c472-99ee-44c9-a03f-83117e29ff55

📥 Commits

Reviewing files that changed from the base of the PR and between cf54eec and c0c5286.

📒 Files selected for processing (4)
  • src/paperscout/__main__.py
  • src/paperscout/scout.py
  • tests/test_main_health_merge.py
  • tests/test_message_queue.py

@wpak-ai wpak-ai merged commit 01562cf into develop May 26, 2026
10 checks passed
@wpak-ai wpak-ai deleted the feat/probe-cycle-health-snapshot branch May 26, 2026 17:22

Development

Successfully merging this pull request may close these issues.

  • Synchronize Health Endpoint Reads of Scheduler State
  • Discriminate Failed vs Empty Probe Cycles in run_cycle

2 participants