Skip to content

explore: reuse + unify the OGC engine (pager, aggregation, ambient state) — net −64 LOC#4

Merged
thodson-usgs merged 1 commit into
feat/waterusefrom
explore/http-engine
Jun 23, 2026
Merged

explore: reuse + unify the OGC engine (pager, aggregation, ambient state) — net −64 LOC#4
thodson-usgs merged 1 commit into
feat/waterusefrom
explore/http-engine

Conversation

@thodson-usgs

@thodson-usgs thodson-usgs commented Jun 23, 2026

Copy link
Copy Markdown
Owner

Exploration, draft. Stacked on the wateruse branch (DOI-USGS#328) so the diff is only the change. Question: can codifying the OGC engine's HTTP pagination / aggregation / error-recovery / ambient-state patterns reduce complexity (branches, duplication, indirection) while staying readable? Yes — by reusing and unifying what the engine already has rather than building parallel cores. Iterated as a logged try-measure-keep-or-revert loop. Source net −64 LOC across 6 refactor commits, plus a rate-limit correctness fix.

Commits (newest first)

commit change why it's simpler
7899b60d Ambient[T] class for the per-call ContextVars (_row_cap, _ogc_base_url, _dialect, _chunked_client) each var+setter pair (~13 lines) collapses to one line; bundles state + get/scope; −36 LOC
ebc3ded6 move _QUOTA_HEADER to the base layer (planning) one quota-header literal; fixes a layering inversion
35e43dff table-dispatch _resolve_locations −2 branches; selectors declared once (Open/Closed)
c3018c3c drop the dead single-response branch in _combine_chunk_responses −1 branch (_lowest_remaining already handles it)
381d82f6 dedup _paginate's verbatim per-page progress block → report_page the progress contract is defined once
a5b3d029 unify the two response-foldersplanning._merge_response one fold op, not two near-duplicates across modules (−26 LOC)
9ae73db7 wateruse reuses _paginate / _run_sync / _combine_chunk_*; rate-limit unified on lowest-remaining one pager / sync bridge / aggregator, not two

Two headline abstractions

planning._merge_response (a5b3d02) — engine._aggregate_paginated_response (pagination) and planning._combine_chunk_responses (fan-out) did the same low-level op (copy a base response, rebuild headers from a chosen response, set elapsed, optionally override url). Now one function behind both.

utils.Ambient[T] (7899b60d) — every ambient ContextVar needed a _x_var = ContextVar(...) plus a hand-written @contextmanager setter repeating the set/try/finally/reset dance (which leaks the value if a reset is dropped). Ambient bundles the var with .get() and a __call__ scope into one object, so each ambient is a single declaration. with _x(value): call sites are unchanged; readers shorten (_x_var.get()_x.get()). A Generic[_T] keeps exact value types (incl. Ambient[int | None]). This is the class abstraction that belongs — it bundles state with behaviour.

Correctness fix (from 9ae73db)

x-ratelimit-remaining reports the lowest value any concurrent sub-request saw (the quota actually left), not last-by-index — one shared _lowest_remaining, fixing a latent inaccuracy in the OGC chunker too.

Rejected, with reasons (uncommitted ITERATION_LOG.md)

Several candidates were measured/reasoned and dropped because they added complexity or indirection without removing duplication: extracting _fan_out (no clean 2nd consumer — the chunker's gather carries retry/resume the lean fan-out omits); DRY-ing the two wateruse validators (per-selector messages must survive; flat funcs read better); a _scoped function for the ContextVar dance (superseded by the Ambient class — it left the var+setter pairing intact, +7 LOC vs −36); a Pager class (ceremony, no shared state); unifying the two HTTP-status raisers (would push OGC concerns into the generic one; they already share error_for_status).

Numbers & verification

  • Source net −64 LOC (engine.py −80, wateruse.py −35, chunking.py −31, utils.py +36 for the shared Ambient/_merge_response helpers, planning.py +46 for _merge_response/_lowest_remaining/_QUOTA_HEADER + docs — logic moved into shared, tested helpers).
  • Behaviour-preserving (live-verified earlier): single/paginated/fan-out identical; lowest rate-limit + query-identity URL; {detail} errors; works inside a running loop.
  • All offline OGC/wateruse/utils/progress suites green; ruff + mypy --strict clean. (Live tests that hit the real gateway flake on transient 502s — unrelated to the diff.)

Recommendation

Viable merge candidate — net reduction, de-duplicated transport (one pager / response-folder / sync bridge / ambient mechanism), a real rate-limit fix, no readability regression. Run it past the live OGC suites, then promote from draft.

🤖 Generated with Claude Code

@thodson-usgs thodson-usgs force-pushed the explore/http-engine branch from 2853603 to e76428d Compare June 23, 2026 14:56
@thodson-usgs thodson-usgs changed the title explore: shared async fetch core (assessment — net +91 LOC, do not merge as-is) explore: reuse the OGC engine's pager/aggregation (net −29 LOC) — passes the gate Jun 23, 2026
@thodson-usgs thodson-usgs force-pushed the explore/http-engine branch from e76428d to 9ae73db Compare June 23, 2026 15:10
@thodson-usgs thodson-usgs changed the title explore: reuse the OGC engine's pager/aggregation (net −29 LOC) — passes the gate explore: reuse the OGC engine's pager/aggregation (net −7 LOC + a correctness fix) Jun 23, 2026
@thodson-usgs thodson-usgs changed the title explore: reuse the OGC engine's pager/aggregation (net −7 LOC + a correctness fix) explore: reuse + unify the OGC pager/aggregation (net −33 LOC + a correctness fix) Jun 23, 2026
@thodson-usgs thodson-usgs changed the title explore: reuse + unify the OGC pager/aggregation (net −33 LOC + a correctness fix) explore: reuse + unify the OGC pager/aggregation (1 structural + 4 complexity refactors) Jun 23, 2026
@thodson-usgs thodson-usgs changed the title explore: reuse + unify the OGC pager/aggregation (1 structural + 4 complexity refactors) explore: reuse + unify the OGC engine (pager, aggregation, ambient state) — net −64 LOC Jun 23, 2026
…ient state

Reuse and unify the OGC engine's HTTP pagination / aggregation / error-recovery /
ambient-state plumbing instead of carrying parallel implementations. Net source
reduction of ~66 LOC, behavior-preserving, plus one rate-limit correctness fix.

wateruse reuse:
- wateruse drives the engine's generic `_paginate` (with an injected
  `raise_for_status` for the NWDC `{detail}` envelope), `_run_sync` (anyio portal,
  Jupyter-safe), and `_combine_chunk_frames` / `_combine_chunk_responses`
  aggregators — replacing a hand-rolled pager, thread bridge, and bespoke
  aggregation. `_resolve_locations` becomes a `_LOCATION_BUILDERS` table dispatch,
  dropping a 3-way if/elif and a duplicated selector enumeration.

Engine unification:
- `planning._merge_response`: one low-level "fold N responses into one" behind both
  pagination (`_paginate`) and chunked/fan-out aggregation
  (`_combine_chunk_responses`), replacing two near-duplicate implementations; deletes
  `engine._aggregate_paginated_response`.
- `utils.Ambient[T]`: a generic ContextVar-with-scope class collapsing each per-call
  ambient (`_row_cap`, `_ogc_base_url`, `_dialect`, the chunker's `_chunked_client`)
  from a var + hand-written `@contextmanager` setter pair into one declaration.
  `with _x(value):` call sites unchanged; readers shorten to `_x.get()`.
- `_paginate`'s verbatim per-page progress block deduped into a `report_page` closure.
- `_combine_chunk_responses`: dropped a dead single-response branch.
- `_QUOTA_HEADER` moved to the base `planning` module — dedups the literal and fixes
  a layering inversion (planning had hard-coded it, unable to import from chunking).
- `_cql2_param`: CQL2 filter list built as a comprehension.
- `engine._check_id_format`: inlined into its only caller; dead re-export dropped.

Rate-limit correctness fix:
- `x-ratelimit-remaining` now reports the LOWEST value any concurrent sub-request
  saw (the quota actually left after a fan-out), via a shared `_lowest_remaining`,
  instead of the last-by-index — fixing a latent inaccuracy in the OGC chunker too.

Behavior-preserving (live-verified); offline OGC/wateruse/utils/progress suites
green; ruff + mypy --strict clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Sjb14HkwuCydKSKMsaXsgd
@thodson-usgs thodson-usgs force-pushed the explore/http-engine branch from 08a8f9c to 9b74119 Compare June 23, 2026 16:58
@thodson-usgs thodson-usgs merged commit 9b74119 into feat/wateruse Jun 23, 2026
2 checks passed
@thodson-usgs

Copy link
Copy Markdown
Owner Author

Folded into DOI-USGS#328: the squashed refactor commit 9b741190 (reuse + unify the OGC engine — pager, aggregation, ambient state, net source ≈ −66 LOC) now sits on top of the wateruse commit on feat/wateruse as the second commit of that PR. GitHub auto-marked this PR merged once its head became reachable from its base — no separate review needed here.

@thodson-usgs thodson-usgs deleted the explore/http-engine branch June 23, 2026 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant