feat(wateruse): add water-use module for the NWDC API #328
Merged
Add `dataretrieval.wateruse` for USGS National Water Availability Assessment
Data Companion (NWDC) water-use estimates — modeled on a HUC12 grid and
queryable by state, county, or hydrologic unit. This is the modern replacement
for the defunct legacy NWIS water-use service (`nwis.get_water_use` now points
callers here).
from dataretrieval import wateruse
df, md = wateruse.get_wateruse(
model="wu-public-supply-wd",
variable=["pswdtot", "pswdgw", "pswdsw"],
state="RI",
start_date="2020-01",
time_resolution="monthly",
)
The NWDC is a plain CSV REST service, not an OGC API Features collection, so the
module supplies the NWDC-specific pieces (CSV parsing, the RFC 8288 Link-header
pagination cursor, the `{detail}` error envelope, and state/county/huc location
builders) but reuses the OGC engine's generic transport rather than
re-implementing it: the shared pager (`_paginate`), the Jupyter-safe anyio sync
bridge (`_run_sync`), response/frame aggregation, and `_default_headers`. It
keeps the package conventions where they fit — a `(DataFrame, BaseMetadata)`
return, the typed `DataRetrievalError` taxonomy (surfacing the NWDC `detail`),
`API_USGS_PAT` token support, idiomatic snake_case params, and `state` /
`county` / `huc` selectors that each accept a value or a list (a list fans out
one concurrent request per location). Large areas paginate transparently.
A `FutureWarning` flags the module as experimental, since the NWDC service is
new and still changing.
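The Link-header cursor described above can be sketched with the standard library alone. This is a minimal illustration, not the module's actual code: `next_url`, `collect_pages`, and the `example.invalid` URLs are hypothetical names, and the parser is a simplification that assumes parameter values contain no commas or semicolons.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# One link-value: <target> followed by its ;-separated parameters.
# Simplification (assumption): parameter values contain no "," or ";".
_LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
_REL = re.compile(r'rel\s*=\s*"?([^";,]+)"?', re.IGNORECASE)


def next_url(link_header: Optional[str]) -> Optional[str]:
    """Return the target of the rel="next" link, or None on the last page."""
    if not link_header:
        return None
    for target, params in _LINK_VALUE.findall(link_header):
        rel = _REL.search(params)
        # rel may hold several space-separated relation types.
        if rel and "next" in rel.group(1).lower().split():
            return target
    return None


def collect_pages(
    fetch: Callable[[str], Tuple[str, Dict[str, str]]], first_url: str
) -> List[str]:
    """Follow rel="next" links until the server stops sending one."""
    pages: List[str] = []
    url: Optional[str] = first_url
    while url is not None:
        body, headers = fetch(url)
        pages.append(body)
        url = next_url(headers.get("Link"))
    return pages


# Hypothetical two-page response set standing in for the real service.
FAKE_PAGES = {
    "https://example.invalid/data?page=1": (
        "huc12_id,value\n010900040908,1.5\n",
        {"Link": '<https://example.invalid/data?page=2>; rel="next"'},
    ),
    "https://example.invalid/data?page=2": ("huc12_id,value\n010900040909,2.5\n", {}),
}

all_pages = collect_pages(FAKE_PAGES.__getitem__, "https://example.invalid/data?page=1")
```

Here `fetch` is injected so the loop can be exercised offline; a real caller would pass a function that performs the HTTP GET and returns the body with its response headers.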
Extracting the reusable engine seams also de-duplicated the engine itself
(~-66 LOC, behavior-preserving): `planning._merge_response` now backs both
pagination and fan-out aggregation; a generic `utils.Ambient[T]`
contextvar-with-scope helper collapses the per-call ambients; and
`x-ratelimit-remaining` now reports the lowest value any concurrent sub-request
saw (the quota actually left after a fan-out), fixing a latent inaccuracy in the
OGC chunker too.
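To see why the lowest value is the honest one to report, here is a small sketch of a concurrent fan-out. It uses plain `asyncio` rather than the package's anyio bridge. `fetch_one`, `lowest_remaining`, `fan_out`, and the canned quota numbers are all hypothetical stand-ins, not the engine's real helpers.

```python
import asyncio
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical canned responses: location -> (simulated latency, remaining quota).
_FAKE = {"RI": (0.02, 97), "CT": (0.00, 99), "MA": (0.01, 98)}


async def fetch_one(location: str) -> Tuple[str, Dict[str, str]]:
    """Stand-in for one per-location request; returns (location, headers)."""
    delay, remaining = _FAKE[location]
    await asyncio.sleep(delay)
    return location, {"x-ratelimit-remaining": str(remaining)}


def lowest_remaining(headers_list: Iterable[Dict[str, str]]) -> Optional[int]:
    """Smallest x-ratelimit-remaining seen across sub-requests, if any."""
    values = [
        int(h["x-ratelimit-remaining"])
        for h in headers_list
        if "x-ratelimit-remaining" in h
    ]
    return min(values) if values else None


async def fan_out(locations: List[str]) -> Tuple[List[str], Optional[int]]:
    # gather() runs the requests concurrently and returns results in input order.
    results = await asyncio.gather(*(fetch_one(loc) for loc in locations))
    names = [loc for loc, _ in results]
    return names, lowest_remaining(headers for _, headers in results)


def run(locations: List[str]) -> Tuple[List[str], Optional[int]]:
    return asyncio.run(fan_out(locations))
```

With the canned data, `run(["RI", "CT", "MA"])` yields a lowest remaining quota of 97, whereas taking the header of the last result by index would report 98.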
Includes offline pytest-httpx coverage, a reference page, a README example, and
a demo notebook.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Sjb14HkwuCydKSKMsaXsgd
Summary
Adds a `dataretrieval.wateruse` module for retrieving USGS National Water
Availability Assessment Data Companion (NWDC) water-use estimates from
https://api.water.usgs.gov/nwaa-data/data. Estimates are modeled on a HUC12
grid and queryable by county, state, or hydrologic unit. This is the modern
replacement for the defunct legacy NWIS water-use service, so
`nwis.get_water_use` now points callers here.
It covers the same data as the R `dataRetrieval::read_waterdata_use_data`
getter, but is written to the Python package's conventions rather than ported
from the R structure.
Design notes
The NWDC is a plain CSV REST service, not an OGC API Features collection:
it has no `/collections` or `/conformance`, and its error envelope is
`{"detail": ...}` rather than the OGC engine's `{code, description}`. So it does
not use the high-level OGC path (`get_ogc_data`, the CQL2 byte-chunker, the
GeoJSON pager). It does reuse the engine's generic transport plumbing,
supplying only NWDC-specific strategies, and stays consistent with the package
where the shared pieces fit:
- Returns a `(DataFrame, BaseMetadata)` tuple.
- Sends `utils._default_headers()`, so the documented `API_USGS_PAT` token
  raises the NWDC rate limit just as it does for the OGC getters.
- Raises the typed `DataRetrievalError` taxonomy (via `utils._raise_for_status`
  with an injected detail extractor), surfacing the NWDC `detail`
  (e.g. `"Invalid model name: ..."`) in the message.
- `state` / `county` / `huc` selectors (mirroring `ngwmn` / `waterdata`), each
  accepting a single value or a list. Since NWDC takes one location per request,
  a multi-value selector fans out: one request per location, run concurrently
  over a shared client.
- Snake_case parameters (`start_date`, `end_date`, `time_resolution`), mapped to
  the NWDC wire names internally. `variable` is comma-joined into a single GET.
- Large areas paginate via an RFC 8288 `Link: <...>; rel="next"` header (a
  `huc2` → 7 pages, a populous state → 4; small queries → a single page).
  wateruse drives the engine's generic `_paginate` with NWDC parse / cursor /
  error strategies and concatenates the pages.
- `huc12_id` is parsed as a string so leading zeros survive.
Engine refactor
Building wateruse surfaced that it could reuse the OGC engine's transport
instead of re-implementing it — and extracting the reusable seams also
de-duplicated the engine itself. Net source ≈ −66 LOC, behavior-preserving:
- `planning._merge_response`: one low-level "fold N responses into one" behind
  both pagination (`_paginate`) and the chunked / fan-out aggregation
  (`_combine_chunk_responses`), replacing two near-duplicate implementations.
- `utils.Ambient[T]`: a small generic ContextVar-with-scope class that collapses
  each per-call ambient (`_row_cap`, `_ogc_base_url`, `_dialect`, the chunker's
  `_chunked_client`) from a var + hand-written `@contextmanager` setter pair
  into a single declaration.
- `x-ratelimit-remaining` now reports the lowest value any concurrent
  sub-request saw (the quota actually left after a fan-out) via a shared
  `_lowest_remaining`, instead of the last-by-index, fixing a latent inaccuracy
  in the OGC chunker too.
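The ContextVar-with-scope idea in the second bullet can be sketched as follows. This illustrates the pattern only: the constructor signature, the `get` / `scope` method names, and the `row_cap` example are assumptions, not the actual `utils.Ambient` API.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Generic, Iterator, TypeVar

T = TypeVar("T")


class Ambient(Generic[T]):
    """A ContextVar plus a scoped setter, declared once per ambient value.

    Hypothetical shape: method names and signature are illustrative only.
    """

    def __init__(self, name: str, default: T) -> None:
        self._var: ContextVar[T] = ContextVar(name, default=default)

    def get(self) -> T:
        return self._var.get()

    @contextmanager
    def scope(self, value: T) -> Iterator[T]:
        # set() returns a token that restores the previous value on reset(),
        # so nested scopes unwind correctly even if the body raises.
        token = self._var.set(value)
        try:
            yield value
        finally:
            self._var.reset(token)


# One declaration replaces a module-level var plus a hand-written setter.
row_cap: Ambient[int] = Ambient("row_cap", default=1000)

with row_cap.scope(50):
    inside = row_cap.get()   # value visible only within this block
outside = row_cap.get()      # default restored after the block exits
```

Because the state lives in a `ContextVar`, each asyncio task or thread context sees its own value, which is what makes the pattern safe for concurrent fan-out.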
What's included
- `dataretrieval/wateruse.py`, wired into `dataretrieval/__init__.py`.
- Engine changes in `ogc/{engine,planning,chunking}.py`, `utils.py`, and
  `waterdata/utils.py`.
- `tests/wateruse_test.py`: offline `pytest-httpx` coverage of single-page
  parse, string `huc12_id`, comma-joined variables, dropped-None params,
  snake_case → wire-name mapping, Link-header pagination, bare-host
  normalization, shared-header reuse, state/county/huc selectors + fan-out, and
  typed-error / `detail` handling; plus updates to `tests/waterdata_*` for the
  engine changes.
- `docs/source/reference/wateruse.rst` + toctree entry.
- `README.md` usage example and "Available Data Services" entry.
- `demos/USGS_WaterUse_Examples.ipynb`: a motivating walkthrough (where
  Wisconsin's public water supply comes from, and its summer demand peak).
Verification
- [...] the refactor touches; `ruff check` / `ruff format` / `mypy --strict`
  clean.
- [...] and annual resolutions, paginated results byte-identical to the
  unpaginated equivalent, concurrent fan-out over multiple states, and the
  lowest-remaining rate-limit header confirmed.
🤖 Generated with Claude Code