Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
- Exclude `.claude/worktrees/` from searches and edits; it contains stale worktrees that pollute results.

## Example Notebooks
- `demos/*.ipynb` — top-level Water Data tour: `USGS_WaterData_Introduction_Examples.ipynb` is the entry point; `_ContinuousData_`, `_DailyStatistics_`, `_DiscreteSamples_`, `_ReferenceLists_` cover individual collections; `WaterData_demo.ipynb`, `peak_streamflow_trends.ipynb`, and `R Python Vignette equivalents.ipynb` are standalone walkthroughs.
- `demos/*.ipynb` — top-level Water Data tour: `USGS_WaterData_Introduction_Examples.ipynb` is the entry point; `_ContinuousData_`, `_DailyStatistics_`, `_DiscreteSamples_`, `_ReferenceLists_` cover individual collections; `WaterData_demo.ipynb`, `peak_streamflow_trends.ipynb`, `USGS_WaterUse_Examples.ipynb` (NWDC water-use data via `wateruse.get_wateruse`), and `R Python Vignette equivalents.ipynb` are standalone walkthroughs.
- `demos/hydroshare/*.ipynb` — per-service HydroShare examples (NLDI, NWIS WaterUse, and Water Data DailyValues / GroundwaterLevels / Measurements / ParameterCodes / Peaks / Ratings / Samples / SiteInfo / SiteInventory / Statistics / UnitValues). Mirror these when adding examples for a new collection.
- `demos/nwqn_data_pull/` — non-notebook example: a lithops/Docker batch pipeline (`retrieve_nwqn_samples.py`, `retrieve_nwqn_streamflow.py`) with its own `README.md`.
- Any `Untitled*.ipynb`, `*_test.ipynb`, or notebooks not listed here are untracked local scratch; ignore them.
Expand Down
34 changes: 33 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -133,7 +133,7 @@ sites, metadata = ngwmn.get_sites(state='Wisconsin')

print(f"Found {len(sites)} NGWMN sites in Wisconsin")

# Pull water levels from the first twenty sites over a time window.
# Pull water levels from the first twenty sites over a time window.
water_levels, metadata = ngwmn.get_water_level(
monitoring_location_id=sites['monitoring_location_id'][:20],
datetime=['2022-01-01', '2024-01-01']
Expand Down Expand Up @@ -192,6 +192,31 @@ flowlines = nldi.get_flowlines(
print(f"Found {len(flowlines)} upstream tributaries within 50km")
```

### Water Use (NWDC)

Retrieve modeled water-use estimates from the National Water Availability
Assessment Data Companion:

```python
from dataretrieval import wateruse

# Monthly public-supply withdrawals for Rhode Island, split into
# groundwater and surface-water sources (returns a DataFrame and metadata).
df, metadata = wateruse.get_wateruse(
model='wu-public-supply-wd',
variable=['pswdtot', 'pswdgw', 'pswdsw'],
state='RI', # name/postal/FIPS; pass a list to fan out over several areas
start_date='2020-01',
time_resolution='monthly',
)

print(f"Retrieved {len(df)} records across {df['huc12_id'].nunique()} watersheds")

# Aggregate the HUC12 grid to a statewide monthly total (million gallons/day)
statewide = df.groupby('year_month')['pswdtot_mgd'].sum()
print(statewide.head())
```

## Available Data Services

### Modern USGS Water Data APIs (Recommended) — `dataretrieval.waterdata`
Expand Down Expand Up @@ -232,6 +257,13 @@ print(f"Found {len(flowlines)} upstream tributaries within 50km")
- `get_features`: Find monitoring sites, dams, and other features along the network
- `get_features_by_data_source`: Features from a specific data source

### Water Use (NWDC)
- **Public supply**: Modeled public-supply withdrawals and consumptive use
- **Irrigation**: Modeled irrigation withdrawals and consumptive use
- **Thermoelectric**: Modeled thermoelectric-power water use
- **HUC12 estimates**: National coverage on a 12-digit hydrologic-unit grid,
summarizable to counties, states, or coarser hydrologic units

## More Examples

Explore additional examples in the
Expand Down
5 changes: 4 additions & 1 deletion dataretrieval/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,8 @@
df, meta = nwis.get_dv(sites="05427718")
Available service modules: ``waterdata``, ``wqp`` (Water Quality Portal),
``nldi``, ``streamstats``, and the deprecated ``nwis``.
``wateruse`` (NWDC water-use data), ``nldi``, ``streamstats``, and the
deprecated ``nwis``.
``nldi`` requires geopandas (``pip install dataretrieval[nldi]``) and is
imported on demand: ``from dataretrieval import nldi``.
Expand Down Expand Up @@ -62,6 +63,7 @@
streamstats,
utils,
waterdata,
wateruse,
wqp,
)

Expand All @@ -72,6 +74,7 @@
"streamstats",
"utils",
"waterdata",
"wateruse",
"wqp",
# error taxonomy (canonical home: ``dataretrieval.exceptions``), re-exported
# so callers can ``except dataretrieval.DataRetrievalError``
Expand Down
13 changes: 11 additions & 2 deletions dataretrieval/nwis.py
Original file line number Diff line number Diff line change
Expand Up @@ -741,8 +741,17 @@ def get_pmcodes(**kwargs: Any) -> NoReturn:


def get_water_use(**kwargs: Any) -> NoReturn:
"""Defunct: no current replacement."""
raise NameError("`nwis.get_water_use` is defunct.")
"""Defunct: use ``dataretrieval.wateruse.get_wateruse`` instead.
The legacy NWIS water-use service has been retired. Modeled water-use
estimates are now served by the National Water Availability Assessment Data
Companion (NWDC); retrieve them with
:func:`dataretrieval.wateruse.get_wateruse`.
"""
raise NameError(
"`nwis.get_water_use` is defunct; use "
"`dataretrieval.wateruse.get_wateruse` instead."
)


@_deprecated
Expand Down
51 changes: 10 additions & 41 deletions dataretrieval/ogc/chunking.py
Original file line number Diff line number Diff line change
Expand Up @@ -69,15 +69,14 @@
import functools
import os
from collections.abc import Callable, Iterator
from contextlib import contextmanager
from contextvars import ContextVar, copy_context
from contextvars import copy_context
from typing import Any, cast

import httpx
import pandas as pd
from anyio.from_thread import start_blocking_portal

from dataretrieval.utils import HTTPX_DEFAULTS
from dataretrieval.utils import HTTPX_DEFAULTS, Ambient

from . import progress as _progress
from .interruptions import (
Expand Down Expand Up @@ -106,9 +105,6 @@
_OGC_URL_BYTE_LIMIT = 8000


# Response header USGS uses to advertise remaining hourly quota.
_QUOTA_HEADER = "x-ratelimit-remaining"

# Fan-out concurrency cap, read at call time (not import) so test
# ``monkeypatch.setenv`` applies. Value grammar in :func:`_read_concurrency_env`;
# the concurrency model is in the module docstring.
Expand Down Expand Up @@ -152,38 +148,11 @@ def _read_concurrency_env() -> int | None:
return value


# Shared per-call ``httpx.AsyncClient``, published via :func:`_publish`
# during ``ChunkedCall._run`` so paginated-loop helpers (``_walk_pages``)
# reuse the same connection pool across every sub-request. ``None``
# outside a chunked call — paginated helpers then open their own
# short-lived client.
_chunked_client: ContextVar[httpx.AsyncClient | None] = ContextVar(
"_chunked_client", default=None
)


@contextmanager
def _publish(client: httpx.AsyncClient) -> Iterator[None]:
"""
Publish ``client`` on the ``_chunked_client`` ContextVar so the
paginated-loop helpers can borrow it via :func:`get_active_client`
for the duration of the ``with`` block.

Parameters
----------
client : httpx.AsyncClient
The client to publish.

Yields
------
None
Yields once, for the duration of the bind.
"""
token = _chunked_client.set(client)
try:
yield
finally:
_chunked_client.reset(token)
# Shared per-call ``httpx.AsyncClient``, scoped via ``with _chunked_client(c):``
# during ``ChunkedCall._run`` so paginated-loop helpers (``_walk_pages``) reuse
# the same connection pool across every sub-request. ``None`` outside a chunked
# call — paginated helpers then open their own short-lived client.
_chunked_client: Ambient[httpx.AsyncClient | None] = Ambient("_chunked_client", None)


def get_active_client() -> httpx.AsyncClient | None:
Expand All @@ -197,8 +166,8 @@ def get_active_client() -> httpx.AsyncClient | None:
Returns
-------
httpx.AsyncClient or None
The client published via :func:`_publish` if currently inside a
:class:`ChunkedCall` run; ``None`` otherwise.
The client scoped via ``with _chunked_client(...)`` if currently inside
a :class:`ChunkedCall` run; ``None`` otherwise.
"""
return _chunked_client.get()

Expand Down Expand Up @@ -541,7 +510,7 @@ async def _run(self, max_concurrent: int | None) -> tuple[pd.DataFrame, Any]:
)

async with httpx.AsyncClient(limits=limits, **HTTPX_DEFAULTS) as client:
with _publish(client):
with _chunked_client(client):
reporter = _progress.current()
if reporter is not None:
reporter.set_chunks(self.plan.total)
Expand Down
Loading