Skip to content

rasterize: sweep-test-coverage pass 2 -- NaN/Inf burns, nested GC, GPU multi-column#2256

Open
brendancol wants to merge 3 commits into
mainfrom
deep-sweep-test-coverage-rasterize-2026-05-21
Open

rasterize: sweep-test-coverage pass 2 -- NaN/Inf burns, nested GC, GPU multi-column#2256
brendancol wants to merge 3 commits into
mainfrom
deep-sweep-test-coverage-rasterize-2026-05-21

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

deep-sweep test-coverage pass 2 on rasterize. Adds 58 tests in
xrspatial/tests/test_rasterize_coverage_2026_05_21.py closing the
public-API coverage gaps left after pass-1 (2026-05-17). Source untouched.

Gaps closed

  • Cat 2 HIGH (NaN / Inf edges) -- pin +Inf / -Inf / NaN burn values
    across numpy / cupy / dask+numpy / dask+cupy on polygon, point, and
    line geometries. Inf + finite under sum stays Inf; Inf + (-Inf)
    under sum collapses to NaN; min(Inf, 1.0) and max(-Inf, 1.0)
    pick the finite value. Inf-as-bound is rejected with the same
    ValueError as NaN-as-bound (pass-1 only covered NaN bound).
  • Cat 1 MEDIUM (backend coverage) -- nested GeometryCollection
    (GC inside a GC) on all four backends. rasterize.py:1995 documents
    recursive unpacking but only flat GCs were tested. Deeply-nested
    3-levels eager test pins that the recursion depth limit is not 1 or 2.
  • Cat 1 MEDIUM (backend coverage) -- columns= (multi-column
    properties) on cupy and dask+cupy. TestMultiColumn covered only
    numpy and dask+numpy.
  • Cat 3 LOW (geometric edges) -- rectangular-pixel parity
    (resolution=(rx, ry), rx != ry) across all four backends.

Bug surfaced -- issue #2255

The NaN-burn-under-max tests revealed a cross-backend divergence:
GPU max / min merge silently suppresses NaN burn values while
CPU propagates them per IEEE arithmetic.

  • CPU: max(NaN, 1.0) returns NaN (because 1.0 > NaN is False, the
    kernel keeps the prior NaN pixel).
  • GPU: max(NaN, 1.0) returns 1.0 (the kernel inits the output buffer
    to -inf for max, +inf for min, and atomicMax/Min is NaN-suppressing
    under IEEE device semantics).

Filed as #2255. This PR pins both observables in paired
CPU vs GPU tests (test_nan_burn_overlaps_max_cpu_propagates,
test_nan_burn_overlaps_max_gpu_suppresses_nan,
test_nan_burn_single_geom_max_gpu_returns_neg_inf) so the divergence
is visible in CI until the GPU kernels are aligned with the CPU
is_first semantics.

State updates

  • .claude/sweep-test-coverage-state.csv: rasterize row updated with
    pass-2 notes, severity_max=HIGH, categories_found=1;2;3,
    issue=2255.

Test plan

  • All 58 new tests pass on a CUDA host
    (pytest xrspatial/tests/test_rasterize_coverage_2026_05_21.py)
  • Pre-existing 417 rasterize tests still pass
    (pytest xrspatial/tests/test_rasterize*.py)
  • Source files untouched (only adding tests + state CSV)
  • CI runs the new tests on CPU and GPU hosts

Adds test_rasterize_coverage_2026_05_21.py with 58 tests closing the
remaining public-API coverage gaps after pass-1 (2026-05-17):

- Cat 2 HIGH +/-Inf and NaN burn values across all four backends:
  +Inf / -Inf burn under default merge; Inf + finite under sum stays
  Inf; Inf + (-Inf) under sum collapses to NaN; min(Inf, 1.0) and
  max(-Inf, 1.0) pick the finite value; Inf-as-bound rejected with
  the same ValueError as NaN-as-bound (pass-1 only tested NaN bound).
  NaN-burn polygon, point, and line each pin the rasterized cell.
- Cat 1 MEDIUM nested GeometryCollection across all four backends.
  rasterize.py:1995 documents recursive unpacking but only flat GCs
  were tested. Deeply-nested-3-levels eager test pins that the
  recursion depth limit is not 1 or 2.
- Cat 1 MEDIUM columns= (multi-column properties) parity on cupy and
  dask+cupy (TestMultiColumn covered numpy / dask+numpy only). Pin
  three-column props on GPU so the (N, P) loop survives the kernel
  boundary.
- Cat 3 LOW rectangular-pixel parity (resolution=(rx, ry), rx != ry)
  across all four backends.

Tests surfaced a cross-backend bug: GPU max/min merge silently
suppresses NaN burn values (CPU keeps NaN per IEEE; GPU returns the
finite value or the -inf buffer init). Filed as issue #2255 and
pinned both observables in paired CPU / GPU tests so the divergence
is visible in CI until the GPU kernels are aligned.

Source untouched. All 58 new tests pass on a CUDA host; pre-existing
417 rasterize tests still pass.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 21, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: rasterize sweep-test-coverage pass 2

Summary

Tests-only PR (xrspatial/tests/test_rasterize_coverage_2026_05_21.py, +547/-0) adding 58 tests across four buckets: Inf/NaN burn values, nested GeometryCollection, GPU multi-column, rectangular pixels. State CSV updated (+1/-1). No source changes. All 58 new tests pass locally, and CI is green on macOS/Ubuntu/Windows + py3.14. The 206 pre-existing rasterize tests still pass (verified locally).

Blockers

None.

Suggestions

None.

Nits

  • test_inf_bounds_rejected (lines 208-219) matches on "must be finite", which also appears in the resolution-validation error at xrspatial/rasterize.py:3101 ("resolution must be finite and > 0"). The path the test targets is at rasterize.py:3078-3081: "Invalid bounds: all of (xmin, ymin, xmax, ymax) must be finite, ...". The regex still hits the right path today because the resolution path is not reached, but r"Invalid bounds:.*must be finite" would be safer against a future refactor that shuffles validation order. Verified manually that the test triggers the bounds path today.
  • TestNaNBurnValues references issue #2255 in a comment at line 283. Putting that reference (or a module-level ISSUE_2255 = ... constant) on the class docstring too would make it easier to find when the GPU max kernel is later aligned and the ..._gpu_suppresses_nan / ..._gpu_returns_neg_inf tests need to be inverted.
  • test_deeply_nested_gc_eager (lines 398-411) only runs against numpy. The fast-path-vs-slow-path branch in _classify_geometries is backend-agnostic, so this is not load-bearing, but one dask+numpy run would also check that nested-GC pre-classification survives the per-tile graph builder.

What looks good

  • Every test class docstring says what it pins and why a regression would be silent today. Makes the file readable a year from now.
  • The GPU/CPU NaN-max divergence is pinned as paired tests rather than hidden behind xfail, so the asymmetry shows up in CI instead of getting buried in skip output. Issue #2255 (open) tracks the source fix.
  • _materialise handles all four backends in one place (dask compute + cupy host copy), so individual tests do not leak backend assumptions.
  • Inf-as-bound rejection (line 208) and the nested-GC recursion check (line 398) close gaps that the source documented but had no test for (rasterize.py:1995 GC recursion, rasterize.py:3078 bounds validation).
  • Skip markers use pytest.param(..., marks=...) with stable ids (numpy, cupy, dask_numpy, dask_cupy), so test IDs are searchable and skip output stays readable on CPU-only hosts.

Checklist

  • Source unchanged (tests-only)
  • All four backends covered where the source supports them
  • NaN handling pinned per-backend (CPU IEEE propagation, GPU asymmetry)
  • Edge cases (Inf, -Inf, Inf+(-Inf), NaN burn, nested GC, rect pixels)
  • Dask paths use chunked fixtures (chunks=(5,5) and chunks=(3,3))
  • No premature materialization; _materialise is the single sink
  • No benchmark needed (no source changes)
  • No README change needed (no new public API)
  • Docstrings on every test explain the pin
  • Bug surfaced (#2255) is pinned in CI rather than fixed in scope

Three nit-level fixes from the post-PR review of #2256, none affecting
test intent or coverage:

- test_inf_bounds_rejected: tighten regex to "Invalid bounds:.*must be
  finite" so a future refactor that adds a different "must be finite"
  check (e.g. on resolution at rasterize.py:3101) earlier in the call
  cannot accidentally satisfy the assertion via the wrong code path.
- TestNaNBurnValues docstring: pull the issue #2255 reference up from
  the per-test comment so the GPU/CPU NaN-merge divergence is flagged
  on the class as well as on the asymmetric paired tests.
- test_deeply_nested_gc: parametrize over numpy and dask+numpy so the
  recursive GeometryCollection pre-classification check covers the
  per-tile graph builder too, not just the eager path.

All 59 tests in the file pass (was 58, +1 from the new dask_numpy
parametrization).
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (post-fix)

Pushed c9a71bd4 addressing all three nits from the prior review, plus a merge of main (c5a66009) to pull in the unrelated GeoTIFF Phase 5b/5c refactors that landed in between.

Nit disposition

  • Nit 1 (regex collision): fixed -- test_inf_bounds_rejected now matches r"Invalid bounds:.*must be finite" so it cannot be satisfied by the resolution-validation path.
  • Nit 2 (issue #2255 visibility): fixed -- TestNaNBurnValues class docstring now flags the GPU/CPU NaN-merge divergence and points to the asymmetric paired tests.
  • Nit 3 (deeply-nested GC dask coverage): fixed -- test_deeply_nested_gc_eager was renamed test_deeply_nested_gc and parametrized over numpy and dask_numpy, so the recursive _classify_geometries check now covers the per-tile graph builder too.

Verification

  • Local: pytest xrspatial/tests/test_rasterize_coverage_2026_05_21.py -- 59 passed (was 58, +1 from the new dask_numpy parametrization).
  • Pre-existing 206 rasterize tests still pass.
  • Merge of main was clean (GeoTIFF-only conflict surface, no overlap with rasterize).

Out of scope (intentional)

  • Issue #2255 (GPU max/min NaN suppression) is not fixed in this PR. The paired CPU-propagates / GPU-suppresses tests pin the current divergence so it stays visible in CI until the GPU atomic kernels can be aligned with the CPU is_first semantics. That's a source-level fix in _compile_gpu_kernels (rasterize.py:~1320) and belongs in a separate PR.

No remaining findings from my side. Ready for an outside reviewer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant