
Refactor GeoTIFF Phase 5c-write: extract _write_layout.py from _writer.py #2249

Merged
brendancol merged 2 commits into main from issue-2248-write-layout-extraction on May 21, 2026


@brendancol
Contributor

Closes #2248
Part of #2211

Summary

  • Moves IFD encoding primitives and BigTIFF / COG layout planners out of xrspatial/geotiff/_writer.py into a new xrspatial/geotiff/_write_layout.py.
  • Pixel encoding kernels (strip/tile compression, photometric/predictor encode) and the top-level _write / _write_streaming entry points stay in _writer.py.
  • _writer.py drops about 580 lines.

Moved names

_float_to_rational, _serialize_tag_value, _pack_tag_value, _build_ifd, _assemble_tiff, _promote_offsets_to_long8, _assemble_standard_layout, _assemble_cog_layout, _compute_classic_ifd_overhead, _should_use_bigtiff_streaming, plus the BO byte-order constant and the _BIGTIFF_OFFSET_TAGS frozenset.

Compatibility

  • _writer.py re-exports every moved name. Existing imports (_writers/eager.py, _writers/gpu.py, _gpu_decode.py, and the test suite that imports from xrspatial.geotiff._writer) keep working unchanged.
  • _assemble_tiff resolves the layout helpers and _resolve_photometric / _OVERRIDABLE_AUTO_TAG_IDS / _DANGEROUS_EXTRA_TAG_IDS through the _writer module at call time so tests that monkeypatch _writer.* (e.g. test_eager_bigtiff_overhead_exact_1905.py) keep their pre-extraction semantics.
  • No public API change.
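A minimal sketch of why the call-time lookup matters. It uses in-memory stand-in modules rather than the real xrspatial files, and the helper name `_ifd_overhead` and both `assemble_*` functions are hypothetical. A helper bound when the function is defined ignores a later monkeypatch on the re-exporting module; a helper looked up through that module on every call picks the patch up.

```python
import types

# Hypothetical in-memory stand-ins for _write_layout.py and _writer.py so the
# sketch runs standalone; the helper name and its body are illustrative only.
layout_mod = types.ModuleType("write_layout")
writer_mod = types.ModuleType("writer")


def _ifd_overhead(n_entries):
    # Classic TIFF IFD: 2-byte entry count, 12 bytes per entry, 4-byte next offset.
    return 2 + 12 * n_entries + 4


layout_mod._ifd_overhead = _ifd_overhead
writer_mod._ifd_overhead = layout_mod._ifd_overhead  # the re-export


def assemble_import_time(n_entries, _helper=layout_mod._ifd_overhead):
    # Binds the helper when the function is defined, so a later
    # monkeypatch on writer_mod is never seen.
    return _helper(n_entries)


def assemble_call_time(n_entries):
    # Resolves the helper through writer_mod on every call, which is the
    # indirection the PR describes for _assemble_tiff.
    return writer_mod._ifd_overhead(n_entries)


original = writer_mod._ifd_overhead
writer_mod._ifd_overhead = lambda n_entries: -1  # simulated monkeypatch
try:
    print(assemble_import_time(3))  # 42: patch ignored
    print(assemble_call_time(3))    # -1: patch honored
finally:
    writer_mod._ifd_overhead = original
```

The trade-off is the one the review flags: the late lookup keeps existing tests valid, but it couples the new module back to `_writer` at runtime.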

Test plan

  • pytest xrspatial/geotiff/tests/ passes locally (5033 passed, 68 skipped). One pre-existing failure unrelated to this change (test_lowlevel_write_pushdown_2138.py::test_write_vs_to_geotiff_byte_parity_uint8[lz4], reproduces on main) was deselected.
  • Module imports cleanly: from xrspatial.geotiff._writer import _assemble_tiff, _build_ifd, _compute_classic_ifd_overhead still resolves to the moved implementations.
  • CI green.
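The "module imports cleanly" check can be reproduced in miniature. This sketch writes a throwaway package to a temporary directory; the package name `demo_geotiff` and the placeholder function bodies are hypothetical, and only the shape of the re-export (`from ._write_layout import ...  # noqa: F401`) mirrors the PR. It confirms that the old import path hands back the same function objects as the new module.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Placeholder bodies; only the module split and the re-export matter here.
LAYOUT_SRC = '''\
def _build_ifd(entries):
    # TIFF requires IFD entries in ascending tag order.
    return sorted(entries)


def _compute_classic_ifd_overhead(n_entries):
    return 2 + 12 * n_entries + 4
'''

WRITER_SRC = '''\
from ._write_layout import (  # noqa: F401  kept for existing importers
    _build_ifd,
    _compute_classic_ifd_overhead,
)
'''

with tempfile.TemporaryDirectory() as root:
    pkg = Path(root) / "demo_geotiff"  # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "_write_layout.py").write_text(LAYOUT_SRC)
    (pkg / "_writer.py").write_text(WRITER_SRC)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        writer = importlib.import_module("demo_geotiff._writer")
        layout = importlib.import_module("demo_geotiff._write_layout")
    finally:
        sys.path.remove(root)

# The old import path hands back the very same function objects.
assert writer._build_ifd is layout._build_ifd
assert writer._compute_classic_ifd_overhead is layout._compute_classic_ifd_overhead
```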

Move TIFF/BigTIFF IFD encoding primitives and the BigTIFF / COG layout
planners out of _writer.py into a new _write_layout.py. The pixel
encoding kernels (strip/tile compression, photometric/predictor encode)
and the top-level write entry points stay in _writer.py.

Moved: _float_to_rational, _serialize_tag_value, _pack_tag_value,
_build_ifd, _assemble_tiff, _promote_offsets_to_long8,
_assemble_standard_layout, _assemble_cog_layout,
_compute_classic_ifd_overhead, _should_use_bigtiff_streaming, the BO
byte-order constant, and the _BIGTIFF_OFFSET_TAGS frozenset.

_writer.py re-exports every moved name so call sites in
_writers/eager.py, _writers/gpu.py, _gpu_decode.py, and the test suite
keep working. _assemble_tiff resolves the layout helpers and
_resolve_photometric / _OVERRIDABLE_AUTO_TAG_IDS /
_DANGEROUS_EXTRA_TAG_IDS through the _writer module at call time so
monkeypatches on _writer.* (e.g. test_eager_bigtiff_overhead_exact_1905)
still take effect.

_writer.py drops about 580 lines. Behavior-neutral; the geotiff test
suite passes.

Part of #2211.
@github-actions (bot) added the performance label (PR touches performance-sensitive code) on May 21, 2026
Contributor Author

@brendancol left a comment


PR Review: Extract _write_layout.py from _writer.py

Blockers

None.

Suggestions

  1. _write_layout.py:497-504 - _assemble_tiff reaches back into _writer for six names on every call so that test monkeypatches on _writer.* still take effect. The contract is invisible unless you read the comment block, and it couples the new module back to the one it was extracted from. Two options: (a) call this out in the module docstring so nobody inlines _compute_classic_ifd_overhead here in a future cleanup, or (b) add a unit test that monkeypatches one of the other indirected names (e.g. _writer._promote_offsets_to_long8 or _writer._assemble_standard_layout) and asserts the override fires through _assemble_tiff. Right now only the _compute_classic_ifd_overhead path is exercised, by test_eager_bigtiff_overhead_exact_1905.

  2. _write_layout.py:194-200 - _BIGTIFF_OFFSET_TAGS is consumed only by _promote_offsets_to_long8 and has no external importers. The # noqa: F401 re-export is harmless, but the constant could come off the re-export list; the public surface is _promote_offsets_to_long8. Not blocking.
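To illustrate why the constant is an internal detail of the promoter, here is a sketch of what an offset promoter does. The tag set and the `Tag` record are assumptions, not the contents of `_write_layout.py`; the tag numbers (StripOffsets 273, StripByteCounts 279, TileOffsets 324, TileByteCounts 325) and field types (LONG = 4, LONG8 = 16) come from the TIFF 6.0 and BigTIFF specifications.

```python
from dataclasses import dataclass, replace

# TIFF field types: LONG = 4 (uint32), LONG8 = 16 (uint64, BigTIFF only).
LONG, LONG8 = 4, 16

# Offset/byte-count tags that may need 64-bit values in a BigTIFF file.
# Assumed to match what _BIGTIFF_OFFSET_TAGS holds; not copied from the PR.
BIGTIFF_OFFSET_TAGS = frozenset({
    273,  # StripOffsets
    279,  # StripByteCounts
    324,  # TileOffsets
    325,  # TileByteCounts
})


@dataclass(frozen=True)
class Tag:
    # Hypothetical in-memory IFD entry; the real writer's representation may differ.
    tag: int
    type: int
    values: tuple


def promote_offsets_to_long8(tags, bigtiff):
    """Retype offset tags to LONG8 when writing BigTIFF (illustrative sketch)."""
    if not bigtiff:
        return list(tags)
    return [replace(t, type=LONG8) if t.tag in BIGTIFF_OFFSET_TAGS else t for t in tags]
```

Because only the promoter reads the tag set, callers never need to import it, which is the point of the suggestion.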

Nits

  1. _writer.py:42-50 - Dropped LONG8 from the _dtypes import block but left TIFF_TYPE_SIZES. It was already unused on main, so this PR didn't introduce the dead import, but cleaning it up here would save the next reader a double-take.

  2. _write_layout.py:1-12 - Module docstring says the helpers feed "the eager and streaming writers." True, but _write_streaming calls the lower-level helpers (_build_ifd, _compute_classic_ifd_overhead, _should_use_bigtiff_streaming, _promote_offsets_to_long8) directly and never goes through _assemble_tiff. A one-line note that "the eager writer calls _assemble_tiff; the streaming writer composes the lower-level helpers" would make the split clearer.

  3. _write_layout.py:497 - from . import _writer as _writer_mod works but is unusual. A short docstring sentence on _assemble_tiff (something like "this function is owned by _writer.py semantically; it lives here only to keep IFD-encoding helpers co-located") would help reviewers who land in the file directly.
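The eager/streaming split in nit 2 can be pictured as a streaming writer composing a size estimate with a BigTIFF decision before it writes any pixels. This is a simplified sketch, not the behavior of `_compute_classic_ifd_overhead` or `_should_use_bigtiff_streaming`: the function names and the `(type_id, count)` entry format are hypothetical, and a real writer would also account for overviews, per-tile offset arrays, and GeoTIFF keys. The layout constants (8-byte header, 2 + 12·n + 4 byte IFD, out-of-line values over 4 bytes padded to a word boundary) follow TIFF 6.0.

```python
# Byte sizes of classic TIFF field types:
# BYTE, ASCII, SHORT, LONG, RATIONAL, FLOAT, DOUBLE.
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 11: 4, 12: 8}
MAX_CLASSIC_OFFSET = 0xFFFFFFFF  # classic TIFF offsets are uint32


def classic_ifd_bytes(entries):
    """entries: list of (type_id, count). Hypothetical helper, not the PR's."""
    size = 2 + 12 * len(entries) + 4  # count + 12-byte entries + next-IFD offset
    for type_id, count in entries:
        nbytes = TYPE_SIZES[type_id] * count
        if nbytes > 4:  # does not fit in the entry's value field
            size += nbytes + (nbytes & 1)  # stored out of line, padded to even
    return size


def needs_bigtiff(pixel_bytes, entries):
    # 8-byte header + pixel payload + IFD must stay addressable with uint32.
    return 8 + pixel_bytes + classic_ifd_bytes(entries) > MAX_CLASSIC_OFFSET
```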

What looks good

  • Mechanical and behaviour-neutral. git diff --stat reports 611 deletions and 29 insertions in _writer.py, inside the issue's 600-900-line target.
  • Re-exports cover every existing import path: _writers/eager.py, _writers/gpu.py, _gpu_decode.py, plus the test imports in tests/test_features.py, tests/test_assemble_layout_no_bytes_copy_1756.py, tests/test_eager_bigtiff_overhead_exact_1905.py, tests/test_predictor3_int_dtype_1933.py, tests/test_predictor3_int_dtype_gpu_1933.py.
  • The monkeypatch-preservation indirection in _assemble_tiff was the pragmatic option; the alternative was patching every test call site, which would balloon the PR.
  • 5033 geotiff tests pass. The single pre-existing lz4 failure also reproduces on main and is unrelated.

Checklist

  • No public API change.
  • Re-exports preserve every existing import path.
  • Eager and streaming write paths covered by the geotiff suite.
  • Monkeypatch contract on _writer.* preserved.
  • No new public functions, so the README feature matrix is unaffected.
  • No new docs entries needed (private module).

- Drop the unused _BIGTIFF_OFFSET_TAGS re-export from _writer.py
  (only _promote_offsets_to_long8 consumes it; no external importer).
- Drop the pre-existing dead TIFF_TYPE_SIZES import from _writer.py
  while we are touching the import block.
- Expand the _write_layout.py module docstring to spell out which
  helpers the eager path uses (via _assemble_tiff) versus the
  streaming path (which calls the lower-level helpers directly), and
  document the _writer-module indirection contract that _assemble_tiff
  relies on.
- Add a paragraph to _assemble_tiff's docstring explaining that the
  function lives in _write_layout.py only to keep IFD-encoding helpers
  co-located; the function is still owned by _writer.py semantically.
- Add tests/test_write_layout_monkeypatch_contract_2248.py covering
  the remaining indirected helpers (_promote_offsets_to_long8,
  _assemble_standard_layout, _assemble_cog_layout,
  _resolve_photometric). The existing 1905 test covers only
  _compute_classic_ifd_overhead; the new test catches any future
  refactor that inlines one of the other four at the call site.
Contributor Author

@brendancol left a comment


PR Review (follow-up): #2249

Blockers

None.

Suggestions

None.

Nits

  1. tests/test_write_layout_monkeypatch_contract_2248.py:60-62 - The _wrapped sentinel records args and kwargs but the calls list is only used for a truthiness check. Could drop the recording and use a simple nonlocal called = True flag. Not blocking; the captured arguments may help if the test ever needs to assert against the call shape.
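The two sentinel styles the nit contrasts are sketched below. The factory names are hypothetical and only the inner `_wrapped` name mirrors the review. The recording variant keeps the call shape for later assertions; the flag variant answers only "did it run?".

```python
def make_recording_wrapper(fn):
    """Wrap fn and record every call's (args, kwargs)."""
    calls = []

    def _wrapped(*args, **kwargs):
        calls.append((args, kwargs))  # keeps the call shape for later asserts
        return fn(*args, **kwargs)

    return _wrapped, calls


def make_flag_wrapper(fn):
    """Wrap fn and expose a probe that reports whether it ran at least once."""
    called = False

    def _wrapped(*args, **kwargs):
        nonlocal called
        called = True
        return fn(*args, **kwargs)

    return _wrapped, lambda: called
```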

What looks good (delta from first pass)

  • All actionable items from the first review applied:
    • _BIGTIFF_OFFSET_TAGS removed from the _writer.py re-export (_writer.py:89-101).
    • TIFF_TYPE_SIZES dead import dropped (_writer.py:42-49).
    • _write_layout.py module docstring now spells out eager vs. streaming consumers and documents the _writer-module indirection contract.
    • _assemble_tiff docstring explains the ownership / co-location rationale.
  • New regression test (tests/test_write_layout_monkeypatch_contract_2248.py) parametrises across the four previously-uncovered indirected names (_promote_offsets_to_long8, _assemble_standard_layout, _assemble_cog_layout, _resolve_photometric). It picks the right kwargs per helper (cog=True, overview_levels=[2] for COG, bigtiff=True for the BigTIFF-only promoter) so each helper actually fires on its subtest. With the existing test_eager_bigtiff_overhead_exact_1905, all five indirected names are locked down.
  • 5037 tests pass locally. The lz4 pre-existing failure on test_lowlevel_write_pushdown_2138.py still reproduces on main and is unrelated.

Checklist

  • No public API change.
  • Re-exports preserve every existing import path.
  • Eager and streaming write paths covered by the geotiff suite.
  • Monkeypatch contract on _writer.* covered by tests for all five indirected names.
  • No new public functions, so the README feature matrix is unaffected.
  • No new docs entries needed (private module).

Verdict: clean / nits-only. The single nit is cosmetic; not worth another round.

@brendancol merged commit f2c3964 into main on May 21, 2026
5 checks passed

Labels

performance: PR touches performance-sensitive code


Development

Successfully merging this pull request may close these issues.

Refactor GeoTIFF Phase 5c-write: extract _write_layout.py from _writer.py (PR-I of #2211)

1 participant