Refactor GeoTIFF Phase 5c-write: extract _write_layout.py from _writer.py#2249
Conversation
Move TIFF/BigTIFF IFD encoding primitives and the BigTIFF / COG layout planners out of _writer.py into a new _write_layout.py. The pixel encoding kernels (strip/tile compression, photometric/predictor encode) and the top-level write entry points stay in _writer.py. Moved: _float_to_rational, _serialize_tag_value, _pack_tag_value, _build_ifd, _assemble_tiff, _promote_offsets_to_long8, _assemble_standard_layout, _assemble_cog_layout, _compute_classic_ifd_overhead, _should_use_bigtiff_streaming, the BO byte-order constant, and the _BIGTIFF_OFFSET_TAGS frozenset. _writer.py re-exports every moved name so call sites in _writers/eager.py, _writers/gpu.py, _gpu_decode.py, and the test suite keep working. _assemble_tiff resolves the layout helpers and _resolve_photometric / _OVERRIDABLE_AUTO_TAG_IDS / _DANGEROUS_EXTRA_TAG_IDS through the _writer module at call time so monkeypatches on _writer.* (e.g. test_eager_bigtiff_overhead_exact_1905) still take effect. _writer.py drops about 580 lines. Behavior-neutral; the geotiff test suite passes. Part of #2211.
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Extract _write_layout.py from _writer.py
Blockers
None.
Suggestions
-
_write_layout.py:497-504-_assemble_tiffreaches back into_writerfor six names every call so test monkeypatches on_writer.*still take effect. The contract is invisible unless you read the comment block, and it couples the new module back to the one it was extracted from. Two options: (a) call this out on the module docstring so nobody inlines_compute_classic_ifd_overheadhere in a future cleanup, or (b) add a unit test that monkeypatches one of the other indirected names (e.g._writer._promote_offsets_to_long8or_writer._assemble_standard_layout) and asserts the override fires through_assemble_tiff. Right now only the_compute_classic_ifd_overheadpath is exercised bytest_eager_bigtiff_overhead_exact_1905. -
_write_layout.py:194-200-_BIGTIFF_OFFSET_TAGSis consumed only by_promote_offsets_to_long8and has no external importers. The# noqa: F401re-export is harmless, but the constant could come off the re-export list; the public surface is_promote_offsets_to_long8. Not blocking.
Nits
-
_writer.py:42-50- DroppedLONG8from the_dtypesimport block but leftTIFF_TYPE_SIZES. It was already unused on main, so this PR didn't introduce the dead import, but cleaning it up here would save the next reader a double-take. -
_write_layout.py:1-12- Module docstring says the helpers feed "the eager and streaming writers." True, but_write_streamingcalls the lower-level helpers (_build_ifd,_compute_classic_ifd_overhead,_should_use_bigtiff_streaming,_promote_offsets_to_long8) directly and never goes through_assemble_tiff. A one-line note that "the eager writer calls_assemble_tiff; the streaming writer composes the lower-level helpers" would make the split clearer. -
_write_layout.py:497-from . import _writer as _writer_modworks but is unusual. A short docstring sentence on_assemble_tiff(something like "this function is owned by_writer.pysemantically; it lives here only to keep IFD-encoding helpers co-located") would help reviewers who land in the file directly.
What looks good
- Mechanical and behaviour-neutral.
git diff --statreports 611 deletions and 29 insertions in_writer.py, inside the issue's 600-900-line target. - Re-exports cover every existing import path:
_writers/eager.py,_writers/gpu.py,_gpu_decode.py, plus the test imports intests/test_features.py,tests/test_assemble_layout_no_bytes_copy_1756.py,tests/test_eager_bigtiff_overhead_exact_1905.py,tests/test_predictor3_int_dtype_1933.py,tests/test_predictor3_int_dtype_gpu_1933.py. - The monkeypatch-preservation indirection in
_assemble_tiffwas the pragmatic option; the alternative was patching every test call site, which would balloon the PR. - 5033 geotiff tests pass. The single pre-existing
lz4failure also reproduces onmainand is unrelated.
Checklist
- No public API change.
- Re-exports preserve every existing import path.
- Eager and streaming write paths covered by the geotiff suite.
- Monkeypatch contract on
_writer.*preserved. - No new public functions, so the README feature matrix is unaffected.
- No new docs entries needed (private module).
- Drop the unused _BIGTIFF_OFFSET_TAGS re-export from _writer.py (only _promote_offsets_to_long8 consumes it; no external importer). - Drop the pre-existing dead TIFF_TYPE_SIZES import from _writer.py while we are touching the import block. - Expand the _write_layout.py module docstring to spell out which helpers the eager path uses (via _assemble_tiff) versus the streaming path (which calls the lower-level helpers directly), and document the _writer-module indirection contract that _assemble_tiff relies on. - Add a paragraph to _assemble_tiff's docstring explaining that the function lives in _write_layout.py only to keep IFD-encoding helpers co-located; the function is still owned by _writer.py semantically. - Add tests/test_write_layout_monkeypatch_contract_2248.py covering the remaining indirected helpers (_promote_offsets_to_long8, _assemble_standard_layout, _assemble_cog_layout, _resolve_photometric). The existing 1905 test covers only _compute_classic_ifd_overhead; the new test catches any future refactor that inlines one of the other four at the call site.
brendancol
left a comment
There was a problem hiding this comment.
PR Review (follow-up): #2249
Blockers
None.
Suggestions
None.
Nits
tests/test_write_layout_monkeypatch_contract_2248.py:60-62- The_wrappedsentinel recordsargsandkwargsbut thecallslist is only used for a truthiness check. Could drop the recording and use a simplenonlocal called = Trueflag. Not blocking; the captured arguments may help if the test ever needs to assert against the call shape.
What looks good (delta from first pass)
- All actionable items from the first review applied:
_BIGTIFF_OFFSET_TAGSremoved from the_writer.pyre-export (_writer.py:89-101).TIFF_TYPE_SIZESdead import dropped (_writer.py:42-49)._write_layout.pymodule docstring now spells out eager vs. streaming consumers and documents the_writer-module indirection contract._assemble_tiffdocstring explains the ownership / co-location rationale.
- New regression test (
tests/test_write_layout_monkeypatch_contract_2248.py) parametrises across the four previously-uncovered indirected names (_promote_offsets_to_long8,_assemble_standard_layout,_assemble_cog_layout,_resolve_photometric). It picks the right kwargs per helper (cog=True, overview_levels=[2]for COG,bigtiff=Truefor the BigTIFF-only promoter) so each helper actually fires on its subtest. With the existingtest_eager_bigtiff_overhead_exact_1905, all five indirected names are locked down. - 5037 tests pass locally. The
lz4pre-existing failure ontest_lowlevel_write_pushdown_2138.pystill reproduces on main and is unrelated.
Checklist
- No public API change.
- Re-exports preserve every existing import path.
- Eager and streaming write paths covered by the geotiff suite.
- Monkeypatch contract on
_writer.*covered by tests for all five indirected names. - No new public functions, so the README feature matrix is unaffected.
- No new docs entries needed (private module).
Verdict: clean / nits-only. The single nit is cosmetic; not worth another round.
Closes #2248
Part of #2211
Summary
xrspatial/geotiff/_writer.pyinto a newxrspatial/geotiff/_write_layout.py._write/_write_streamingentry points stay in_writer.py._writer.pydrops about 580 lines.Moved names
_float_to_rational,_serialize_tag_value,_pack_tag_value,_build_ifd,_assemble_tiff,_promote_offsets_to_long8,_assemble_standard_layout,_assemble_cog_layout,_compute_classic_ifd_overhead,_should_use_bigtiff_streaming, plus theBObyte-order constant and the_BIGTIFF_OFFSET_TAGSfrozenset.Compatibility
_writer.pyre-exports every moved name. Existing imports (_writers/eager.py,_writers/gpu.py,_gpu_decode.py, and the test suite that imports fromxrspatial.geotiff._writer) keep working unchanged._assemble_tiffresolves the layout helpers and_resolve_photometric/_OVERRIDABLE_AUTO_TAG_IDS/_DANGEROUS_EXTRA_TAG_IDSthrough the_writermodule at call time so tests that monkeypatch_writer.*(e.g.test_eager_bigtiff_overhead_exact_1905.py) keep their pre-extraction semantics.Test plan
pytest xrspatial/geotiff/tests/passes locally (5033 passed, 68 skipped). One pre-existing failure unrelated to this change (test_lowlevel_write_pushdown_2138.py::test_write_vs_to_geotiff_byte_parity_uint8[lz4], reproduces onmain) was deselected.from xrspatial.geotiff._writer import _assemble_tiff, _build_ifd, _compute_classic_ifd_overheadstill resolves to the moved implementations.