feat(rust/sedona-raster-gdal,c/sedona-gdal): add rs_geotiff_tiles #957
Kontinuation wants to merge 4 commits
paleolimbot
left a comment
The tiling implementation looks good here, although I think we should implement this as a FileFormat + ListingTable rather than a table function. We haven't done it yet, but we can write a generic table function that wraps a FileFormat + ListingTable (e.g., sd_read_format()) if we want a table function in the future.
Among other things, this will get you directory listing and partitioning (I think) for free.
```sql
SELECT path, x, y
FROM rs_geotiff_tiles('../../../submodules/sedona-testing/data/raster/test4.tiff');
```

```sql
SELECT RS_MetaData(rast)
FROM rs_geotiff_tiles('../../../submodules/sedona-testing/data/raster/test4.tiff');
```
Can you ensure these are user-runnable examples using a URL? If we end up getting transient failures we could do something fancy with the doc renderer that transforms sedona-testing urls into submodule paths.
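The renderer transform suggested above could look roughly like the following. This is only a sketch under stated assumptions: the function name `url_to_submodule_path` and both of its prefix parameters are hypothetical, and nothing here reflects how the docs are actually rendered.

```rust
/// Map a remote test-data URL onto a local submodule checkout, if it matches.
///
/// Hypothetical helper: `url_prefix` is whatever base URL the docs would use
/// for sedona-testing files, and `submodule_root` is the local checkout
/// (for example "submodules/sedona-testing"). Returns `None` when the URL
/// does not start with the prefix, so the caller can leave it unchanged.
fn url_to_submodule_path(url: &str, url_prefix: &str, submodule_root: &str) -> Option<String> {
    // Only rewrite URLs that live under the expected prefix.
    let relative = url.strip_prefix(url_prefix)?;

    // Refuse empty paths and anything that could escape the submodule directory.
    if relative.is_empty() || relative.split('/').any(|part| part == "..") {
        return None;
    }

    Some(format!(
        "{}/{}",
        submodule_root.trim_end_matches('/'),
        relative.trim_start_matches('/')
    ))
}
```

A renderer would call this on each path literal in a code cell and fall back to the original URL when it returns `None`, so the published examples stay user-runnable while local builds read from the submodule.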
pub fn rs_geotiff_tiles_udtf() -> Arc<dyn TableFunctionImpl> {
    Arc::new(RsGeoTiffTilesFunction {})
}
This is probably better as a FileFormat than a table function. DataFusion can already take care of the listing and can kick off multi-file reads on separate partitions without us asking (plus it has a built-in mechanism to handle key/value options). The ExternalFormatSpec should be able to abstract away the hard parts for you.
In Python this means you get sd.read(".../*.tif") and sd.read("some_dir", format="tiff") for free.
// TODO: allow split paths to load into multiple partitions to enable parallelism.
Partitioning::UnknownPartitioning(1),
This comes for free with ExternalFormatSpec I believe (but worth checking)
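For context, the parallelism that the TODO asks for amounts to distributing input files across several partitions. The sketch below shows one simple way to do that (round-robin assignment). It is not DataFusion's or ExternalFormatSpec's actual mechanism; the function name `split_into_partitions` is hypothetical and the code uses only the standard library.

```rust
/// Distribute file paths across at most `target_partitions` groups, round-robin.
///
/// Hypothetical illustration: each returned group would be read by one
/// partition. Never returns empty groups, so the number of groups is
/// `min(target_partitions, paths.len())`.
fn split_into_partitions(paths: &[String], target_partitions: usize) -> Vec<Vec<String>> {
    if paths.is_empty() || target_partitions == 0 {
        return Vec::new();
    }

    // Do not create more partitions than there are files to read.
    let n = target_partitions.min(paths.len());
    let mut groups: Vec<Vec<String>> = vec![Vec::new(); n];

    // Assign file i to group i mod n.
    for (i, path) in paths.iter().enumerate() {
        groups[i % n].push(path.clone());
    }
    groups
}
```

With a single input file this still yields one partition, which matches the current `UnknownPartitioning(1)` behavior; the benefit only appears for directories or globs that expand to many files.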
arrow-buffer = { workspace = true }
arrow-schema = { workspace = true }
async-trait = { workspace = true }
datafusion = { workspace = true, default_features = false }
If we can avoid the full datafusion crate as a dependency that would be helpful for compilation order
Summary
Add the RS_GeoTiff_Tiles table function. This function efficiently reads GeoTIFF files (or directories containing them) and loads the rasters as distinct tiles by leveraging the native internal tiling scheme of the source files.
Usage Example:
Dependency
main
Testing
cargo test -p sedona-raster-gdal rs_geotiff_tiles
cargo clippy -p sedona-raster-gdal -p sedona --all-targets -- -D warnings
cargo bench -p sedona-raster-gdal --bench rs_geotiff_tiles --no-run
PATH="/home/kontinuation/workspace/github/apache/sedona-db/.venv/bin:$PATH" quarto render docs/reference/sql/rs_geotiff_tiles.qmd
sedonadb._lib Python import mismatch in this environment
Future Work
Although the function is named RS_GeoTiff_Tiles, it may be able to handle a wider range of GDAL-supported raster formats. We conservatively restricted the name to GeoTIFF because our testing has focused exclusively on that format.
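To make the "native internal tiling scheme" in the summary concrete, the sketch below enumerates the pixel windows implied by a raster's size and its internal block size. It is an illustration only, not the PR's implementation: the `TileWindow` type and `tile_windows` function are hypothetical, and treating the `x` and `y` output columns as tile column and row indices is an assumption.

```rust
/// One tile of a raster, aligned to the file's internal block grid.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TileWindow {
    x: usize,      // tile column index (assumed meaning of the `x` column)
    y: usize,      // tile row index (assumed meaning of the `y` column)
    x_off: usize,  // pixel offset of the tile's left edge
    y_off: usize,  // pixel offset of the tile's top edge
    width: usize,  // tile width in pixels (clipped at the right edge)
    height: usize, // tile height in pixels (clipped at the bottom edge)
}

/// Enumerate block-aligned windows in row-major order.
fn tile_windows(raster_w: usize, raster_h: usize, block_w: usize, block_h: usize) -> Vec<TileWindow> {
    if block_w == 0 || block_h == 0 {
        return Vec::new();
    }

    // Ceiling division: partial blocks at the edges still count as tiles.
    let tiles_x = (raster_w + block_w - 1) / block_w;
    let tiles_y = (raster_h + block_h - 1) / block_h;

    let mut windows = Vec::with_capacity(tiles_x * tiles_y);
    for y in 0..tiles_y {
        for x in 0..tiles_x {
            let x_off = x * block_w;
            let y_off = y * block_h;
            windows.push(TileWindow {
                x,
                y,
                x_off,
                y_off,
                // Clip edge tiles so they never extend past the raster.
                width: block_w.min(raster_w - x_off),
                height: block_h.min(raster_h - y_off),
            });
        }
    }
    windows
}
```

Reading along block boundaries like this means each tile maps onto whole stored blocks, which avoids decoding the same block for more than one output row.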