feat(rust/sedona-raster-gdal,c/sedona-gdal): add rs_geotiff_tiles #957

Open
Kontinuation wants to merge 4 commits into apache:main from Kontinuation:pr-d-rs-geotiff-tiles

Conversation

Kontinuation (Member) commented Jun 14, 2026

Summary

  • Added the RS_GeoTiff_Tiles table function. This function efficiently reads GeoTIFF files (or directories containing them) and loads the rasters as distinct tiles by leveraging the native internal tiling scheme of the source files.
    Usage Example:
    SELECT path, x, y, RS_MetaData(rast)
    FROM rs_geotiff_tiles('path/to/raster.tiff');
  • Added the minimal VSI directory traversal support needed by the UDTF.
  • Added SQL reference docs and a dedicated Criterion benchmark.
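
To illustrate what "loading the rasters as distinct tiles" along the file's internal tiling scheme can look like, here is a minimal sketch that enumerates the pixel windows of a raster given its block size. It is not the code in this PR: `TileWindow` and `tile_windows` are hypothetical names, and treating `x` and `y` as tile column and row indices is an assumption about the output columns.

```rust
/// Pixel window covered by one internal block (tile) of a raster.
/// Hypothetical type for illustration; not part of sedona-raster-gdal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TileWindow {
    pub x: usize,      // tile column index (assumed meaning of `x`)
    pub y: usize,      // tile row index (assumed meaning of `y`)
    pub x_off: usize,  // pixel offset of the window's left edge
    pub y_off: usize,  // pixel offset of the window's top edge
    pub width: usize,  // window width in pixels (edge tiles may be narrower)
    pub height: usize, // window height in pixels (edge tiles may be shorter)
}

/// Enumerate tile windows in row-major order for a raster of
/// `raster_w` x `raster_h` pixels stored in `block_w` x `block_h` blocks.
pub fn tile_windows(
    raster_w: usize,
    raster_h: usize,
    block_w: usize,
    block_h: usize,
) -> Vec<TileWindow> {
    // A zero-sized raster or block has no meaningful tiling.
    if raster_w == 0 || raster_h == 0 || block_w == 0 || block_h == 0 {
        return Vec::new();
    }
    // Ceiling division: partial blocks at the right/bottom edges still count.
    let tiles_x = (raster_w + block_w - 1) / block_w;
    let tiles_y = (raster_h + block_h - 1) / block_h;

    let mut windows = Vec::with_capacity(tiles_x * tiles_y);
    for y in 0..tiles_y {
        for x in 0..tiles_x {
            let x_off = x * block_w;
            let y_off = y * block_h;
            windows.push(TileWindow {
                x,
                y,
                x_off,
                y_off,
                // Clamp edge tiles to the raster extent.
                width: block_w.min(raster_w - x_off),
                height: block_h.min(raster_h - y_off),
            });
        }
    }
    windows
}
```

Reading each window with a single block-aligned request avoids decoding the same compressed block more than once, which is the usual motivation for following the file's own tiling.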

Dependency

  • Independent branch based on main

Testing

  • cargo test -p sedona-raster-gdal rs_geotiff_tiles
  • cargo clippy -p sedona-raster-gdal -p sedona --all-targets -- -D warnings
  • cargo bench -p sedona-raster-gdal --bench rs_geotiff_tiles --no-run
  • PATH="/home/kontinuation/workspace/github/apache/sedona-db/.venv/bin:$PATH" quarto render docs/reference/sql/rs_geotiff_tiles.qmd
    • front matter validated; example execution is currently blocked by an existing sedonadb._lib Python import mismatch in this environment

Future Work

  • Expanded Format Support: Although the function is named RS_GeoTiff_Tiles, it may also be able to handle other GDAL-supported raster formats. We conservatively restricted the name to GeoTIFF because our testing has covered only this format.
  • Parallel Loading: Currently, the function produces a single-partition relation, which does not distribute the data load. Enhancing this to allow for multi-partition loading is a logical next step to improve scalability.
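
As a rough illustration of the multi-partition idea, the sketch below distributes a list of file paths round-robin across a target number of partitions. It is an assumption-laden example, not the planned implementation: `split_paths` is a hypothetical helper, and a real version would have to plug into the execution plan's partitioning rather than return plain vectors.

```rust
/// Distribute `paths` round-robin over at most `target_partitions` groups.
/// Hypothetical helper for illustration; not part of this PR.
pub fn split_paths(paths: &[String], target_partitions: usize) -> Vec<Vec<String>> {
    if paths.is_empty() || target_partitions == 0 {
        return Vec::new();
    }
    // Never create more partitions than there are files to read.
    let n = target_partitions.min(paths.len());
    let mut groups: Vec<Vec<String>> = vec![Vec::new(); n];
    for (i, path) in paths.iter().enumerate() {
        groups[i % n].push(path.clone());
    }
    groups
}
```

Round-robin keeps group sizes within one file of each other, but it ignores file size; balancing by byte size or tile count would be a reasonable refinement if inputs vary widely.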

Kontinuation force-pushed the pr-d-rs-geotiff-tiles branch 2 times, most recently from b8e1154 to 8fe281c (June 17, 2026 13:45)
Kontinuation force-pushed the pr-d-rs-geotiff-tiles branch from 99e70dd to 3ddacd2 (June 18, 2026 02:05)
Kontinuation marked this pull request as ready for review June 22, 2026 02:12

paleolimbot (Member) left a comment

The tiling implementation looks good here, although I think we should implement this as a FileFormat + ListingTable rather than a table function. We haven't done it yet, but we can write a generic table function that wraps a FileFormat + ListingTable (e.g., sd_read_format()) if we want a table function in the future.

Among other things, this will get you directory listing and partitioning (I think) for free.

Comment on lines +43 to +51
```sql
SELECT path, x, y
FROM rs_geotiff_tiles('../../../submodules/sedona-testing/data/raster/test4.tiff');
```

```sql
SELECT RS_MetaData(rast)
FROM rs_geotiff_tiles('../../../submodules/sedona-testing/data/raster/test4.tiff');
```

Can you ensure these are user-runnable examples using a URL? If we end up getting transient failures we could do something fancy with the doc renderer that transforms sedona-testing urls into submodule paths.

Comment on lines +58 to +60
pub fn rs_geotiff_tiles_udtf() -> Arc<dyn TableFunctionImpl> {
    Arc::new(RsGeoTiffTilesFunction {})
}

This is probably better as a FileFormat than a table function. DataFusion can already take care of the listing and can kick off multi-file reads on separate partitions without us asking (plus it has a built-in mechanism to handle key/value options). The ExternalFormatSpec should be able to abstract away the hard parts for you.

In Python this means you can get sd.read(".../*.tif") and sd.read("some_dir", format="tiff") for free.

Comment on lines +182 to +183
// TODO: allow split paths to load into multiple partitions to enable parallelism.
Partitioning::UnknownPartitioning(1),

I believe this comes for free with ExternalFormatSpec (but it is worth checking).

arrow-buffer = { workspace = true }
arrow-schema = { workspace = true }
async-trait = { workspace = true }
datafusion = { workspace = true, default_features = false }

If we can avoid depending on the full datafusion crate, that would help with compilation order.
