feat(rust/sedona-raster-functions): add RS_Value for spatial 2d rasters by james-willis · Pull Request #974 · apache/sedona-db

james-willis · 2026-06-17T21:03:27Z

No description provided.

Add RS_Value to sample a raster pixel value at a point or 1-based grid cell (2-D). Reads materialised InDb pixel bytes directly from the band's NdBuffer via stride-offset math and marks the UDF needs_pixels so the planner injects RS_EnsureLoaded — no GDAL dependency. Returns null for out-of-bounds locations and nodata pixels; errors on non-2-D bands.

Add RS_Value cases (grid, point, and point+band) to the consolidated native-raster-functions Criterion bench, alongside the other RS_* kernels.

…d band handling - Floor (not truncate toward zero) the inverse transform so a point just outside the top/left edge is out of bounds rather than sampling an edge pixel. - Require a 2-D spatial (y, x) band via is_spatial_2d, not any two-axis band. - Propagate a NULL band argument to a NULL result; reject band 0/negative rather than coercing to band 1. - Document why the pixel read uses the lossless f64 conversion. - Add tests: floor edge case, NaN nodata, band 0, and the point CRS-mismatch / non-point-geometry error paths.

…lue tests Drop the raster_con fixture + query_value helper; sample the inline RS_Example() raster and assert via assert_query_result, matching the other RS_ function tests. (The RS_EnsureLoaded planner path is covered against a real OutDb raster by test_rs_ensureloaded.)

paleolimbot

Thank you! I took a first pass...I am not sure how far you want to go in optimizing this now, but there are a few things I think you'll want to consider even before we take a general optimization pass through all the functions.

In most of our functions we assume that whatever operation we're doing is sufficiently expensive that the per-iteration overhead of things like null checks and input validation are not measurable. Extracting a value out of a buffer is a nanosecond-scale operation depending mostly on whether it's cached or not, so per-item null checks (and especially the branching introduced by them) in this case are something that a production-grade raster operator worth its salt needs to avoid.

We can (probably) improve this by composing operations such that the actual sample loop has no branches.

Validate raster band selection and build a Vec<(&[u8], AffineMatrix)>. In the possibly common case of a scalar raster, this check only has to happen once (instead of once per point).
For the geometry case, validate the point geometry and extract the x, y as floats. You can use WkbHeader for this.
Resolve whether there needs to be a CRS transform (I think we have a "are all these CRSes equal" function, which could be better optimized to check if a vector of CRSes has repeated values). You may want to punt on the CRSes not equal front for now.
Build a "selection vector" to figure out which values of the geometry/row+col items you need to consider. You can start by intersecting the NullBuffers, and then you also need to exclude empty points. A selection vector is often Vec<usize> (DuckDB uses this trick to unify dictionary, run-end-encoding, and execution of scalar functions after a filter).
Resolve the logical row/col in a branchless loop (separate cases for scalar vs. array raster)
When you support views, make the logical indices physical ones
Extract from the buffer in a branchless loop.

A good way to know you're on the right track is to benchmark sd.sql("SELECT RS_Value($1, x, y)", params=(Raster.from_numpy(arr),)againstarr.take()`, particularly after #999.

(Definitely prototype that suggestion before taking on the complexity!)

paleolimbot · 2026-06-23T14:01:51Z

+    - name: colX
+      type: int
+      description: Column index (1-based).
+    - name: rowY
+      type: int
+      description: Row index (1-based).


Indexing is typically rows, cols (e.g., numpy, R) instead of cols, rows.

punted the col/row version of the function for now

But the signature here is based on trying to match postgis as much as possible:

Here are PostGIS's ST_Value (raster) signatures, straight from the current docs:

ST_Value(raster rast, geometry pt, boolean exclude_nodata_value=true)

ST_Value(raster rast, integer band, geometry pt, boolean exclude_nodata_value=true, text
resample='nearest')

ST_Value(raster rast, integer x, integer y, boolean exclude_nodata_value=true)

ST_Value(raster rast, integer band, integer x, integer y, boolean exclude_nodata_value=true)

also sedona spark: https://sedona.apache.org/latest/api/sql/Raster-Operators/RS_Value/

Sure, but nobody asked me to review those 🙂 . Compared to GDAL and xarray the PostGIS and Sedona Spark APIs are not particularly popular...part of porting that to SedonaDB is to make it fast, but if anybody is ever going to use this we're also going to have to make it less awkward.

Would it be better to keep st functions in sync between sedona and sedona_db? I understand the concern and bad decisions in the past on the orders of params, but if we'd make them different in signatures, users would need to change their SQL to adapt in between if they ever want to run on both engines.

…ple via WkbHeader - POINT EMPTY now samples to NULL instead of erroring. - Extract the point x/y via a header-only WkbHeader parse (no full geometry walk), which also yields NaN coords for empties and lets the empty check short-circuit before any CRS resolution. - Hold the cast Int32 arguments as ArrayRefs and borrow typed views instead of cloning the typed arrays. - Note that the f64->i64 index cast saturates (never panics).

…xample

…e pixel byte offset The (row, col) -> byte-offset computation multiplied and added i64 strides and offset unchecked, which could panic on overflow in debug builds given a pathological stride/offset. Use checked_mul/checked_add so a bad layout surfaces as an error instead of a panic.

…riant Ship RS_Value as point-only for now; remove the (raster, colX, rowY[, band]) grid kernels, their docs/examples, Python tests, and benchmark. The grid variant will be introduced as its own function later (its indexing order conflicts with PostGIS ST_Value). Pixel sampling by 0-based (col, row) is retained internally via sample_pixel and still covered by unit tests.

james-willis · 2026-06-23T18:44:59Z

My inclination is to punt on the optimization work in order to add functionality. Also will eventually add a benchmark for these functions

paleolimbot · 2026-06-23T19:48:52Z

My inclination is to punt on the optimization work in order to add functionality. Also will eventually add a benchmark for these functions

I think you should benchmark them now and put in a reasonable effort to optimize them now...we know we need this function to be fast and it will never be easier to get both of our eyes on it. The patterns you come up with for this function will help you with all of the other functions that come after.

…sampling Add a scalar-raster (one raster, many points) RS_Value case alongside the existing array-raster cases. The scalar case is the one where per-point band / nodata / buffer / CRS resolution can be hoisted out of the sample loop, so it is the baseline for the upcoming optimization. Today it measures the same as the array-raster case (~172 ns/point), confirming the hoist isn't happening.

…alue For a constant (scalar) raster sampled with the default band — the common RS_Value(raster_expr, point_column) shape — resolve the band buffer, affine transform, and CRS once for the batch, then drive a selection vector + tight sample loop instead of re-resolving all of that per point. The general array-raster path (a distinct raster per row, which can't hoist) is unchanged. Factors the per-pixel read into read_pixel (shared with sample_pixel) and exposes the executor's geom WKB/CRS accessor pub(crate). ~3x faster on the scalar-raster benchmark (172 -> 59 ns/point); array-raster unchanged.

…sion Resolve once whether a column-level point CRS needs reprojecting into the raster CRS (column_reproject_decision), then skip the per-point reproject_point call — and its crs_equals, whose to_authority_code() allocates a String every point. Profiling put that at ~15 ns/point (~37% of the equal-CRS sample path), uniform across CRS types (lng/lat, EPSG:4326, EPSG:3857). A unit test verifies the equal-CRS short-circuit actually fires for the lng/lat case.

…ffset reader in RS_Value Add read_point_xy to sedona-geometry: a Point-only WKB reader that pulls the leading (x, y) from the fixed coordinate offset instead of running the general header parser. RS_Value now uses it on both the scalar fast path and the general batch path, preserving non-Point -> error and POINT EMPTY -> NULL semantics. The WKB parse was the dominant per-point cost after band/affine/CRS hoisting; the scalar (one-raster, many-points) benchmark improves ~56%, the array cases ~12%.

james-willis · 2026-06-23T23:15:05Z

I spent a few hours doing benchmarking/profiling. I found that branch elimination did basically nothing. The big gains were in the Scalar raster case hoist, type level crs hoist, and the use of a point-specific wkb parser.

Detailed write-up (generated by Claude, the AI pair on this session)

What worked

CRS-decision hoist — per point we were calling crs_equals, which allocates a String via to_authority_code (~15 ns/point, uniform across CRS types: OGC:CRS84, EPSG:4326, EPSG:3857 all ~14.7–14.9 ns). Resolving the reproject decision once per batch when the point CRS is column-level eliminates it.

Band/affine/buffer hoist — the scalar fast path resolves band buffer, nodata, affine transform, and raster CRS once for the whole batch instead of per point.

Point-specific WKB reader — the parse was the dominant remaining cost. A three-way comparison settled it:

parse approach	ns/point
`WkbHeader::try_new` + `first_xy` (general)	19.1
`wkb` crate `read_wkb` + coord	7.5
hand-rolled point-only `read_point_xy`	0.9

read_point_xy reads the coordinate from a fixed offset because a Point WKB has a trivial fixed layout.

Benchmark result (criterion)

case	change
ScalarArray (one raster, many points)	−56%
ArrayArray (per-row raster)	−11.7%
ArrayArrayScalar	−12.2%

What did NOT help

Branch elimination — profiling showed the branches guard operations that are already tiny: inv_transform ~1.5–2.0 ns/point, read_pixel ~3.2–3.4 ns/point. No headroom around 2–3 ns ops when the path is dominated by a ~19 ns parse.

Takeaway

The two levers that mattered were both about doing work once instead of per-point (CRS decision, band/affine resolution) and not doing general-purpose work for a specific case (point-only parse). The micro-loop mechanics were never the bottleneck. Follow-up DB-97 tracks spreading read_point_xy to other Point-only consumers.

…, fix edge cases Address review feedback on the RS_Value fast path: - Extract resolve_point_xy / xy_to_pixel shared by the scalar fast path and the general batch path, so the point->pixel logic is identical by construction instead of by hand. - Defer band/2-D validation in the scalar path until at least one point needs sampling, so an all-null/all-empty batch returns NULL rather than erroring on an unsupported raster — matching the general path. - Treat a non-finite point ordinate (NaN/Inf) as no location to sample (NULL), rather than letting a NaN index saturate to pixel column 0. - Reuse SRID_FLAG_BIT in read_point_xy instead of a magic literal. - Move geo-traits back to dev-dependencies (only used under #[cfg(test)]).

…to the band argument A constant (scalar) raster now takes the hoisted fast path regardless of the band argument, not just the 2-arg default-band case: - no band argument or a constant (scalar) band -> the band buffer is hoisted once, same as band 1; - a band column -> the band buffer is resolved per row (its NdBuffer borrows from the band and can't be cached across distinct bands), but the affine/CRS/reproject work is still hoisted out of the per-point loop. A NULL scalar band yields an all-NULL result. Selecting a different band no longer drops RS_Value(scalar_raster, point_col, band) onto the per-row path.

…ad of a WGS84 pivot Unlike the spatial predicates, RS_Value propagates a failed point->raster-CRS transform as an error rather than falling back to a WGS84 pivot: sampling must land the point in the raster's own CRS (the only space its affine/pixel grid is defined in), so there is no neutral CRS to fall back to.

paleolimbot

Thank you for the optimization goose chase here...50% better is great and this is also useful for non-raster PR reviews. WKB parsing and CRS equality are likely places we can improve performance elsewhere, too.

The wkb_header tests repeated `use sedona_testing::fixtures::*;` in eight functions; lift it to the `mod tests` block once.

Build an in-memory raster from a random numpy array (via Raster.from_numpy) with a known geotransform and no CRS, then sample pixel centers, off-center positions, and random interior points; assert RS_Value matches what rasterio reads at the same world coordinates. Adds rasterio to the test extras; the test skips if it is not installed.

The fixtures-import hoist left a trailing comment glued to the function's opening brace; split it onto its own line.

paleolimbot

Thank you!

If the R failure persists I'll fix in a separate PR.

github-actions Bot requested a review from prantogg June 17, 2026 21:13

james-willis added 3 commits June 18, 2026 14:44

chore(rust/sedona-raster-functions): benchmark RS_Value

798b67c

Add RS_Value cases (grid, point, and point+band) to the consolidated native-raster-functions Criterion bench, alongside the other RS_* kernels.

james-willis force-pushed the jw/rs-value branch from 47d859b to 82b7968 Compare June 18, 2026 21:46

james-willis marked this pull request as ready for review June 22, 2026 22:28

james-willis changed the title ~~feat(rust/sedona-raster-functions): add RS_Value~~ feat(rust/sedona-raster-functions): add RS_Value for spatial 2d rasters Jun 22, 2026

james-willis commented Jun 22, 2026

View reviewed changes

Comment thread rust/sedona-raster-functions/src/rs_value.rs

paleolimbot reviewed Jun 23, 2026

View reviewed changes

james-willis added 4 commits June 23, 2026 11:01

docs(rs_value): use the 3-arg ST_Point(x, y, crs) form in the point e…

a61ec7d

…xample

james-willis added 4 commits June 23, 2026 13:50

james-willis added 3 commits June 23, 2026 16:58

james-willis requested a review from paleolimbot June 24, 2026 00:29

paleolimbot approved these changes Jun 24, 2026

View reviewed changes

Comment thread rust/sedona-geometry/src/wkb_header.rs Outdated

Comment thread python/sedonadb/tests/functions/test_raster_functions.py

james-willis added 4 commits June 24, 2026 11:04

test(rust/sedona-geometry): hoist fixtures import to the test module

67af522

The wkb_header tests repeated `use sedona_testing::fixtures::*;` in eight functions; lift it to the `mod tests` block once.

Merge remote-tracking branch 'origin/main' into pr-974

fc95617

style(rust/sedona-geometry): fix rustfmt in wkb_header test

7effb08

The fixtures-import hoist left a trailing comment glued to the function's opening brace; split it onto its own line.

paleolimbot approved these changes Jun 24, 2026

View reviewed changes

paleolimbot merged commit daf5fd4 into apache:main Jun 24, 2026
16 of 17 checks passed

Uh oh!

Conversation

james-willis commented Jun 17, 2026

Uh oh!

Uh oh!

paleolimbot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

paleolimbot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

james-willis Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

james-willis Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

james-willis Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

paleolimbot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

zhangfengcdt Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

james-willis commented Jun 23, 2026

Uh oh!

paleolimbot commented Jun 23, 2026

Uh oh!

james-willis commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What worked

Benchmark result (criterion)

What did NOT help

Takeaway

Uh oh!

paleolimbot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

paleolimbot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

paleolimbot left a comment •

edited

Loading

james-willis commented Jun 23, 2026 •

edited

Loading