diff --git a/docs/blog/posts/intro-sedonadb-0-4-geog-toronto.png b/docs/blog/posts/intro-sedonadb-0-4-geog-toronto.png new file mode 100644 index 00000000000..251b9895b04 Binary files /dev/null and b/docs/blog/posts/intro-sedonadb-0-4-geog-toronto.png differ diff --git a/docs/blog/posts/intro-sedonadb-0-4-tiff-chunks.png b/docs/blog/posts/intro-sedonadb-0-4-tiff-chunks.png new file mode 100644 index 00000000000..971ab5c45f9 Binary files /dev/null and b/docs/blog/posts/intro-sedonadb-0-4-tiff-chunks.png differ diff --git a/docs/blog/posts/intro-sedonadb-0-4-zarr-chunks.png b/docs/blog/posts/intro-sedonadb-0-4-zarr-chunks.png new file mode 100644 index 00000000000..51daa8bcbcb Binary files /dev/null and b/docs/blog/posts/intro-sedonadb-0-4-zarr-chunks.png differ diff --git a/docs/blog/posts/intro-sedonadb-0-4.md b/docs/blog/posts/intro-sedonadb-0-4.md new file mode 100644 index 00000000000..cef9c11447e --- /dev/null +++ b/docs/blog/posts/intro-sedonadb-0-4.md @@ -0,0 +1,510 @@ +--- +date: + created: 2026-06-19 +links: + - SedonaDB: https://sedona.apache.org/sedonadb/ +authors: + - dewey + - kristin + - feng + - jia + - pranav +title: "SedonaDB 0.4.0 Release" +--- + +# SedonaDB 0.4.0 Release + +The Apache Sedona community is excited to announce the release of [SedonaDB](https://sedona.apache.org/sedonadb) version 0.4.0! + +SedonaDB is the first open-source, single-node analytical database engine that treats spatial data as a first-class citizen. It is developed as a subproject of Apache Sedona. This release consists of 187 resolved issues including XX new functions from 15 contributors. + +Apache Sedona powers large-scale geospatial processing on distributed engines like Spark (SedonaSpark), Flink (SedonaFlink), and Snowflake (SedonaSnow). SedonaDB extends the Sedona ecosystem with a single-node engine optimized for small-to-medium data analytics, delivering the simplicity and speed that distributed systems often cannot. 
+ +## Release Highlights + +We're excited to have so many things to highlight in this release! + +- Packaging for conda-forge +- Python DataFrame API +- R dplyr interface +- Geography support +- GPU-accelerated spatial join +- Parquet improvements +- Improved spatial function coverage and documentation +- Raster infrastructure + + +```python +# pip install --upgrade "apache-sedona[db]" +import sedona.db + +sd = sedona.db.connect() +sd.options.interactive = True +``` + +## Packaging for conda-forge + +We're excited to announce that sedonadb is now available on conda-forge! Users of the conda ecosystem can now install SedonaDB with: + +```shell +conda install -c conda-forge sedonadb +``` + +Thank you to [p-vdp](https://github.com/p-vdp) for driving this work! + +## Python DataFrame API + +While SQL is a powerful, flexible, and well-understood language for describing many of the things one might want to do with spatial data, many Python users prefer using Python functions to interact with data frames and expressions. SedonaDB 0.4.0 adds just this: a basic set of transformations on data frames and expressions drawing inspiration from [Ibis](https://ibis-project.org), [DuckDB Python's relational API](https://duckdb.org/docs/current/clients/python/relational_api), [PySpark](https://spark.apache.org/docs/latest/api/python/index.html), [DataFusion Python](https://datafusion.apache.org/python/), [Pandas](https://pandas.pydata.org), and [GeoPandas](https://geopandas.org). 
+ + +```python +# Load cities and countries from geoarrow-data +cities_url = "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities.parquet" +countries_url = "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_countries.parquet" + +cities = sd.read(cities_url).alias("cities") +countries = sd.read(countries_url).alias("countries") + +# Spatial join using the DataFrame API +f = sd.funcs +result = ( + cities.join( + countries, + on=f.st_intersects(cities.geometry, countries.geometry), + ) + .filter(countries.continent != "North America") + .select(cities.name, country=countries.name, continent=countries.continent) + .sort("country") + .limit(10) +) +result.show() +``` + + ┌──────────────┬─────────────┬───────────────┐ + │ name ┆ country ┆ continent │ + │ utf8 ┆ utf8 ┆ utf8 │ + ╞══════════════╪═════════════╪═══════════════╡ + │ Kabul ┆ Afghanistan ┆ Asia │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ Tirana ┆ Albania ┆ Europe │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ Algiers ┆ Algeria ┆ Africa │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ Luanda ┆ Angola ┆ Africa │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ Buenos Aires ┆ Argentina ┆ South America │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ Yerevan ┆ Armenia ┆ Asia │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ Melbourne ┆ Australia ┆ Oceania │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ Canberra ┆ Australia ┆ Oceania │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ Sydney ┆ Australia ┆ Oceania │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ Vienna ┆ Austria ┆ Europe │ + └──────────────┴─────────────┴───────────────┘ + + +Support for `.group_by()`, `.agg()`, `.distinct()`, and `.distinct_on()` were also added in 0.4.0 and more are in the works! 
+ +In addition to data frame operators, we increasingly realized that our hard-won library of 170+ spatial functions was difficult to explore and use (despite improved [SQL reference documentation](https://sedona.apache.org/sedonadb/latest/reference/sql/)!). Following the pattern of [Pandas-style datatype-specific accessors](https://pandas.pydata.org/docs/reference/series.html#accessors), you can now write expressions as chains with inline documentation helping you as you go. + + +```python +countries.select( + countries.name, geometry=countries.geometry.geo.centroid().geo.buffer(0.1) +).limit(4) +``` + + + + + ┌─────────────────────────────┬────────────────────────────────────────────────────────────────────┐ + │ name ┆ geometry │ + │ utf8 ┆ geometry │ + ╞═════════════════════════════╪════════════════════════════════════════════════════════════════════╡ + │ Fiji ┆ MULTIPOLYGON(((163.7531646445823 -17.31630942638265,163.755086116… │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ United Republic of Tanzania ┆ MULTIPOLYGON(((34.652989854755944 -6.25773242850609,34.6549113267… │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ Western Sahara ┆ MULTIPOLYGON(((-12.237831111607791 24.291172960208634,-12.2359096… │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ Canada ┆ MULTIPOLYGON(((-98.24238137209699 61.46907614534894,-98.240459900… │ + └─────────────────────────────┴────────────────────────────────────────────────────────────────────┘ + + + +## R dplyr Interface + +Similarly, in past releases R users had to use SQL to access most features of SedonaDB. In the 0.4.0 release, you can now use the dplyr backend to transform your SedonaDB-backed lazy data frames. 
To make this happen we added a new package, **sdplyr**, with an additional package **sedonafns** whose job it is to enumerate and document our large and growing collection of spatial functions. You can get everything you need from [the sdplyr package on R Universe](https://apache.r-universe.dev/sdplyr) to get started! + +```r +library(sdplyr) + +# Load cities and countries from geoarrow-data +cities_url <- "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities.parquet" +countries_url <- "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_countries.parquet" + +cities <- sd_read_parquet(cities_url) +countries <- sd_read_parquet(countries_url) + +# Spatial join using dplyr +cities |> + inner_join( + countries, + by = sd_join_intersects() + ) |> + filter(continent != "North America") |> + select( + city = name.x, + country = name.y, + continent + ) |> + arrange(country) |> + head(10) +#> +#> ┌──────────────┬─────────────┬───────────────┐ +#> │ city ┆ country ┆ continent │ +#> │ utf8 ┆ utf8 ┆ utf8 │ +#> ╞══════════════╪═════════════╪═══════════════╡ +#> │ Kabul ┆ Afghanistan ┆ Asia │ +#> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ +#> │ Tirana ┆ Albania ┆ Europe │ +#> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ +#> │ Algiers ┆ Algeria ┆ Africa │ +#> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ +#> │ Luanda ┆ Angola ┆ Africa │ +#> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ +#> │ Buenos Aires ┆ Argentina ┆ South America │ +#> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ +#> │ Yerevan ┆ Armenia ┆ Asia │ +#> └──────────────┴─────────────┴───────────────┘ +#> Preview of up to 6 row(s) +``` + +While we have some R functions translated for use in SedonaDB à la dbplyr/arrow, this is a work in progress. 
In the meantime, raw DataFusion/SedonaDB SQL functions are available via `.fns` (e.g., `.fns$substr(some_col, 1, 5)`), tidy-evaluation injection (`!!some_r_expression`) is supported, and we would love [feature requests](https://github.com/apache/sedona-db/issues/new) from our users for frequently used functions. + +## Geography Support + +SedonaDB 0.4.0 introduces expanded support for the Geography data type, including a completely rewritten implementation of most operations using [s2geography](https://github.com/paleolimbot/s2geography), which in turn packages primitives from Google's [s2geometry](https://github.com/google/s2geometry) as PostGIS/BigQuery-compatible SQL operators. + +Geography shines for distance queries across large geographical areas. For example, if we wanted to find cities within 100 km of Germany using planar geometry, we'd have to find a local projection and do potentially expensive transformations between coordinate systems. Geography simplifies this to a simple distance-within query: + + +```python +germany = countries.filter(countries.name == "Germany").select( + countries.geometry.geo.to_geography() +) + +cities.filter( + cities.geometry.geo.to_geography().geo.d_within(germany, 100_000.0) +).select(cities.name) +``` + + + + + ┌────────────┐ + │ name │ + │ utf8 │ + ╞════════════╡ + │ Vaduz │ + ├╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ Luxembourg │ + ├╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ Bern │ + ├╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ Prague │ + ├╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ Amsterdam │ + ├╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ Berlin │ + └────────────┘ + + + +This works for spatial joins, too: if you'd like to analyze *all* the countries with their nearby cities, SedonaDB can now do that. 
+ + +```python +cities_geog = cities.select( + cities.name, geometry=cities.geometry.geo.to_geography() +).alias("cities_geog") +countries_geog = countries.select( + countries.name, + countries.continent, + geometry=countries.geometry.geo.to_geography(), +).alias("countries_geog") + +cities_geog.join( + countries_geog, + on=f.st_dwithin( + cities_geog.geometry, + countries_geog.geometry, + 100_000, # Distance in meters! + ), +).select( + cities_geog.name, country=countries_geog.name, continent=countries_geog.continent +) +``` + + + + + ┌──────────────┬──────────────┬───────────┐ + │ name ┆ country ┆ continent │ + │ utf8 ┆ utf8 ┆ utf8 │ + ╞══════════════╪══════════════╪═══════════╡ + │ Vatican City ┆ Italy ┆ Europe │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤ + │ San Marino ┆ Italy ┆ Europe │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤ + │ Vaduz ┆ Austria ┆ Europe │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤ + │ Vaduz ┆ Germany ┆ Europe │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤ + │ Vaduz ┆ Switzerland ┆ Europe │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤ + │ Vaduz ┆ Italy ┆ Europe │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤ + │ Lobamba ┆ South Africa ┆ Africa │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤ + │ Lobamba ┆ Mozambique ┆ Africa │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤ + │ Lobamba ┆ eSwatini ┆ Africa │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤ + │ Luxembourg ┆ France ┆ Europe │ + └──────────────┴──────────────┴───────────┘ + + + +Geography is also useful for calculating the shortest path along the surface of the earth between two points or more complex geometries. For example, if you wanted to find the theoretical path an airplane would take if it flew from Toronto to any other city in the world, you could simply create a line (using ST_MakeLine) and use ST_TessellateGeom to visualize the line on a flat lon/lat map like those provided by most interactive map providers. 
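To see why the densification step matters, here is a minimal plain-Python sketch that interpolates points along a great circle between two lon/lat pairs. It only illustrates the geometry and is not how SedonaDB or s2geography implement `ST_TessellateGeom`; the helper name `great_circle_points`, the spherical-earth assumption, and the approximate city coordinates are all assumptions made for this example.

```python
import math


def great_circle_points(lon1, lat1, lon2, lat2, n=50):
    """Return n + 1 (lon, lat) pairs along the great circle from point 1 to point 2.

    Hypothetical helper for illustration: assumes a spherical earth and
    endpoints that are not antipodal.
    """

    def to_xyz(lon, lat):
        # Convert degrees to a unit vector on the sphere
        lam, phi = math.radians(lon), math.radians(lat)
        return (
            math.cos(phi) * math.cos(lam),
            math.cos(phi) * math.sin(lam),
            math.sin(phi),
        )

    a, b = to_xyz(lon1, lat1), to_xyz(lon2, lat2)
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(a, b))))
    omega = math.acos(dot)  # angle between the two points
    if omega == 0.0:
        return [(lon1, lat1)] * (n + 1)

    points = []
    for i in range(n + 1):
        t = i / n
        # Spherical linear interpolation weights
        wa = math.sin((1 - t) * omega) / math.sin(omega)
        wb = math.sin(t * omega) / math.sin(omega)
        x, y, z = (wa * p + wb * q for p, q in zip(a, b))
        lon = math.degrees(math.atan2(y, x))
        lat = math.degrees(math.atan2(z, math.hypot(x, y)))
        points.append((lon, lat))
    return points


# Approximate coordinates, for illustration only
toronto = (-79.38, 43.65)
london = (-0.13, 51.51)
path = great_circle_points(*toronto, *london)
print(max(lat for _, lat in path))
```

On a flat lon/lat map, a two-vertex line between these cities is drawn straight, while the interpolated points trace an arc that reaches further north than either endpoint. In SedonaDB, the line construction and densification are handled by the functions named above: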
+ + +```python +import lonboard + +result = ( + cities_geog.filter(cities_geog.name == "Toronto") + .select(name_from=cities_geog.name, pt_from=cities_geog.geometry) + .cross_join(cities_geog) + .select( + name_to=sd.col("name"), + geometry=sd.col("pt_from") + .geo.make_line(sd.col("geometry")) + .geo.tessellate_geom(1_000), + ) +) + +lonboard.viz(result.to_pandas()) +``` + + + + + Map(basemap_style= + required group field_id=-1 arrow_schema { + optional binary field_id=-1 name (String); + optional binary field_id=-1 continent (String); + optional binary field_id=-1 geometry; + } + + + +## Improved spatial function coverage + +Since the 0.3.0 release we have been fortunate to work with contributors to add 26 new ST_ and RS_ functions to our growing catalogue. New additions include rs_bandnodatavalue, rs_bandpixeltype, rs_bandtodim, rs_contains, rs_dimnames, rs_dimsize, rs_dimtoband, rs_ensureloaded, rs_frompath, rs_intersects, rs_isempty, rs_metadata, rs_numdimensions, rs_pixelascentroid, rs_pixelaspoint, rs_pixelaspolygon, rs_shape, rs_slice, rs_slicerange, rs_within, st_linesubstring, st_longestline, st_normalize, st_pointonsurface, st_reduceprecision, st_relate, st_segmentize, st_tessellategeog, st_tessellategeom, st_togeography, and st_togeometry. This brings the total ST_ and RS_ function count to 176, all of which are tested against PostGIS and/or BigQuery for compatibility with the full matrix of geometry types and XY, XYZ, XYM, and XYZM wherever possible. Check them out in our always improving [SQL reference documentation](https://sedona.apache.org/sedonadb/latest/reference/sql/)! + +Thank you to [Kontinuation](https://github.com/Kontinuation), [james-willis](https://github.com/james-willis), [oglego](https://github.com/oglego), and [sapienza88](https://github.com/sapienza88) for these contributions! 
+ +## N-dimensional array and Zarr support + +Geospatial raster data is increasingly a *datacube*: climate reanalyses, satellite time series, and model outputs all stack extra axes (e.g., `time`, `level`, `band`) on top of the spatial grid. In 0.4.0, SedonaDB's raster type goes natively N-dimensional, and the new `sedonadb-zarr` extension reads [Zarr]() groups straight into a queryable raster column. + +Point SedonaDB at a Zarr datacube and explore its shape without reading a single pixel: + + +```python +# pip install sedonadb-zarr +import sedonadb_zarr + +# Register Zarr functionality with a SedonaDB session +sd.register(sedonadb_zarr.ZarrExtension()) + +# A public ERA5 reanalysis cube around Hurricane Florence (Zarr v3, anonymous). +# This is a metadata read only: no pixels fetched, even against a large remote cube. +url = "https://atlantis-vis-o.s3-ext.jc.rl.ac.uk/hurricanes/era5/florence" + +# The group URL has no `.zarr` suffix, so name the format explicitly. +cube = sd.read( + url, + format="zarr", + options={"arrays": ["velocity", "u_component_of_wind", "v_component_of_wind"]}, +) + +cube.select( + f.rs_dimnames(cube.raster).alias("dims"), + f.rs_shape(cube.raster).alias("shape"), +).show(5) +``` + + ┌────────────────────────────────────┬──────────────────┐ + │ dims ┆ shape │ + │ list ┆ list │ + ╞════════════════════════════════════╪══════════════════╡ + │ [time, level, latitude, longitude] ┆ [1, 1, 121, 221] │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ [time, level, latitude, longitude] ┆ [1, 1, 121, 221] │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ [time, level, latitude, longitude] ┆ [1, 1, 121, 221] │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ [time, level, latitude, longitude] ┆ [1, 1, 121, 221] │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ [time, level, latitude, longitude] ┆ [1, 1, 121, 221] │ + └────────────────────────────────────┴──────────────────┘ + + 
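Every row above reports the same `[1, 1, 121, 221]` shape, which is consistent with a cube stored in chunks of one time step and one level. As a rough plain-Python sketch of such a layout (this is not the `sedonadb-zarr` or Zarr API, the helper name `chunk_grid` is hypothetical, and the full array shape below is made up for illustration), the chunk grid of a regularly chunked array follows from its shape and chunk shape by ceiling division:

```python
import itertools
import math


def chunk_grid(shape, chunk_shape):
    """Yield (chunk_index, chunk_extent) pairs for a regularly chunked array.

    Hypothetical helper: shape and chunk_shape are tuples of equal length.
    Extents of edge chunks are clipped to the array bounds.
    """
    counts = [math.ceil(s / c) for s, c in zip(shape, chunk_shape)]
    for index in itertools.product(*(range(n) for n in counts)):
        extent = tuple(
            min(c, s - i * c) for i, s, c in zip(index, shape, chunk_shape)
        )
        yield index, extent


# Made-up cube: 4 time steps x 3 levels over a 121 x 221 grid,
# chunked one time step and one level at a time
chunks = list(chunk_grid((4, 3, 121, 221), (1, 1, 121, 221)))
print(len(chunks))  # 12 chunks, so 12 rows under a one-row-per-chunk layout
print(chunks[0])    # ((0, 0, 0, 0), (1, 1, 121, 221))
```

With this layout, the number of rows grows with the number of time steps and levels, while each row keeps the full 121 x 221 spatial grid.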
+`sedonadb-zarr` emits one row per Zarr chunk, so the storage layout *is* the data layout. SedonaDB stays lazy about pixels: reading the group and inspecting its dimensions (`RS_DimNames`, `RS_Shape`, `RS_DimSize`, `RS_NumDimensions`) touches only the group schema, which is a small metadata round-trip, with no pixel bytes fetched. A chunk's bytes are read only when an operation actually needs them. `RS_Slice` is currently one such operation: it resolves the chunks the query touches, then slices. + + +```python +cube.select( + f.rs_slice(cube.raster, "time", 0).alias("step") +).show(5) +``` + + ┌───────────────────────────────────────────────────────────┐ + │ step │ + │ raster │ + ╞═══════════════════════════════════════════════════════════╡ + │ [221x121/3] @ [264.875 14.875 320.125 45.125] / EPSG:4326 │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ [221x121/3] @ [264.875 14.875 320.125 45.125] / EPSG:4326 │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ [221x121/3] @ [264.875 14.875 320.125 45.125] / EPSG:4326 │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ [221x121/3] @ [264.875 14.875 320.125 45.125] / EPSG:4326 │ + ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤ + │ [221x121/3] @ [264.875 14.875 320.125 45.125] / EPSG:4326 │ + └───────────────────────────────────────────────────────────┘ + + +Because each row is a chunk, `.show(5)` touches exactly five chunks and a `.filter(...)` trims the set further, so you only fetch the chunks your query reaches. The data stays in the cloud until then. And because each chunk is a row with a real spatial footprint, you can put a Zarr on a map without decoding a single pixel. 
`RS_Envelope` turns each chunk into its bounding geometry, so you can see exactly where the cube's chunks fall: + + +```python +chunks = cube.select( + f.rs_envelope(cube.raster) + .geo.transform(4326) + # Trick to normalize coordinates, which lonboard otherwise rejects + # as outside the lon/lat range + .geo.tessellate_geog(10_000) + .geo.tessellate_geom(10_000) + .alias("geom") +) + +lonboard.viz(chunks) +``` + +![Rendered Zarr chunks](intro-sedonadb-0-4-zarr-chunks.png) + +For the full walkthrough (load a cube, inspect its dimensions, slice a plane, and hand it to NumPy) see [Working with Zarr and NDArray data in SedonaDB](). + +TODO: Add the link when docs for 0.4.0 are published + +In addition to a [redesign of our existing raster type to accommodate N-dimensional data](https://github.com/apache/sedona-db/pull/749), we added a number of `RS_` functions that mirror equivalents in Sedona Spark, with a focus on reading and extracting metadata. For example, in SedonaDB 0.4.0 it is also possible to extract the extent from a series of GeoTiffs and put them on a map. + + +```python +import pandas as pd + +usgs_drg_quads = [ + "https://download.osgeo.org/geotiff/samples/usgs/o41078a1.tif", + "https://download.osgeo.org/geotiff/samples/usgs/o41078a2.tif", + "https://download.osgeo.org/geotiff/samples/usgs/o41078a3.tif", +] + +tiffs = sd.create_data_frame(pd.DataFrame({"url": usgs_drg_quads})) +envelopes = tiffs.select(f.rs_envelope(f.rs_frompath(tiffs.url)).geo.transform(4326)) +lonboard.viz(envelopes) +``` + +![Sample GeoTiff locations](intro-sedonadb-0-4-tiff-chunks.png) + +While we're not ready to announce that SedonaDB fully supports raster data, we're excited about the foundation we were able to build for the 0.4.0 release and look forward to building this feature in earnest with the community over the next few months. 
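For a north-up raster, the extent is simple arithmetic on the origin, pixel size, and pixel dimensions. The sketch below is a plain-Python illustration, not the implementation of `RS_Envelope`. It assumes that the four numbers in a summary such as `[221x121/3] @ [264.875 14.875 320.125 45.125]` (printed by the `RS_Slice` example earlier in this section) are xmin, ymin, xmax, ymax, and it infers a 0.25 degree pixel size from them, since (320.125 - 264.875) / 221 = 0.25. The helper name `raster_envelope` is hypothetical.

```python
def raster_envelope(x_origin, y_origin, pixel_size, width, height):
    """Return (xmin, ymin, xmax, ymax) for a north-up raster.

    Hypothetical helper: (x_origin, y_origin) is the upper-left corner,
    pixels are square, and there is no rotation or skew.
    """
    xmin = x_origin
    ymax = y_origin
    xmax = x_origin + width * pixel_size
    ymin = y_origin - height * pixel_size
    return (xmin, ymin, xmax, ymax)


# Values taken from the raster summary above; the pixel size is inferred
envelope = raster_envelope(264.875, 45.125, 0.25, width=221, height=121)
print(envelope)  # (264.875, 14.875, 320.125, 45.125)
```

Because this needs only the raster's georeferencing metadata, an extent can be computed without reading any pixel values.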
+ +Thank you to [Kontinuation](https://github.com/Kontinuation) for designing and driving the 2D improvements, and [james-willis](https://github.com/james-willis) for designing and driving N-dimensional/Zarr support! + +## What's Next? + +Among other projects, in 0.5.0 we plan to [expand geography coverage to include the union of supported geography functions in PostGIS and BigQuery](https://github.com/apache/sedona-db/issues/821), [make our point cloud readers accessible to Python and R users](https://github.com/apache/sedona-db/issues/595), and realize the potential of our raster/ND-array foundation. + +We'd love to hear your feedback and use cases! Join the conversation on [GitHub Discussions](https://github.com/apache/sedona-db/discussions) or the [Apache Sedona mailing list](https://sedona.apache.org/community/). + +## Contributors + +```shell +git shortlog -sn apache-sedona-db-0.4.0.dev..HEAD + 50 Dewey Dunnington + 31 James Willis + 20 Jia Yu + 16 Kristin Cowalcijk + 13 Liang Geng + 5 Camden Lowrance + 2 Balthasar Teuscher + 2 Mehak3010 + 2 Terry L. Blessing + 2 Yongting You + 1 Mayank Aggarwal + 1 Peter Von der Porten + 1 Pranav Toggi + 1 Pratheek Rebala + 1 Selim S. + 1 oglego +```