diff --git a/GENERATION.md b/GENERATION.md
new file mode 100644
index 00000000..441102e7
--- /dev/null
+++ b/GENERATION.md
@@ -0,0 +1,41 @@
+# MobilitySpark generation — the canonical per-binding generator policy
+
+This document is the contract for how MobilitySpark is generated, under the ecosystem-wide
+per-binding generator policy.
+
+## The policy (ecosystem-wide)
+
+Every MobilityDB language/surface binding is a **pure projection of the MEOS-API catalog**,
+and **each binding owns its own generator, in its own repo**, in a canonical layout. The
+single source of truth is the **catalog** (`MEOS-API/output/meos-idl.json`, generated from
+the MEOS C headers). A binding is an independent, plug-and-play module that owns its
+generation.
+
+Each binding repo satisfies the same invariants: in-repo generator; own
+`tools/pin/compose-order.txt`; pinned catalog/jar input; thin language projection
+(language-neutral decisions live in the catalog); full automation toward a zero-hand-written
+surface (generate-then-retire; the last green-CI version is the equivalence probe).
+
+## MobilitySpark scope: generated Spark UDFs over the JMEOS surface
+
+MobilitySpark is a **consumer** binding: it binds the **JMEOS jar** (the JVM FFI projection
+of the catalog), not MEOS-API directly. Its generator **`tools/codegen_spark_udfs.py`**
+mirrors the JMEOS `FunctionsGenerator`: it reads the JMEOS surface (and the catalog's
+`@sqlfn` names) and emits the Spark UDF registration layer, organized **by `@ingroup` group**
+(one unit per group, the same structure as the reference manual). The generator enforces its
+own regularity invariant at build time (every emitted `register()` is preceded by the
+per-thread MEOS-init guard).
+
+## Generate-then-retire — the green-CI version is the probe
+
+The hand-written `*UDFs.java` registrations are replaced by the generated surface **family
+by family, never wipe-first**: generate, build green, **prove generated ⊇ hand** against the
+**last green-CI version** (the test suite + the BerlinMOD benchmark), then retire the hand
+registrations. End state: the UDF layer is the generated `registerAll()` — zero hand-written
+registrations.
+
+## Pinning
+
+The JMEOS jar (and through it the catalog) is pinned to a MobilityDB `ecosystem-pin-*` /
+deliverable-PR head. That pin is the *catalog/surface* input; MobilitySpark's own
+`tools/pin/compose-order.txt` governs *this repo's* PR accumulate.
diff --git a/tools/pin/compose-order.txt b/tools/pin/compose-order.txt
new file mode 100644
index 00000000..bfb85b60
--- /dev/null
+++ b/tools/pin/compose-order.txt
@@ -0,0 +1,31 @@
+# USER-APPROVED-PIN-WRITE — creating MobilitySpark's first pin manifest (user 2026-06-25,
+# per-binding generator policy rollout). New file in the MobilitySpark repo, NOT a mutation
+# of MobilityDB's pin tooling.
+#
+# MobilitySpark pin — THE canonical, dependency-ordered fold manifest (per-binding policy).
+#
+# MobilitySpark is a CONSUMER binding: it binds the JMEOS jar (not MEOS-API directly) and
+# generates its Spark UDF surface from it. `main` predates the catalog-driven generator,
+# which lives in the open stack. (policy: generator-per-binding-canonical-policy)
+#
+# SCOPE: MobilitySpark owns its generator IN-REPO at `tools/codegen_spark_udfs.py` (mirrors
+# the JMEOS FunctionsGenerator; emits the Spark UDF registrations, organized by @ingroup).
+#
+# Format: # role. '?' = membership/order UNCONFIRMED.
+# base = current origin/main. Derived from the live DAG (gh pr list, this turn).
+
+# ── WAVE 0 — GENERATOR ──
+27 feat/spark-udf-generator # the catalog-driven Spark UDF generator (foundation)
+28 feat/generated-dispatch # register the catalog-generated UDF dispatch surface (on #27)
+
+# ── WAVE 1 — BENCHMARK (evidence vehicle; not a deliverable) ──
+23 feat/berlinmod-benchmark # Spark-only BerlinMOD harness consuming the canonical suite
+16 integration/berlinmod-bench # integration evidence (907/907 green)
+
+# ── WAVE 2 — DOCS ──
+8 doc/reviewer-guide # PR Reviewer Guide (uniform with MobilityDB/MobilityDuck)
+
+# ════════════════════════════════════════════════════════════════════════════════════
+# DISPOSITION: land the generator (#27) + the generated dispatch (#28); the hand UDF layer
+# is retired generate-then-retire (the whole UDF layer becomes registerAll). See GENERATION.md.
+# ════════════════════════════════════════════════════════════════════════════════════
diff --git a/tools/regen-from-pin.sh b/tools/regen-from-pin.sh
new file mode 100755
index 00000000..9f09c89a
--- /dev/null
+++ b/tools/regen-from-pin.sh
@@ -0,0 +1,21 @@
+#!/usr/bin/env bash
+# regen-from-pin.sh — regenerate the MobilitySpark UDF layer from the catalog + JMEOS jar
+# (per GENERATION.md). MobilitySpark is a JMEOS consumer.
+#
+# Usage: tools/regen-from-pin.sh
+# env: CATALOG = path to meos-idl.json produced by MEOS-API run.py (required)
+#      JMEOS_JAR = path to the JMEOS jar built from the same pin (required)
+#
+# Invoked standalone, or by MEOS-API tools/ecosystem-generate.sh (after the JMEOS jar).
+set -euo pipefail
+PIN="${1:?usage: regen-from-pin.sh }"
+CATALOG="${CATALOG:?set CATALOG to the meos-idl.json from MEOS-API run.py}"
+JMEOS_JAR="${JMEOS_JAR:?set JMEOS_JAR to the JMEOS jar built from the same pin}"
+HERE="$(cd "$(dirname "$0")/.." && pwd)"
+
+# run the in-repo generator (tools/codegen_spark_udfs.py: --catalog --jar) -> the Spark UDF layer
+python3 "$HERE/tools/codegen_spark_udfs.py" --catalog "$CATALOG" --jar "$JMEOS_JAR"
+
+# build-verify
+( cd "$HERE" && mvn -q test ) || echo "WARN: MobilitySpark mvn test returned non-zero"
+echo "[spark] regenerated from catalog + JMEOS jar at pin $PIN"
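Reviewer note: GENERATION.md asks each family to "prove generated ⊇ hand" before the hand registrations are retired. The diff does not show how that comparison is made, so the following is only a minimal sketch of a name-level pre-check. It assumes registrations appear in the Java sources as `.register("name", ...)` calls; the regex, the function names, and the sample UDF names are all hypothetical and are not part of `tools/codegen_spark_udfs.py`.

```python
from __future__ import annotations

import re
from typing import Iterable

# Assumption: each UDF is registered through a call shaped like
# `.register("name", ...)`. Real sources may use other forms.
REGISTER_CALL = re.compile(r'\.register\(\s*"([^"]+)"')


def registered_names(java_sources: Iterable[str]) -> set[str]:
    """Collect every name passed as the first argument of a register("...") call."""
    names: set[str] = set()
    for text in java_sources:
        names.update(REGISTER_CALL.findall(text))
    return names


def missing_from_generated(hand: Iterable[str], generated: Iterable[str]) -> list[str]:
    """Return the hand-registered names that the generated layer lacks, sorted."""
    return sorted(registered_names(hand) - registered_names(generated))


if __name__ == "__main__":
    # Hypothetical snippets standing in for the contents of *UDFs.java files.
    hand_sources = ['spark.udf().register("udf_a", f, t); spark.udf().register("udf_b", g, t);']
    generated_sources = ['spark.udf().register("udf_a", f, t);']
    gaps = missing_from_generated(hand_sources, generated_sources)
    if gaps:
        print("generated surface is missing:", ", ".join(gaps))
    else:
        print("generated ⊇ hand (by name)")
```

An empty result only shows that every hand-registered name also exists in the generated layer. It says nothing about behavior, which per GENERATION.md is established by the test suite and the BerlinMOD benchmark against the last green-CI version.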