Sentinel-null migration (Phase 2 + Phase 3a + Phase 3 follow-up) by singaraiona · Pull Request #205 · RayforceDB/rayforce

singaraiona · 2026-05-17T18:56:26Z

Summary

Multi-phase migration of the null encoding from bitmap-only to dual-encoded (sentinel value in payload + nullmap bit). Foundation for the eventual Phase 7 bitmap-arm reclamation.

Phase 2 (F64): null F64 slots hold NaN alongside the nullmap bit. Covers CSV / atom / UPDATE cast + broadcast / mixed literals / group keys / pivot / linkop / set_all_null / cast_vec_copy_nulls. Grouped-aggregation consumer (da_accum_row) gained NaN-skip guards in SUM/AVG/STDDEV/VAR/MIN/MAX/FIRST/LAST F64 arms.
Phase 3a (integer / temporal): generalizes the same contract to I16 / I32 / I64 / DATE / TIME / TIMESTAMP. Same producer surface + grouped-agg consumer with per-agg has_nulls mask + sentinel-compare guards (cache-line efficient). Plus a cleanup sweep through `temporal.c` / `strop.c` / `window.c` / `median` / `group` builtin / `mark_i64_overflow_as_null` (which was actively destroying the sentinel).
Phase 3 follow-up: per-`(group, agg)` non-null counts (`nn_count`) fix grouped AVG / VAR / STDDEV divisor; result-side null finalization for MIN / MAX / PROD / FIRST / LAST on all-null groups; FIRST / LAST no longer advance `first_row[gid]` on null rows.

Hazards documented in `include/rayforce.h`

Sentinel-compare for integer null-skip drops user-stored `INT_MIN` values in HAS_NULLS columns. Bounded by dual encoding (bitmap is source of truth); becomes lossy at Phase 7.
Multi-`FIRST`/`LAST` aggs on the same group with disjoint null patterns share `first_row[gid]` — loser agg gets typed null instead of its actual first-non-null value. Closed by per-`(group, agg)` `first_row` arrays at the next pass.

Stats

~27 commits, 16+ files touched in `src/`, ~25 new C unit tests + 3 RFL regression files (`f64_dual_encoding.rfl`, `integer_dual_encoding.rfl`, `grouped_agg_null_correctness.rfl`)
Full suite: 2449 of 2450 passed (1 unrelated pre-existing skip, 0 failed)

Test Plan

`make test` — 2449/2450 green
Run `./rayforce.test -f null` to spot-check null-specific tests
Run the three new RFL regressions individually
Visual diff of `include/rayforce.h` NULL_* comment block matches the implementation reality
Sanity-check that grouped queries (`select {a: (sum v), b: (avg v)} from t by g`) on existing benchmark data produce identical answers when no nulls are present

First slice of the bitmap-null → sentinel-null migration. Foundation only; no kernel behavior change for nullable numeric / temporal columns yet. - include/rayforce.h: expose NULL_I16/I32/I64/F64 sentinel constants and document the multi-phase dual-encoding bridge. Temporal types reuse NULL_I32 / NULL_I64 based on storage width. - src/io/csv.c: lock BOOL and U8 down as non-nullable. Empty / unparseable cells in those columns materialize as false / 0, never set the nullmap bit, and never set col_had_null — so the post-parse strip drops HAS_NULLS from these columns. Both the parallel and serial parse paths plus the truncated-row fill branch are updated. - test/test_csv.c: update test_csv_null_bool and test_csv_explicit_u8_schema_serial to assert the v4 contract (empty BOOL/U8 cells are false/0 with HAS_NULLS stripped). - test/rfl/null/bool_u8_lockdown.rfl: new regression test exercising the CSV path under the new contract. Deferred from the plan's Phase 1: ray_typed_null storage-slot sentinel writes were attempted but reverted because they break any fused fast path that reads raw payload without consulting the nullmap (caught by rfl/integration/fused_group_parity). Integer/temporal atom sentinels land in Phase 3a alongside the kernel flip; F64 NaN lands in Phase 2.

ray_typed_null(-RAY_F64) now stores NaN in the f64 payload while still setting nullmap[0]&1. This is the foundational atom-level dual encoding for Phase 2; downstream kernels can now compare the slot directly (x != x) instead of consulting the nullmap. Adds atom/typed_null_f64 and atom/typed_null_i64 unit tests.

Both parsing paths now write NULL_F64 (NaN) into the F64 payload whenever they set the nullmap bit, matching the Phase 2 dual-encoding contract. - src/io/csv.c:935 (parallel truncated-row fill) - src/io/csv.c:1019 (parallel F64 case) - src/io/csv.c:1132 (serial truncated-row fill) - src/io/csv.c:1210 (serial F64 case) The parallel/serial dispatch in csv_parse_main uses n_rows > 8192, so the existing 3-row test_csv_null_f64 only exercises the serial path; both paths must change for the contract to hold consistently across small and large inputs. Plan's "Suggested commit order" explicitly sanctions folding Tasks 2 and 3 into a single commit when this coupling makes a split unnatural. test_csv_null_f64 now asserts the slot holds NaN in addition to the nullmap bit.

When `update` promotes an I64 expression into an F64 column, the existing code appended the (double)i64 value then set the nullmap bit — leaving the F64 payload as a stale cast of the I64 null sentinel. Phase 2 also writes NULL_F64 to the slot so fused readers (and the new dual-encoding regression in Task 6) see NaN. Preventive fix: no observable behavior change today (all current readers consult the nullmap), but locks in the Phase 2 invariant for Phase 7 when the bitmap goes away.

End-to-end coverage that null F64 slots hold NaN AND set the nullmap bit across CSV ingest, ray_typed_null, UPDATE cast, sort, distinct, aggregations, and group-by. Includes a fused_group_parity guard that the nullable-column gate stays in place.

Integration reviewer surfaced 7 sites where F64 vectors were set null in the bitmap but their double payload was left at 0.0 (or stale garbage). Phase 2 enforced that null F64 slots must carry NULL_F64 (NaN) in the payload so consumers reading raw doubles see NaN without consulting the bitmap. Sites patched: group.c:2570 VAR/STDDEV insufficient-count -> write NULL_F64 to v group.c:6260 Parallel variance finalization -> NaN-fill alongside the bitmap bit group.c:6383 F64 group-by key null path -> NaN-fill on continue when key type is RAY_F64 (left non-F64 keys alone -- out of scope) pivot.c:524 F64 pivot index key null path -> NaN-fill on continue when key type is RAY_F64 builtins.c:cast_vec_copy_nulls -> after bitmap copy, walk it for F64 destinations and overwrite each null slot with NULL_F64. Chose this over patching cast_range_worker because the helper is the single chokepoint shared by every cast path (fast vec->vec, STR->SYM, atom slow path) parse.c:642 Mixed int/float vector literal -> a non-F64 typed null (0Nl/0Ni/0Nh) inside an F64-promoted literal wrote 0.0 via the (double)elems[i]->i64 cast. Patch the null-propagation loop to also NaN-fill the slot. Null F64 atoms already carry NULL_F64 from Phase 2a expr.c:set_all_null -> after the bitmap-fill paths, for RAY_F64 results memset the payload to NULL_F64. Catches the scalar-null-broadcast case in element-wise binary ops Regression tests added to test/test_compile.c (which already has the runtime setup wiring for ray_eval_str): compile/f64_mixed_literal_null_slot_is_nan -- [1.0 0N 3.0] slot 1 compile/f64_cast_i64_null_slot_is_nan -- (as 'F64 [1 0N 3]) slot 1 Both assert d[1] != d[1] (NaN by IEEE-754) directly on the raw payload.

The grouped aggregation path (da_accum_row + scalar_accum_row) read raw F64 payloads without consulting the nullmap. Pre-Phase 2 (and pre-2e), null F64 slots happened to hold 0.0 - the additive identity - so SUM returned the right answer by coincidence. Phase 2e correctly writes NaN to null F64 slots, which surfaces the latent bug. This commit makes da_accum_row's F64 SUM / AVG / STDDEV / VAR / MIN / MAX / FIRST / LAST paths skip NaN payloads via `if (v == v)` guards, honoring the Phase 2 dual-encoding contract. scalar_accum_row receives the mirrored fix since its dispatch is structurally identical. Also patches two sibling VAR/STDDEV insufficient-count producer sites (group.c:2339, group.c:6462) that Phase 2e missed - they follow the same v = 0.0; <set_null> pattern as Site 1 (line 2570) and now write NULL_F64 for slot consistency. Known follow-up (out of scope): grouped AVG / VAR / STDDEV use count[gid] as the divisor, which still includes null rows. Fixing this requires per-(group, agg) non-null counts - separate Phase 3 work.

Final mechanical pass on F64 null producer sites. Patches: - src/ops/group.c:7115-7135 — decomposed VAR/STDDEV finalization (4 sites) - src/ops/group.c:2277-2298 — parallel radix_phase3_fn F64 key null scatter - src/ops/query.c (3 sites) — UPDATE I64-typed-null broadcast into F64 column - src/ops/linkop.c — gather/deref NaN-fills F64 null result slots All follow the established Phase 2 pattern: alongside the existing nullmap bit, also write NULL_F64 to the data payload so consumers reading raw slots honor the dual-encoding contract. Updates the NULL_* comment in include/rayforce.h to flag known Phase 3 work that Phase 2 deliberately does NOT address: - Grouped AVG/VAR/STDDEV use count[gid] as divisor (includes nulls) - Grouped FIRST/LAST/MIN/MAX/PROD on all-null F64 groups leak sentinel values (DBL_MAX, -DBL_MAX, 0.0, or NaN-poisoned product) Both classes require per-(group, agg) non-null counting and result-side null finalization — work that aligns naturally with the Phase 7 bitmap removal where every consumer must be updated anyway.

Dual-encode null F64 values as NaN payload + nullmap bit so consumers that read raw payload honor the null contract without consulting the bitmap. 8-commit slice covering CSV ingest, ray_typed_null, UPDATE cast + broadcast, mixed-numeric literals, group-by/pivot keys, linkop deref, and the grouped-aggregation NaN-skip consumer fix. Known Phase 3 semantic gaps documented in include/rayforce.h. # Conflicts: # test/test_atom.c

ray_typed_null(-RAY_I16/I32/I64/DATE/TIME/TIMESTAMP) now stores the correct-width INT_MIN sentinel in the matching union field alongside nullmap[0]&1. Mirrors Phase 2a's F64 NaN treatment. Adds atom/typed_null_{i16,i32,date,time,timestamp} unit tests and updates atom/typed_null_i64 to assert NULL_I64 (was 0). Group-by aggregation regressions surfaced because the SUM/AVG hot paths in ops/group.c (da_accum_row, scalar_accum_row, scalar_sum_* specialised loops, sp_eligible radix HT macros) read agg slots raw, so the new NULL_I* sentinels in null rows poisoned the sums. Mirror the existing F64 NaN-skip pattern: * da_ctx_t / scalar_ctx_t carry a per-agg int-null mask plus a per-agg sentinel value precomputed from the column type. * Hot SUM/AVG/STDDEV/VAR/PROD/FIRST/LAST/MIN/MAX arms gate the integer accumulator with the sentinel check (zero overhead when the source vec has no nulls — the mask bit is clear and short- circuit eats the comparison). * The radix HT / dense range fast paths bail to a slower masked path when any agg column advertises nulls. * The scalar dispatch fast path (scalar_sum_i64_fn / scalar_sum_f64_fn) similarly skips the tight loop when nulls are present. * try_linear_sumavg_input_i64 refuses to build a linear plan when any term column has nulls, so scalar_sum_linear_i64_fn never sees sentinels.

All I16/I32/I64/DATE/TIME/TIMESTAMP columns now write the correct- width sentinel into the payload alongside the nullmap bit, in both parallel and serial parse paths, in both the truncated-row fill and the per-row parse case. Mirrors Phase 2bc's F64 NaN treatment. Adds csv/null_{i16,i32,date,time,timestamp} unit tests asserting the slot value (in addition to the nullmap bit).

Generalizes the Phase 2e F64 post-cast NaN-fill to all integer and temporal destination types (I16/I32/I64/DATE/TIME/TIMESTAMP). Casts that produce a nullable result now write the matching INT_MIN sentinel into each null payload slot. Critically handles Hazard 3 (width-cast truncation): (int16_t)NULL_I32 = 0 is bypassed — the destination-width sentinel is written post-cast, not propagated through the cast macro. Adds compile/{i32_cast_i64,i16_cast_i32,i64_cast_i32}_null_slot_is_sentinel regression tests asserting the slot value after each cast direction.

Generalizes the Phase 2e F64 NaN-fill in set_all_null to all integer and temporal result types so binary ops with a scalar-null operand write the matching NULL_I64 / NULL_I32 / NULL_I16 sentinel into every payload slot alongside the existing nullmap bits. Also closes a parallel hole in store_typed_elem (lang/internal.h), which is the actual hot path for the tree-walker's slow per-element binary loop. It was memset(0)ing the payload at null positions, violating the dual-encoding contract for I-typed and F64 outputs. The new test (+ 0Nl [1 2 3]) routes through this path because the DAG/exec_elementwise_binary executor bails out whenever a scalar operand is a null atom (see eval.c:709 can_dag = 0). Adds compile/i64_scalar_null_propagation_slot_is_sentinel regression exercising (+ 0Nl [1 2 3]).

The numeric-promotion overlay in UPDATE-WHERE (query.c:8228) casts I64<->F64 and I32<->F64 values then propagates nullmap bits via ray_vec_set_null. After Phase 3a-2/3/4 the source-side sentinel is no longer 0 - it's NaN for F64 and INT_MIN for integers - and casting it produces implementation-defined garbage in the slot. Extend the null-propagation loop to write the destination's correct-width sentinel (NULL_F64 / NULL_I64 / NULL_I32 / NULL_I16) alongside ray_vec_set_null, matching the dual-encoding contract. Adds compile/update_promo_{f64_to_i64,i64_to_f64}_null_slot regressions.

Extend Phase 2g's 3 RAY_ATOM_IS_NULL broadcast sites in query.c to also fill the destination payload with the correct-width INT_MIN sentinel for I16/I32/I64/DATE/TIME/TIMESTAMP columns. Mirrors the canonical 4-arm switch established by Tasks 3, 4, 5. Adds compile/update_atom_broadcast_i64_null_slot regression(s).

Extend Phase 2g's F64-key fills in group.c serial finalization and the parallel radix_phase3_fn key scatter to write the correct-width INT_MIN sentinel into null I16/I32/I64/DATE/TIME/TIMESTAMP key slots. Adds compile/group_by_i64_null_key_slot regression.

Extend Phase 2g's F64-key fill in pivot.c to write the correct-width INT_MIN sentinel into null I16/I32/I64/DATE/TIME/TIMESTAMP key slots. Adds compile/pivot_i64_null_key_slot regression.

Extend Phase 2g's F64 NaN-fill in linkop.c gather/deref to write the correct-width INT_MIN sentinel into null I16/I32/I64/DATE/TIME/ TIMESTAMP result slots.

Task 10's audit found grpt phase-3 emit at group.c:9176 wrote 0 to null key slots instead of the canonical INT_MIN sentinel per type. Apply the same 4-arm switch used by the post-radix key scatter so TOP_N/BOT_N result columns also satisfy the Phase 3a dual-encoding contract.

End-to-end coverage that null I16/I32/I64 slots hold the matching INT_MIN sentinel AND set the nullmap bit, across CSV ingest, atom, UPDATE cast/broadcast, sort, distinct, aggregations, group-by SUM (consumer fix), and cast width preservation.

…ULL_* block

Integration review surfaced ~16 producer sites missed by both Phase 2 and Phase 3a's enumerated surface. All write 0/0.0/uninitialized to the payload while calling ray_vec_set_null — bitmap-only nulls that violate the dual-encoding contract. Sites patched: * src/ops/temporal.c (6 sites): extract/trunc family I64/TIMESTAMP output fills NULL_I64 * src/ops/string.c, src/ops/strop.c (5 sites): strlen fills NULL_I64 * src/ops/expr.c: mark_i64_overflow_as_null no longer overwrites the INT64_MIN sentinel with 0 (it IS the sentinel post-Phase 3a-1) * src/ops/group.c: median_per_group empty-group fills NULL_F64 (Phase 2 gap) * src/ops/window.c: zero-memset of result vectors replaced with sentinel-fill so every slot is pre-stamped null * src/ops/builtins.c: ray_group key-vec null bucket + mixed-numeric list builtin both fill width-correct sentinel Adds regression tests for the most observable cases.

Extend the Phase 2 F64 dual-encoding contract (null payload + nullmap bit) to integer storage types: I16, I32, I64 and the temporal aliases DATE, TIME, TIMESTAMP. 13 commits covering ray_typed_null, CSV parser, cast_vec_copy_nulls, set_all_null / store_typed_elem, UPDATE atom broadcast + WHERE numeric promo, group-by keys (serial + parallel + TOP_N), pivot, linkop, plus a cleanup sweep through temporal / strlen / window / median / group builtin / overflow handler. Grouped-aggregation consumer (da_accum_row + scalar_accum_row) gained per-agg has_nulls mask + width-correct sentinel-compare guards in SUM/AVG/STDDEV/VAR/PROD/MIN/MAX/FIRST/LAST integer arms. Hazard: a user-stored INT_MIN in a HAS_NULLS column is silently dropped — acceptable under dual encoding since the bitmap is source of truth; sentinel-free representation needed at Phase 7 cutover. Phase 3 follow-up (out of scope): grouped AVG / VAR / STDDEV use count[gid] as divisor without excluding null rows; FIRST / LAST / MIN / MAX / PROD on all-null groups leak sentinel values. Both require per-(group, agg) non-null counts and result-side null finalization.

…sentinel work

Closes the documented Phase 3 follow-up gaps in include/rayforce.h's NULL_* block. Grouped aggregations in src/ops/group.c now correctly handle the dual-encoding nulls that Phase 2 (F64 NaN) and Phase 3a (integer sentinels) plumbed through the producer surface. The fix adds an `nn_count[gid * n_aggs + a]` array to the DA / scalar accumulator, allocated only when at least one agg input column has HAS_NULLS set (so null-free queries pay zero memory overhead). The counter is incremented inside the existing F64 NaN-skip and integer sentinel-skip branches, then drives finalization: - AVG / VAR / STDDEV use nn_count[idx] as the divisor (and the insufficient-count gate) instead of count[gid], so null rows no longer dilute the mean. - MIN / MAX / PROD / FIRST / LAST emit a typed null (NULL_F64 + nullmap bit for F64 outputs, NULL_I64 for integer outputs) when nn_count[idx] == 0, instead of leaking DBL_MAX / -DBL_MAX / 0 / the product seed for an all-null group. - FIRST / LAST no longer advance acc->first_row[gid] when the row's agg value is null — a null prefix would have blocked every later non-null row from winning the FIRST race. PROD's first-row write is similarly guarded so the seed-vs-multiply branch keys off "first non-null seen" rather than count==1. Multi-FIRST/LAST limitation (per-(group, agg) first_row would fix it) is documented inline; out of scope for this phase. Tests: rfl/null/grouped_agg_null_correctness covers AVG divisor, all-null MIN/MAX/PROD/FIRST/LAST, F64 + integer variants, null-prefix FIRST/LAST, scalar (no by:) path, and STDDEV/VAR insufficient-count. Full suite: 2449/2450 passing (1 skipped, was 2448/2449). Updates the NULL_* documentation paragraph to mark the gaps closed.

…zation Closes the deferred Phase 3 gaps documented in include/rayforce.h after Phase 3a: - AVG / VAR / STDDEV divisor was count[gid] including null rows. Now uses per-(group, agg) nn_count, incremented only when the agg row is non-null per the existing F64 NaN-skip / integer sentinel-skip guards. - MIN / MAX / PROD / FIRST / LAST on all-null groups now emit typed null at finalization instead of leaking accumulator sentinels (DBL_MAX / -DBL_MAX / sum=0 / product seed). - FIRST / LAST no longer advance first_row[gid] / last_row[gid] when the row's value is null — preserves first/last *non-null* semantics. Adds test/rfl/null/grouped_agg_null_correctness.rfl regression. Doc paragraph in include/rayforce.h updated to mark these gaps closed. Known remaining limitation (out of scope, documented inline): multi-FIRST/LAST aggs share first_row[gid] across aggs on the same group; would require per-(group, agg) first_row arrays to close.

hetoku added 27 commits May 15, 2026 17:00

docs: mark Phase 2 (F64 NaN sentinel) complete in NULL_* comment

ec868e7

Phase 3a-8: pivot null integer key fills sentinels

bf856a6

Extend Phase 2g's F64-key fill in pivot.c to write the correct-width INT_MIN sentinel into null I16/I32/I64/DATE/TIME/TIMESTAMP key slots. Adds compile/pivot_i64_null_key_slot regression.

Phase 3a-9: linkop deref fills integer null sentinels

5ce5a67

Extend Phase 2g's F64 NaN-fill in linkop.c gather/deref to write the correct-width INT_MIN sentinel into null I16/I32/I64/DATE/TIME/ TIMESTAMP result slots.

docs: mark Phase 3a (integer/temporal NULL_* sentinels) complete in N…

7524d5f

…ULL_* block

Merge origin/master (perf: row-form group operators) into Phase 2+3a …

1769c25

…sentinel work

singaraiona merged commit 3a6386c into master May 17, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sentinel-null migration (Phase 2 + Phase 3a + Phase 3 follow-up)#205

Sentinel-null migration (Phase 2 + Phase 3a + Phase 3 follow-up)#205
singaraiona merged 27 commits into
masterfrom
sentinel-migration

singaraiona commented May 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

singaraiona commented May 17, 2026

Summary

Hazards documented in `include/rayforce.h`

Stats

Test Plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants