Skip to content

Sentinel-null migration (Phase 2 + Phase 3a + Phase 3 follow-up)#205

Merged
singaraiona merged 27 commits into
masterfrom
sentinel-migration
May 17, 2026
Merged

Sentinel-null migration (Phase 2 + Phase 3a + Phase 3 follow-up)#205
singaraiona merged 27 commits into
masterfrom
sentinel-migration

Conversation

@singaraiona
Copy link
Copy Markdown
Collaborator

Summary

Multi-phase migration of the null encoding from bitmap-only to dual-encoded (sentinel value in payload + nullmap bit). Foundation for the eventual Phase 7 bitmap-arm reclamation.

  • Phase 2 (F64): null F64 slots hold NaN alongside the nullmap bit. Covers CSV / atom / UPDATE cast + broadcast / mixed literals / group keys / pivot / linkop / set_all_null / cast_vec_copy_nulls. Grouped-aggregation consumer (da_accum_row) gained NaN-skip guards in SUM/AVG/STDDEV/VAR/MIN/MAX/FIRST/LAST F64 arms.
  • Phase 3a (integer / temporal): generalizes the same contract to I16 / I32 / I64 / DATE / TIME / TIMESTAMP. Same producer surface + grouped-agg consumer with per-agg has_nulls mask + sentinel-compare guards (cache-line efficient). Plus a cleanup sweep through `temporal.c` / `strop.c` / `window.c` / `median` / `group` builtin / `mark_i64_overflow_as_null` (which was actively destroying the sentinel).
  • Phase 3 follow-up: per-`(group, agg)` non-null counts (`nn_count`) fix grouped AVG / VAR / STDDEV divisor; result-side null finalization for MIN / MAX / PROD / FIRST / LAST on all-null groups; FIRST / LAST no longer advance `first_row[gid]` on null rows.

Hazards documented in `include/rayforce.h`

  • Sentinel-compare for integer null-skip drops user-stored `INT_MIN` values in HAS_NULLS columns. Bounded by dual encoding (bitmap is source of truth); becomes lossy at Phase 7.
  • Multi-`FIRST`/`LAST` aggs on the same group with disjoint null patterns share `first_row[gid]` — loser agg gets typed null instead of its actual first-non-null value. Closed by per-`(group, agg)` `first_row` arrays at the next pass.

Stats

  • ~27 commits, 16+ files touched in `src/`, ~25 new C unit tests + 3 RFL regression files (`f64_dual_encoding.rfl`, `integer_dual_encoding.rfl`, `grouped_agg_null_correctness.rfl`)
  • Full suite: 2449 of 2450 passed (1 unrelated pre-existing skip, 0 failed)

Test Plan

  • `make test` — 2449/2450 green
  • Run `./rayforce.test -f null` to spot-check null-specific tests
  • Run the three new RFL regressions individually
  • Visual diff of `include/rayforce.h` NULL_* comment block matches the implementation reality
  • Sanity-check that grouped queries (`select {a: (sum v), b: (avg v)} from t by g`) on existing benchmark data produce identical answers when no nulls are present

hetoku added 27 commits May 15, 2026 17:00
First slice of the bitmap-null → sentinel-null migration.  Foundation only;
no kernel behavior change for nullable numeric / temporal columns yet.

- include/rayforce.h: expose NULL_I16/I32/I64/F64 sentinel constants and
  document the multi-phase dual-encoding bridge.  Temporal types reuse
  NULL_I32 / NULL_I64 based on storage width.

- src/io/csv.c: lock BOOL and U8 down as non-nullable.  Empty / unparseable
  cells in those columns materialize as false / 0, never set the nullmap
  bit, and never set col_had_null — so the post-parse strip drops
  HAS_NULLS from these columns.  Both the parallel and serial parse paths
  plus the truncated-row fill branch are updated.

- test/test_csv.c: update test_csv_null_bool and
  test_csv_explicit_u8_schema_serial to assert the v4 contract (empty
  BOOL/U8 cells are false/0 with HAS_NULLS stripped).

- test/rfl/null/bool_u8_lockdown.rfl: new regression test exercising
  the CSV path under the new contract.

Deferred from the plan's Phase 1: ray_typed_null storage-slot sentinel
writes were attempted but reverted because they break any fused fast
path that reads raw payload without consulting the nullmap (caught by
rfl/integration/fused_group_parity).  Integer/temporal atom sentinels
land in Phase 3a alongside the kernel flip; F64 NaN lands in Phase 2.
ray_typed_null(-RAY_F64) now stores NaN in the f64 payload while still
setting nullmap[0]&1. This is the foundational atom-level dual encoding
for Phase 2; downstream kernels can now compare the slot directly
(x != x) instead of consulting the nullmap.

Adds atom/typed_null_f64 and atom/typed_null_i64 unit tests.
Both parsing paths now write NULL_F64 (NaN) into the F64 payload
whenever they set the nullmap bit, matching the Phase 2 dual-encoding
contract.

- src/io/csv.c:935   (parallel truncated-row fill)
- src/io/csv.c:1019  (parallel F64 case)
- src/io/csv.c:1132  (serial truncated-row fill)
- src/io/csv.c:1210  (serial F64 case)

The parallel/serial dispatch in csv_parse_main uses n_rows > 8192,
so the existing 3-row test_csv_null_f64 only exercises the serial
path; both paths must change for the contract to hold consistently
across small and large inputs. Plan's "Suggested commit order"
explicitly sanctions folding Tasks 2 and 3 into a single commit
when this coupling makes a split unnatural.

test_csv_null_f64 now asserts the slot holds NaN in addition to
the nullmap bit.
When `update` promotes an I64 expression into an F64 column, the
existing code appended the (double)i64 value then set the nullmap
bit — leaving the F64 payload as a stale cast of the I64 null
sentinel. Phase 2 also writes NULL_F64 to the slot so fused readers
(and the new dual-encoding regression in Task 6) see NaN.

Preventive fix: no observable behavior change today (all current
readers consult the nullmap), but locks in the Phase 2 invariant
for Phase 7 when the bitmap goes away.
End-to-end coverage that null F64 slots hold NaN AND set the nullmap
bit across CSV ingest, ray_typed_null, UPDATE cast, sort, distinct,
aggregations, and group-by.  Includes a fused_group_parity guard
that the nullable-column gate stays in place.
Integration reviewer surfaced 7 sites where F64 vectors were set null
in the bitmap but their double payload was left at 0.0 (or stale
garbage).  Phase 2 enforced that null F64 slots must carry NULL_F64
(NaN) in the payload so consumers reading raw doubles see NaN without
consulting the bitmap.

Sites patched:

  group.c:2570  VAR/STDDEV insufficient-count -> write NULL_F64 to v
  group.c:6260  Parallel variance finalization -> NaN-fill alongside
                the bitmap bit
  group.c:6383  F64 group-by key null path -> NaN-fill on continue when
                key type is RAY_F64 (left non-F64 keys alone -- out of
                scope)
  pivot.c:524   F64 pivot index key null path -> NaN-fill on continue
                when key type is RAY_F64
  builtins.c:cast_vec_copy_nulls -> after bitmap copy, walk it for F64
                destinations and overwrite each null slot with NULL_F64.
                Chose this over patching cast_range_worker because the
                helper is the single chokepoint shared by every cast
                path (fast vec->vec, STR->SYM, atom slow path)
  parse.c:642   Mixed int/float vector literal -> a non-F64 typed null
                (0Nl/0Ni/0Nh) inside an F64-promoted literal wrote 0.0
                via the (double)elems[i]->i64 cast.  Patch the
                null-propagation loop to also NaN-fill the slot.  Null
                F64 atoms already carry NULL_F64 from Phase 2a
  expr.c:set_all_null -> after the bitmap-fill paths, for RAY_F64
                results memset the payload to NULL_F64.  Catches the
                scalar-null-broadcast case in element-wise binary ops

Regression tests added to test/test_compile.c (which already has the
runtime setup wiring for ray_eval_str):
  compile/f64_mixed_literal_null_slot_is_nan -- [1.0 0N 3.0] slot 1
  compile/f64_cast_i64_null_slot_is_nan      -- (as 'F64 [1 0N 3]) slot 1
Both assert d[1] != d[1] (NaN by IEEE-754) directly on the raw payload.
The grouped aggregation path (da_accum_row + scalar_accum_row) read raw F64
payloads without consulting the nullmap. Pre-Phase 2 (and pre-2e), null F64
slots happened to hold 0.0 - the additive identity - so SUM returned the
right answer by coincidence. Phase 2e correctly writes NaN to null F64 slots,
which surfaces the latent bug.

This commit makes da_accum_row's F64 SUM / AVG / STDDEV / VAR / MIN / MAX /
FIRST / LAST paths skip NaN payloads via `if (v == v)` guards, honoring the
Phase 2 dual-encoding contract. scalar_accum_row receives the mirrored fix
since its dispatch is structurally identical.

Also patches two sibling VAR/STDDEV insufficient-count producer sites
(group.c:2339, group.c:6462) that Phase 2e missed - they follow the same
v = 0.0; <set_null> pattern as Site 1 (line 2570) and now write NULL_F64
for slot consistency.

Known follow-up (out of scope): grouped AVG / VAR / STDDEV use count[gid]
as the divisor, which still includes null rows. Fixing this requires
per-(group, agg) non-null counts - separate Phase 3 work.
Final mechanical pass on F64 null producer sites. Patches:

- src/ops/group.c:7115-7135 — decomposed VAR/STDDEV finalization (4 sites)
- src/ops/group.c:2277-2298 — parallel radix_phase3_fn F64 key null scatter
- src/ops/query.c (3 sites) — UPDATE I64-typed-null broadcast into F64 column
- src/ops/linkop.c — gather/deref NaN-fills F64 null result slots

All follow the established Phase 2 pattern: alongside the existing nullmap
bit, also write NULL_F64 to the data payload so consumers reading raw slots
honor the dual-encoding contract.

Updates the NULL_* comment in include/rayforce.h to flag known Phase 3
work that Phase 2 deliberately does NOT address:

- Grouped AVG/VAR/STDDEV use count[gid] as divisor (includes nulls)
- Grouped FIRST/LAST/MIN/MAX/PROD on all-null F64 groups leak sentinel
  values (DBL_MAX, -DBL_MAX, 0.0, or NaN-poisoned product)

Both classes require per-(group, agg) non-null counting and result-side
null finalization — work that aligns naturally with the Phase 7 bitmap
removal where every consumer must be updated anyway.
Dual-encode null F64 values as NaN payload + nullmap bit so consumers
that read raw payload honor the null contract without consulting the
bitmap.  8-commit slice covering CSV ingest, ray_typed_null, UPDATE
cast + broadcast, mixed-numeric literals, group-by/pivot keys, linkop
deref, and the grouped-aggregation NaN-skip consumer fix.  Known
Phase 3 semantic gaps documented in include/rayforce.h.

# Conflicts:
#	test/test_atom.c
ray_typed_null(-RAY_I16/I32/I64/DATE/TIME/TIMESTAMP) now stores the
correct-width INT_MIN sentinel in the matching union field alongside
nullmap[0]&1. Mirrors Phase 2a's F64 NaN treatment.

Adds atom/typed_null_{i16,i32,date,time,timestamp} unit tests and
updates atom/typed_null_i64 to assert NULL_I64 (was 0).

Group-by aggregation regressions surfaced because the SUM/AVG hot
paths in ops/group.c (da_accum_row, scalar_accum_row, scalar_sum_*
specialised loops, sp_eligible radix HT macros) read agg slots raw,
so the new NULL_I* sentinels in null rows poisoned the sums. Mirror
the existing F64 NaN-skip pattern:

  * da_ctx_t / scalar_ctx_t carry a per-agg int-null mask plus a
    per-agg sentinel value precomputed from the column type.
  * Hot SUM/AVG/STDDEV/VAR/PROD/FIRST/LAST/MIN/MAX arms gate the
    integer accumulator with the sentinel check (zero overhead when
    the source vec has no nulls — the mask bit is clear and short-
    circuit eats the comparison).
  * The radix HT / dense range fast paths bail to a slower masked
    path when any agg column advertises nulls.
  * The scalar dispatch fast path (scalar_sum_i64_fn /
    scalar_sum_f64_fn) similarly skips the tight loop when nulls
    are present.
  * try_linear_sumavg_input_i64 refuses to build a linear plan when
    any term column has nulls, so scalar_sum_linear_i64_fn never
    sees sentinels.
All I16/I32/I64/DATE/TIME/TIMESTAMP columns now write the correct-
width sentinel into the payload alongside the nullmap bit, in both
parallel and serial parse paths, in both the truncated-row fill and
the per-row parse case.

Mirrors Phase 2bc's F64 NaN treatment.

Adds csv/null_{i16,i32,date,time,timestamp} unit tests asserting
the slot value (in addition to the nullmap bit).
Generalizes the Phase 2e F64 post-cast NaN-fill to all integer and
temporal destination types (I16/I32/I64/DATE/TIME/TIMESTAMP). Casts
that produce a nullable result now write the matching INT_MIN
sentinel into each null payload slot.

Critically handles Hazard 3 (width-cast truncation):
(int16_t)NULL_I32 = 0 is bypassed — the destination-width sentinel
is written post-cast, not propagated through the cast macro.

Adds compile/{i32_cast_i64,i16_cast_i32,i64_cast_i32}_null_slot_is_sentinel
regression tests asserting the slot value after each cast direction.
Generalizes the Phase 2e F64 NaN-fill in set_all_null to all integer
and temporal result types so binary ops with a scalar-null operand
write the matching NULL_I64 / NULL_I32 / NULL_I16 sentinel into every
payload slot alongside the existing nullmap bits.

Also closes a parallel hole in store_typed_elem (lang/internal.h),
which is the actual hot path for the tree-walker's slow per-element
binary loop.  It was memset(0)ing the payload at null positions,
violating the dual-encoding contract for I-typed and F64 outputs.
The new test (+ 0Nl [1 2 3]) routes through this path because the
DAG/exec_elementwise_binary executor bails out whenever a scalar
operand is a null atom (see eval.c:709 can_dag = 0).

Adds compile/i64_scalar_null_propagation_slot_is_sentinel regression
exercising (+ 0Nl [1 2 3]).
The numeric-promotion overlay in UPDATE-WHERE (query.c:8228) casts
I64<->F64 and I32<->F64 values then propagates nullmap bits via
ray_vec_set_null. After Phase 3a-2/3/4 the source-side sentinel is
no longer 0 - it's NaN for F64 and INT_MIN for integers - and
casting it produces implementation-defined garbage in the slot.

Extend the null-propagation loop to write the destination's
correct-width sentinel (NULL_F64 / NULL_I64 / NULL_I32 / NULL_I16)
alongside ray_vec_set_null, matching the dual-encoding contract.

Adds compile/update_promo_{f64_to_i64,i64_to_f64}_null_slot regressions.
Extend Phase 2g's 3 RAY_ATOM_IS_NULL broadcast sites in query.c to
also fill the destination payload with the correct-width INT_MIN
sentinel for I16/I32/I64/DATE/TIME/TIMESTAMP columns. Mirrors the
canonical 4-arm switch established by Tasks 3, 4, 5.

Adds compile/update_atom_broadcast_i64_null_slot regression(s).
Extend Phase 2g's F64-key fills in group.c serial finalization and
the parallel radix_phase3_fn key scatter to write the correct-width
INT_MIN sentinel into null I16/I32/I64/DATE/TIME/TIMESTAMP key slots.

Adds compile/group_by_i64_null_key_slot regression.
Extend Phase 2g's F64-key fill in pivot.c to write the correct-width
INT_MIN sentinel into null I16/I32/I64/DATE/TIME/TIMESTAMP key slots.

Adds compile/pivot_i64_null_key_slot regression.
Extend Phase 2g's F64 NaN-fill in linkop.c gather/deref to write
the correct-width INT_MIN sentinel into null I16/I32/I64/DATE/TIME/
TIMESTAMP result slots.
Task 10's audit found grpt phase-3 emit at group.c:9176 wrote 0 to
null key slots instead of the canonical INT_MIN sentinel per type.
Apply the same 4-arm switch used by the post-radix key scatter so
TOP_N/BOT_N result columns also satisfy the Phase 3a dual-encoding
contract.
End-to-end coverage that null I16/I32/I64 slots hold the matching
INT_MIN sentinel AND set the nullmap bit, across CSV ingest, atom,
UPDATE cast/broadcast, sort, distinct, aggregations, group-by SUM
(consumer fix), and cast width preservation.
Integration review surfaced ~16 producer sites missed by both Phase 2
and Phase 3a's enumerated surface. All write 0/0.0/uninitialized to
the payload while calling ray_vec_set_null — bitmap-only nulls that
violate the dual-encoding contract.

Sites patched:
  * src/ops/temporal.c (6 sites): extract/trunc family I64/TIMESTAMP
    output fills NULL_I64
  * src/ops/string.c, src/ops/strop.c (5 sites): strlen fills NULL_I64
  * src/ops/expr.c: mark_i64_overflow_as_null no longer overwrites the
    INT64_MIN sentinel with 0 (it IS the sentinel post-Phase 3a-1)
  * src/ops/group.c: median_per_group empty-group fills NULL_F64
    (Phase 2 gap)
  * src/ops/window.c: zero-memset of result vectors replaced with
    sentinel-fill so every slot is pre-stamped null
  * src/ops/builtins.c: ray_group key-vec null bucket + mixed-numeric
    list builtin both fill width-correct sentinel

Adds regression tests for the most observable cases.
Extend the Phase 2 F64 dual-encoding contract (null payload + nullmap
bit) to integer storage types: I16, I32, I64 and the temporal aliases
DATE, TIME, TIMESTAMP.

13 commits covering ray_typed_null, CSV parser, cast_vec_copy_nulls,
set_all_null / store_typed_elem, UPDATE atom broadcast + WHERE
numeric promo, group-by keys (serial + parallel + TOP_N), pivot,
linkop, plus a cleanup sweep through temporal / strlen / window /
median / group builtin / overflow handler.

Grouped-aggregation consumer (da_accum_row + scalar_accum_row) gained
per-agg has_nulls mask + width-correct sentinel-compare guards in
SUM/AVG/STDDEV/VAR/PROD/MIN/MAX/FIRST/LAST integer arms.  Hazard:
a user-stored INT_MIN in a HAS_NULLS column is silently dropped —
acceptable under dual encoding since the bitmap is source of truth;
sentinel-free representation needed at Phase 7 cutover.

Phase 3 follow-up (out of scope): grouped AVG / VAR / STDDEV use
count[gid] as divisor without excluding null rows; FIRST / LAST /
MIN / MAX / PROD on all-null groups leak sentinel values.  Both
require per-(group, agg) non-null counts and result-side null
finalization.
Closes the documented Phase 3 follow-up gaps in include/rayforce.h's
NULL_* block.  Grouped aggregations in src/ops/group.c now correctly
handle the dual-encoding nulls that Phase 2 (F64 NaN) and Phase 3a
(integer sentinels) plumbed through the producer surface.

The fix adds an `nn_count[gid * n_aggs + a]` array to the DA / scalar
accumulator, allocated only when at least one agg input column has
HAS_NULLS set (so null-free queries pay zero memory overhead).  The
counter is incremented inside the existing F64 NaN-skip and integer
sentinel-skip branches, then drives finalization:

- AVG / VAR / STDDEV use nn_count[idx] as the divisor (and the
  insufficient-count gate) instead of count[gid], so null rows no
  longer dilute the mean.
- MIN / MAX / PROD / FIRST / LAST emit a typed null (NULL_F64 +
  nullmap bit for F64 outputs, NULL_I64 for integer outputs) when
  nn_count[idx] == 0, instead of leaking DBL_MAX / -DBL_MAX / 0 /
  the product seed for an all-null group.
- FIRST / LAST no longer advance acc->first_row[gid] when the row's
  agg value is null — a null prefix would have blocked every later
  non-null row from winning the FIRST race.  PROD's first-row write
  is similarly guarded so the seed-vs-multiply branch keys off
  "first non-null seen" rather than count==1.

Multi-FIRST/LAST limitation (per-(group, agg) first_row would fix it)
is documented inline; out of scope for this phase.

Tests: rfl/null/grouped_agg_null_correctness covers AVG divisor,
all-null MIN/MAX/PROD/FIRST/LAST, F64 + integer variants, null-prefix
FIRST/LAST, scalar (no by:) path, and STDDEV/VAR insufficient-count.
Full suite: 2449/2450 passing (1 skipped, was 2448/2449).

Updates the NULL_* documentation paragraph to mark the gaps closed.
…zation

Closes the deferred Phase 3 gaps documented in include/rayforce.h after
Phase 3a:

- AVG / VAR / STDDEV divisor was count[gid] including null rows.  Now
  uses per-(group, agg) nn_count, incremented only when the agg row is
  non-null per the existing F64 NaN-skip / integer sentinel-skip guards.
- MIN / MAX / PROD / FIRST / LAST on all-null groups now emit typed
  null at finalization instead of leaking accumulator sentinels
  (DBL_MAX / -DBL_MAX / sum=0 / product seed).
- FIRST / LAST no longer advance first_row[gid] / last_row[gid] when
  the row's value is null — preserves first/last *non-null* semantics.

Adds test/rfl/null/grouped_agg_null_correctness.rfl regression.
Doc paragraph in include/rayforce.h updated to mark these gaps closed.

Known remaining limitation (out of scope, documented inline):
multi-FIRST/LAST aggs share first_row[gid] across aggs on the same
group; would require per-(group, agg) first_row arrays to close.
@singaraiona singaraiona merged commit 3a6386c into master May 17, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants