Sentinel-null migration (Phase 2 + Phase 3a + Phase 3 follow-up)#205
Merged
Conversation
First slice of the bitmap-null → sentinel-null migration. Foundation only; no kernel behavior change for nullable numeric / temporal columns yet. - include/rayforce.h: expose NULL_I16/I32/I64/F64 sentinel constants and document the multi-phase dual-encoding bridge. Temporal types reuse NULL_I32 / NULL_I64 based on storage width. - src/io/csv.c: lock BOOL and U8 down as non-nullable. Empty / unparseable cells in those columns materialize as false / 0, never set the nullmap bit, and never set col_had_null — so the post-parse strip drops HAS_NULLS from these columns. Both the parallel and serial parse paths plus the truncated-row fill branch are updated. - test/test_csv.c: update test_csv_null_bool and test_csv_explicit_u8_schema_serial to assert the v4 contract (empty BOOL/U8 cells are false/0 with HAS_NULLS stripped). - test/rfl/null/bool_u8_lockdown.rfl: new regression test exercising the CSV path under the new contract. Deferred from the plan's Phase 1: ray_typed_null storage-slot sentinel writes were attempted but reverted because they break any fused fast path that reads raw payload without consulting the nullmap (caught by rfl/integration/fused_group_parity). Integer/temporal atom sentinels land in Phase 3a alongside the kernel flip; F64 NaN lands in Phase 2.
ray_typed_null(-RAY_F64) now stores NaN in the f64 payload while still setting nullmap[0]&1. This is the foundational atom-level dual encoding for Phase 2; downstream kernels can now compare the slot directly (x != x) instead of consulting the nullmap. Adds atom/typed_null_f64 and atom/typed_null_i64 unit tests.
Both parsing paths now write NULL_F64 (NaN) into the F64 payload whenever they set the nullmap bit, matching the Phase 2 dual-encoding contract. - src/io/csv.c:935 (parallel truncated-row fill) - src/io/csv.c:1019 (parallel F64 case) - src/io/csv.c:1132 (serial truncated-row fill) - src/io/csv.c:1210 (serial F64 case) The parallel/serial dispatch in csv_parse_main uses n_rows > 8192, so the existing 3-row test_csv_null_f64 only exercises the serial path; both paths must change for the contract to hold consistently across small and large inputs. Plan's "Suggested commit order" explicitly sanctions folding Tasks 2 and 3 into a single commit when this coupling makes a split unnatural. test_csv_null_f64 now asserts the slot holds NaN in addition to the nullmap bit.
When `update` promotes an I64 expression into an F64 column, the existing code appended the (double)i64 value then set the nullmap bit — leaving the F64 payload as a stale cast of the I64 null sentinel. Phase 2 also writes NULL_F64 to the slot so fused readers (and the new dual-encoding regression in Task 6) see NaN. Preventive fix: no observable behavior change today (all current readers consult the nullmap), but locks in the Phase 2 invariant for Phase 7 when the bitmap goes away.
End-to-end coverage that null F64 slots hold NaN AND set the nullmap bit across CSV ingest, ray_typed_null, UPDATE cast, sort, distinct, aggregations, and group-by. Includes a fused_group_parity guard that the nullable-column gate stays in place.
Integration reviewer surfaced 7 sites where F64 vectors were set null
in the bitmap but their double payload was left at 0.0 (or stale
garbage). Phase 2 enforced that null F64 slots must carry NULL_F64
(NaN) in the payload so consumers reading raw doubles see NaN without
consulting the bitmap.
Sites patched:
group.c:2570 VAR/STDDEV insufficient-count -> write NULL_F64 to v
group.c:6260 Parallel variance finalization -> NaN-fill alongside
the bitmap bit
group.c:6383 F64 group-by key null path -> NaN-fill on continue when
key type is RAY_F64 (left non-F64 keys alone -- out of
scope)
pivot.c:524 F64 pivot index key null path -> NaN-fill on continue
when key type is RAY_F64
builtins.c:cast_vec_copy_nulls -> after bitmap copy, walk it for F64
destinations and overwrite each null slot with NULL_F64.
Chose this over patching cast_range_worker because the
helper is the single chokepoint shared by every cast
path (fast vec->vec, STR->SYM, atom slow path)
parse.c:642 Mixed int/float vector literal -> a non-F64 typed null
(0Nl/0Ni/0Nh) inside an F64-promoted literal wrote 0.0
via the (double)elems[i]->i64 cast. Patch the
null-propagation loop to also NaN-fill the slot. Null
F64 atoms already carry NULL_F64 from Phase 2a
expr.c:set_all_null -> after the bitmap-fill paths, for RAY_F64
results memset the payload to NULL_F64. Catches the
scalar-null-broadcast case in element-wise binary ops
Regression tests added to test/test_compile.c (which already has the
runtime setup wiring for ray_eval_str):
compile/f64_mixed_literal_null_slot_is_nan -- [1.0 0N 3.0] slot 1
compile/f64_cast_i64_null_slot_is_nan -- (as 'F64 [1 0N 3]) slot 1
Both assert d[1] != d[1] (NaN by IEEE-754) directly on the raw payload.
The grouped aggregation path (da_accum_row + scalar_accum_row) read raw F64 payloads without consulting the nullmap. Pre-Phase 2 (and pre-2e), null F64 slots happened to hold 0.0 - the additive identity - so SUM returned the right answer by coincidence. Phase 2e correctly writes NaN to null F64 slots, which surfaces the latent bug. This commit makes da_accum_row's F64 SUM / AVG / STDDEV / VAR / MIN / MAX / FIRST / LAST paths skip NaN payloads via `if (v == v)` guards, honoring the Phase 2 dual-encoding contract. scalar_accum_row receives the mirrored fix since its dispatch is structurally identical. Also patches two sibling VAR/STDDEV insufficient-count producer sites (group.c:2339, group.c:6462) that Phase 2e missed - they follow the same v = 0.0; <set_null> pattern as Site 1 (line 2570) and now write NULL_F64 for slot consistency. Known follow-up (out of scope): grouped AVG / VAR / STDDEV use count[gid] as the divisor, which still includes null rows. Fixing this requires per-(group, agg) non-null counts - separate Phase 3 work.
Final mechanical pass on F64 null producer sites. Patches: - src/ops/group.c:7115-7135 — decomposed VAR/STDDEV finalization (4 sites) - src/ops/group.c:2277-2298 — parallel radix_phase3_fn F64 key null scatter - src/ops/query.c (3 sites) — UPDATE I64-typed-null broadcast into F64 column - src/ops/linkop.c — gather/deref NaN-fills F64 null result slots All follow the established Phase 2 pattern: alongside the existing nullmap bit, also write NULL_F64 to the data payload so consumers reading raw slots honor the dual-encoding contract. Updates the NULL_* comment in include/rayforce.h to flag known Phase 3 work that Phase 2 deliberately does NOT address: - Grouped AVG/VAR/STDDEV use count[gid] as divisor (includes nulls) - Grouped FIRST/LAST/MIN/MAX/PROD on all-null F64 groups leak sentinel values (DBL_MAX, -DBL_MAX, 0.0, or NaN-poisoned product) Both classes require per-(group, agg) non-null counting and result-side null finalization — work that aligns naturally with the Phase 7 bitmap removal where every consumer must be updated anyway.
Dual-encode null F64 values as NaN payload + nullmap bit so consumers that read raw payload honor the null contract without consulting the bitmap. 8-commit slice covering CSV ingest, ray_typed_null, UPDATE cast + broadcast, mixed-numeric literals, group-by/pivot keys, linkop deref, and the grouped-aggregation NaN-skip consumer fix. Known Phase 3 semantic gaps documented in include/rayforce.h. # Conflicts: # test/test_atom.c
ray_typed_null(-RAY_I16/I32/I64/DATE/TIME/TIMESTAMP) now stores the
correct-width INT_MIN sentinel in the matching union field alongside
nullmap[0]&1. Mirrors Phase 2a's F64 NaN treatment.
Adds atom/typed_null_{i16,i32,date,time,timestamp} unit tests and
updates atom/typed_null_i64 to assert NULL_I64 (was 0).
Group-by aggregation regressions surfaced because the SUM/AVG hot
paths in ops/group.c (da_accum_row, scalar_accum_row, scalar_sum_*
specialised loops, sp_eligible radix HT macros) read agg slots raw,
so the new NULL_I* sentinels in null rows poisoned the sums. Mirror
the existing F64 NaN-skip pattern:
* da_ctx_t / scalar_ctx_t carry a per-agg int-null mask plus a
per-agg sentinel value precomputed from the column type.
* Hot SUM/AVG/STDDEV/VAR/PROD/FIRST/LAST/MIN/MAX arms gate the
integer accumulator with the sentinel check (zero overhead when
the source vec has no nulls — the mask bit is clear and short-
circuit eats the comparison).
* The radix HT / dense range fast paths bail to a slower masked
path when any agg column advertises nulls.
* The scalar dispatch fast path (scalar_sum_i64_fn /
scalar_sum_f64_fn) similarly skips the tight loop when nulls
are present.
* try_linear_sumavg_input_i64 refuses to build a linear plan when
any term column has nulls, so scalar_sum_linear_i64_fn never
sees sentinels.
All I16/I32/I64/DATE/TIME/TIMESTAMP columns now write the correct-
width sentinel into the payload alongside the nullmap bit, in both
parallel and serial parse paths, in both the truncated-row fill and
the per-row parse case.
Mirrors Phase 2bc's F64 NaN treatment.
Adds csv/null_{i16,i32,date,time,timestamp} unit tests asserting
the slot value (in addition to the nullmap bit).
Generalizes the Phase 2e F64 post-cast NaN-fill to all integer and
temporal destination types (I16/I32/I64/DATE/TIME/TIMESTAMP). Casts
that produce a nullable result now write the matching INT_MIN
sentinel into each null payload slot.
Critically handles Hazard 3 (width-cast truncation):
(int16_t)NULL_I32 = 0 is bypassed — the destination-width sentinel
is written post-cast, not propagated through the cast macro.
Adds compile/{i32_cast_i64,i16_cast_i32,i64_cast_i32}_null_slot_is_sentinel
regression tests asserting the slot value after each cast direction.
Generalizes the Phase 2e F64 NaN-fill in set_all_null to all integer and temporal result types so binary ops with a scalar-null operand write the matching NULL_I64 / NULL_I32 / NULL_I16 sentinel into every payload slot alongside the existing nullmap bits. Also closes a parallel hole in store_typed_elem (lang/internal.h), which is the actual hot path for the tree-walker's slow per-element binary loop. It was memset(0)ing the payload at null positions, violating the dual-encoding contract for I-typed and F64 outputs. The new test (+ 0Nl [1 2 3]) routes through this path because the DAG/exec_elementwise_binary executor bails out whenever a scalar operand is a null atom (see eval.c:709 can_dag = 0). Adds compile/i64_scalar_null_propagation_slot_is_sentinel regression exercising (+ 0Nl [1 2 3]).
The numeric-promotion overlay in UPDATE-WHERE (query.c:8228) casts
I64<->F64 and I32<->F64 values then propagates nullmap bits via
ray_vec_set_null. After Phase 3a-2/3/4 the source-side sentinel is
no longer 0 - it's NaN for F64 and INT_MIN for integers - and
casting it produces implementation-defined garbage in the slot.
Extend the null-propagation loop to write the destination's
correct-width sentinel (NULL_F64 / NULL_I64 / NULL_I32 / NULL_I16)
alongside ray_vec_set_null, matching the dual-encoding contract.
Adds compile/update_promo_{f64_to_i64,i64_to_f64}_null_slot regressions.
Extend Phase 2g's 3 RAY_ATOM_IS_NULL broadcast sites in query.c to also fill the destination payload with the correct-width INT_MIN sentinel for I16/I32/I64/DATE/TIME/TIMESTAMP columns. Mirrors the canonical 4-arm switch established by Tasks 3, 4, 5. Adds compile/update_atom_broadcast_i64_null_slot regression(s).
Extend Phase 2g's F64-key fills in group.c serial finalization and the parallel radix_phase3_fn key scatter to write the correct-width INT_MIN sentinel into null I16/I32/I64/DATE/TIME/TIMESTAMP key slots. Adds compile/group_by_i64_null_key_slot regression.
Extend Phase 2g's F64-key fill in pivot.c to write the correct-width INT_MIN sentinel into null I16/I32/I64/DATE/TIME/TIMESTAMP key slots. Adds compile/pivot_i64_null_key_slot regression.
Extend Phase 2g's F64 NaN-fill in linkop.c gather/deref to write the correct-width INT_MIN sentinel into null I16/I32/I64/DATE/TIME/ TIMESTAMP result slots.
Task 10's audit found grpt phase-3 emit at group.c:9176 wrote 0 to null key slots instead of the canonical INT_MIN sentinel per type. Apply the same 4-arm switch used by the post-radix key scatter so TOP_N/BOT_N result columns also satisfy the Phase 3a dual-encoding contract.
End-to-end coverage that null I16/I32/I64 slots hold the matching INT_MIN sentinel AND set the nullmap bit, across CSV ingest, atom, UPDATE cast/broadcast, sort, distinct, aggregations, group-by SUM (consumer fix), and cast width preservation.
Integration review surfaced ~16 producer sites missed by both Phase 2
and Phase 3a's enumerated surface. All write 0/0.0/uninitialized to
the payload while calling ray_vec_set_null — bitmap-only nulls that
violate the dual-encoding contract.
Sites patched:
* src/ops/temporal.c (6 sites): extract/trunc family I64/TIMESTAMP
output fills NULL_I64
* src/ops/string.c, src/ops/strop.c (5 sites): strlen fills NULL_I64
* src/ops/expr.c: mark_i64_overflow_as_null no longer overwrites the
INT64_MIN sentinel with 0 (it IS the sentinel post-Phase 3a-1)
* src/ops/group.c: median_per_group empty-group fills NULL_F64
(Phase 2 gap)
* src/ops/window.c: zero-memset of result vectors replaced with
sentinel-fill so every slot is pre-stamped null
* src/ops/builtins.c: ray_group key-vec null bucket + mixed-numeric
list builtin both fill width-correct sentinel
Adds regression tests for the most observable cases.
Extend the Phase 2 F64 dual-encoding contract (null payload + nullmap bit) to integer storage types: I16, I32, I64 and the temporal aliases DATE, TIME, TIMESTAMP. 13 commits covering ray_typed_null, CSV parser, cast_vec_copy_nulls, set_all_null / store_typed_elem, UPDATE atom broadcast + WHERE numeric promo, group-by keys (serial + parallel + TOP_N), pivot, linkop, plus a cleanup sweep through temporal / strlen / window / median / group builtin / overflow handler. Grouped-aggregation consumer (da_accum_row + scalar_accum_row) gained per-agg has_nulls mask + width-correct sentinel-compare guards in SUM/AVG/STDDEV/VAR/PROD/MIN/MAX/FIRST/LAST integer arms. Hazard: a user-stored INT_MIN in a HAS_NULLS column is silently dropped — acceptable under dual encoding since the bitmap is source of truth; sentinel-free representation needed at Phase 7 cutover. Phase 3 follow-up (out of scope): grouped AVG / VAR / STDDEV use count[gid] as divisor without excluding null rows; FIRST / LAST / MIN / MAX / PROD on all-null groups leak sentinel values. Both require per-(group, agg) non-null counts and result-side null finalization.
Closes the documented Phase 3 follow-up gaps in include/rayforce.h's NULL_* block. Grouped aggregations in src/ops/group.c now correctly handle the dual-encoding nulls that Phase 2 (F64 NaN) and Phase 3a (integer sentinels) plumbed through the producer surface. The fix adds an `nn_count[gid * n_aggs + a]` array to the DA / scalar accumulator, allocated only when at least one agg input column has HAS_NULLS set (so null-free queries pay zero memory overhead). The counter is incremented inside the existing F64 NaN-skip and integer sentinel-skip branches, then drives finalization: - AVG / VAR / STDDEV use nn_count[idx] as the divisor (and the insufficient-count gate) instead of count[gid], so null rows no longer dilute the mean. - MIN / MAX / PROD / FIRST / LAST emit a typed null (NULL_F64 + nullmap bit for F64 outputs, NULL_I64 for integer outputs) when nn_count[idx] == 0, instead of leaking DBL_MAX / -DBL_MAX / 0 / the product seed for an all-null group. - FIRST / LAST no longer advance acc->first_row[gid] when the row's agg value is null — a null prefix would have blocked every later non-null row from winning the FIRST race. PROD's first-row write is similarly guarded so the seed-vs-multiply branch keys off "first non-null seen" rather than count==1. Multi-FIRST/LAST limitation (per-(group, agg) first_row would fix it) is documented inline; out of scope for this phase. Tests: rfl/null/grouped_agg_null_correctness covers AVG divisor, all-null MIN/MAX/PROD/FIRST/LAST, F64 + integer variants, null-prefix FIRST/LAST, scalar (no by:) path, and STDDEV/VAR insufficient-count. Full suite: 2449/2450 passing (1 skipped, was 2448/2449). Updates the NULL_* documentation paragraph to mark the gaps closed.
…zation Closes the deferred Phase 3 gaps documented in include/rayforce.h after Phase 3a: - AVG / VAR / STDDEV divisor was count[gid] including null rows. Now uses per-(group, agg) nn_count, incremented only when the agg row is non-null per the existing F64 NaN-skip / integer sentinel-skip guards. - MIN / MAX / PROD / FIRST / LAST on all-null groups now emit typed null at finalization instead of leaking accumulator sentinels (DBL_MAX / -DBL_MAX / sum=0 / product seed). - FIRST / LAST no longer advance first_row[gid] / last_row[gid] when the row's value is null — preserves first/last *non-null* semantics. Adds test/rfl/null/grouped_agg_null_correctness.rfl regression. Doc paragraph in include/rayforce.h updated to mark these gaps closed. Known remaining limitation (out of scope, documented inline): multi-FIRST/LAST aggs share first_row[gid] across aggs on the same group; would require per-(group, agg) first_row arrays to close.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Multi-phase migration of the null encoding from bitmap-only to dual-encoded (sentinel value in payload + nullmap bit). Foundation for the eventual Phase 7 bitmap-arm reclamation.
NaNalongside the nullmap bit. Covers CSV / atom / UPDATE cast + broadcast / mixed literals / group keys / pivot / linkop / set_all_null / cast_vec_copy_nulls. Grouped-aggregation consumer (da_accum_row) gained NaN-skip guards in SUM/AVG/STDDEV/VAR/MIN/MAX/FIRST/LAST F64 arms.Hazards documented in `include/rayforce.h`
Stats
Test Plan