Skip to content

fix(query/planner): route rule bodies through the IR pipeline#826

Merged
whilo merged 6 commits into
mainfrom
fix/rule-body-ir-pipeline
May 12, 2026
Merged

fix(query/planner): route rule bodies through the IR pipeline#826
whilo merged 6 commits into
mainfrom
fix/rule-body-ir-pipeline

Conversation

@whilo
Copy link
Copy Markdown
Member

@whilo whilo commented May 12, 2026

Summary

Kernel correctness fixes for recursive rules + an architectural cleanup of the planner/lowering split. Builds on #825 (allow function-only base cases in recursive rules).

Correctness fixes

1. Rule bodies through the IR pipeline. plan-rule-op planned each rule branch's body via create-plan directly, skipping the logical IR pass that converts [(get-else $ ?e :attr default) ?v] into an LOptionalScan. At top level the logical pass runs (via create-plan-via-ir) and the optional scan binds ?e via an attribute scan, so subsequent clauses can use it. Inside a rule body the promotion never happened — ?e stayed free and get-else / predicate clauses referencing it produced nil outputs.

2. delta-driven-expand tighter eligibility. execute-recursive-rule's fast path skipped semi-naive evaluation for "simple transitive closure" rule shapes. The eligibility check fired for any 2-head-var recursive rule with an indexed attr + a DB pattern, regardless of what else the rec body contained. So rules with :function ops (arithmetic), :predicate ops (termination guards), or :attached-preds on entity-groups were silently bypassed. WITH RECURSIVE CTEs with depth-counting or nullable-column projections produced wrong tuples. Added a rec-shape-simple? guard requiring every op to be :rule-lookup, :pattern-scan, or :entity-group (with no :attached-preds).

Architecture cleanup

3. All sub-plans route through the IR pipeline. plan.cljc carried two parallel paths for building sub-plans: the legacy physical-only create-plan (called for OR / NOT / AND branches) and the IR pipeline (build-logical-plan → lower) used by rule bodies and every top-level query. The legacy path silently bypassed every logical IR feature for its sub-plans. Introduced plan-via-ir as the single entry point and routed all four sub-plan sites through it.

4. Deleted the legacy create-plan entry point. The only non-recursive callers were two test helpers in query-ir-test.clj that cross-validated the IR path during the migration. With every public caller already on create-plan-via-ir, create-plan is unreachable; deleted along with the helpers and the two equivalence-tests. Replaced with a structural smoke test (test-lowered-entity-group-shape).

5. Broke the plan↔lower namespace cycle. Moved the recursion-bearing factories — plan-or-op, plan-not-op, plan-rule-op, plus plan-via-ir, normalize-and-plan-branches, rename-branch-vars — out of plan.cljc into lower.cljc, where the recursion target is a sibling fn in the same ns. plan.cljc is now stateless physical-op factories only; it no longer references lower (no requiring-resolve, no :as logical import). Dependency is one-way: lower → plan, lower → logical.

Test plan

  • datahike :clj-pss suite under DATAHIKE_QUERY_PLANNER=true: 1489 tests, 7142 assertions, 0 failures
  • New test-get-else-inside-rule-body covers the rule-body get-else regression
  • New test-lowered-entity-group-shape covers the lowering output structure
  • End-to-end via pg-datahike: 2-col WITH RECURSIVE walk(id, depth) AS … returns {[1 0] [2 1] [3 1] [4 2] [5 2]} (regression for Tx middleware bug fix #2)
  • Existing test-rules, test-rule-arguments, test-recursive-rules still pass

Commit list

  1. fix(query/planner): route rule bodies through the IR pipelinetransactional schema middleware #1 above
  2. chore(query/planner): use direct require for logical, requiring-resolve only for lower — small dep cleanup at the time (later subsumed by commit 6 which drops the logical require entirely)
  3. fix(query/execute): tighten delta-driven-expand eligibility for recursive rulesTx middleware bug fix #2 above
  4. refactor(query/planner): route every sub-plan through the IR pipelineQuestion: Will datahike allow arbitrary "forking" of db? #3 above
  5. refactor(query/planner): drop legacy create-plan entry pointHow to specify a schema #4 above
  6. refactor(query): break the plan↔lower namespace cycleStoring java.math.BigDecimal silently "disconnects" from storage #5 above

whilo added 3 commits May 12, 2026 13:37
Top-level queries get `[(get-else $ ?e :attr default) ?v]` promoted to
an `LOptionalScan` (synthetic pattern scan) by the logical IR pass in
`logical.cljc`. That promotion is what binds `?e` via an attribute
scan so subsequent get-else / predicate clauses on the same `?e` work
as expected.

Rule body planning in `plan-rule-op` bypassed the logical IR pass —
it called `create-plan` (physical-only) directly on each branch. So
inside a rule body the get-else recognition never fired, `?e` stayed
unbound, and bodies like

  [(maybe-dep ?id ?dep)
   [(get-else $ ?e :concept/deprecated false) ?dep]
   [?e :concept/id ?id]]

returned `#{[nil false]}` instead of the actual concept rows. Same
behavior in both base and recursive branches.

Fix: in both the recursive (`base-ps` + `rec-cvs`) and non-recursive
(`sub-plans`) paths, replace `create-plan` with
`build-logical-plan → lower`. `requiring-resolve` breaks the
plan↔lower cycle (lower.cljc already requires plan.cljc, so plan can't
require lower at load time).

Side effect: every other logical IR feature (predicate pushdown,
LScan / LOptionalScan recognition, future LFilter improvements) now
applies inside rule bodies too, not just top-level queries.

  - datahike :clj-pss suite: 1504 tests, 7196 assertions, 0 failures
  - New regression test in query-rules-test covers both the scalar
    get-else-in-rule-body shape and a predicate on the get-else output.
…ve only for lower

The previous K1 commit on this branch used `requiring-resolve` for both
`build-logical-plan` and `lower` to be safe. Only `lower` actually
needs it: `lower.cljc` requires `plan.cljc` (uses plan-pattern-op,
assemble-entity-group, build-pipeline, plan-rule-op), so plan→lower
forms a cycle. `logical.cljc` has no dependency on plan, so plan can
require it directly.

Follow-up tracked separately: deprecate the legacy `create-plan` path
so plan-rule-op can move to lower.cljc and the cycle dissolves
entirely.
…sive rules

`execute-recursive-rule` has a fast-path optimization,
`delta-driven-expand`, that bypasses semi-naive evaluation for "simple
transitive closure" rule shapes: for each `(t-val, y-val)` in the
previous delta, reverse-AVET-lookup `[?x :attr t-val]` and emit
`(?x, y-val)`. That's exactly right for

  [(follows ?x ?y) [?x :follows ?y]]
  [(follows ?x ?y) [?x :follows ?t] (follows ?t ?y)]

— the recursive body's only step beyond the rule-lookup is the reverse
pattern, and emitting `[entity-id, propagated-y]` is the correct
fixpoint step.

The eligibility check, however, fired for ANY 2-head-var recursive
rule with an indexed attr + a DB pattern in the rec body, regardless
of what ELSE the rec body contained. So rules with:

  - `:function` ops (arithmetic on a counter, get-else for nullable
    columns, sql-+, identity for ground constants),
  - `:predicate` ops (termination guards, joins between get-else
    outputs),
  - `:attached-preds` on entity-groups (post-K1 fully-attached
    predicates from the IR pipeline)

were silently *skipped*. The shortcut emitted `[entity-id, prev-y]`
where the actual semantics would have computed `[node-value,
incremented-depth]` or similar. Failure mode: WITH RECURSIVE CTEs
with arithmetic (`depth+1`) produced wrong tuples; pg-datahike's
nullable-column-translation `[(get-else $ ?e :col :__null__) ?v]`
combined with the binary head-var shape made every multi-column-
projection recursive CTE silently degrade to entity-ids + zero depth.

Fix: add a `rec-shape-simple?` guard that requires every op in every
rec-clause-version to be exactly `:rule-lookup`, `:pattern-scan`, or
`:entity-group` (with no attached-preds). Anything else routes through
the normal semi-naive path (`execute-branch-plans`) — slower but
correct.

  - All datahike :clj-pss tests: 1492 / 7175 / 0 failures.
  - The 2-col recursive CTE that surfaced this (walk(id, depth) over
    a tree of nodes) now returns the expected `{[1 0] [2 1] [3 1]
    [4 2] [5 2]}` instead of `{[1 0] [7 0] [4 0]}`.
@whilo whilo force-pushed the fix/rule-body-ir-pipeline branch from 1d8aed2 to 353db3e Compare May 12, 2026 21:51
whilo added 3 commits May 12, 2026 15:13
`plan.cljc` carried two parallel paths for building sub-plans:

  - the legacy physical-only `create-plan` (called recursively for OR
    branches via `normalize-and-plan-branches`, for NOT sub-clauses in
    `plan-not-op`, and for AND sub-clauses inside `create-plan`'s own
    reduce);
  - the IR pipeline `build-logical-plan → lower` introduced by K1 for
    recursive-rule bodies and used by every top-level query via
    `create-plan-via-ir` in query.cljc.

The legacy path silently bypassed every logical IR feature for the
sub-plans it built — most visibly `[(get-else $ ?e :attr default) ?v]`
not being promoted to `LOptionalScan`, so an OR branch or NOT body
that used get-else would see `?e` unbound (parallel to the rule-body
gap K1 fixed).

This commit consolidates all sub-plan construction on the IR
pipeline. Introduces a single helper `plan-via-ir` that wraps
`build-logical-plan → lower` with the bound-vars coercion the
existing K1 callsites already needed, and routes the four sub-plan
sites through it:

  - `normalize-and-plan-branches`        (OR / OR-JOIN branches)
  - `plan-not-op`                        (NOT / NOT-JOIN sub-plan)
  - `create-plan`'s `:and` reduce case   (inline AND expansion)
  - `plan-rule-op`'s recursive + non-recursive branch construction
    (consolidates the two K1 callsites into the same helper)

`requiring-resolve` on `lower` is still needed — plan→lower remains a
load-order cycle. A follow-up commit deletes `create-plan` itself; a
further follow-up moves the recursion-bearing factories into
lower.cljc and removes the resolve entirely.

  - 1492 tests, 7175 assertions, 0 failures under :clj-pss.
`create-plan` (plan.cljc) was the legacy physical-only entry point —
it ran its own clause-classifier + dispatch loop and bypassed the
logical IR pass. The only callers were:

  - The four sub-plan recursion sites inside plan.cljc itself (OR /
    NOT / AND branches, plan-rule-op's branch bodies) — all rerouted
    to the IR pipeline (`plan-via-ir`) in the previous commit.
  - Two test helpers in `query-ir-test.clj` (`plan-match?` and the
    `:old-eg` half of `test-lower-deep-structural-equality`) that
    used create-plan to cross-validate the IR path during the
    migration.

Every user-facing query routes through `create-plan-via-ir` in
query.cljc, which has always called `build-logical-plan → lower` and
never touched create-plan. So removing create-plan is a no-op for
public callers.

  - Delete the `(declare create-plan)` and 280-line `(defn create-plan
    ...)` body from plan.cljc.
  - Delete `plan-match?` helper, `test-lower-matches-create-plan`, and
    the create-plan half of `test-lower-deep-structural-equality` from
    query-ir-test.clj. The replacement
    (`test-lowered-entity-group-shape`) asserts the structural fields
    callers actually depend on, sans cross-pipeline comparison.
  - Update `test-ir-pipeline-with-in-bindings` to a direct IR-plan
    smoke test now that `plan-match?` is gone.

  - 1489 tests, 7142 assertions, 0 failures under :clj-pss.

The plan↔lower load-time cycle still exists: `plan-via-ir`,
`plan-or-op`, `plan-not-op`, and `plan-rule-op` all use
`requiring-resolve` on `lower/lower`. The next commit moves those
recursion-bearing factories into lower.cljc so plan no longer
references lower at all.
`lower.cljc` requires `plan.cljc` for the physical-op factories
(`plan-pattern-op`, `assemble-entity-group`, `build-pipeline`, etc.).
plan.cljc used to require lower back at runtime — via
`requiring-resolve` on `lower/lower` — because `plan-or-op`,
`plan-not-op`, and `plan-rule-op` all recurse on sub-plans through
the full IR pipeline (`build-logical-plan → lower`).

The cycle was structural: plan held both "stateless physical-op
factories" AND "recursion-bearing sub-plan factories". The first
group is what lower needs; the second needs lower itself. So they
fought.

This commit moves the recursion-bearing factories out of plan.cljc
into lower.cljc, where the recursion target is just a sibling
function in the same namespace:

  Moved from plan → lower:
    plan-via-ir                  (private helper for IR re-entry)
    normalize-and-plan-branches  (private; OR branch shape coercion)
    plan-or-op                   (public)
    plan-not-op                  (public)
    rename-branch-vars           (private; rule body var renaming)
    plan-rule-op                 (public)

`plan-via-ir` now calls `lower` directly (forward-declared via
`declare lower` inside lower.cljc). No more `requiring-resolve`. No
more `:as logical` import in plan.cljc.

Stays in plan.cljc (used by lower across the one-way dep):
    plan-pattern-op, assemble-entity-group, build-pipeline,
    plan-function-op, plan-predicate-op, plan-rule-lookup-op,
    plan-passthrough-op, structurally-fusable?,
    post-op-direct-eligible?, order-plan-ops, compute-rule-sccs
    (and its helpers rule-call-graph, tarjan-scc), op-required-vars,
    op-cost, replan, …

Net effect:
  - plan.cljc no longer requires lower (no requiring-resolve, no
    runtime cycle).
  - lower.cljc requires plan and logical, both one-way.
  - 1489 / 7142 / 0 failures under :clj-pss (no behavioural change).

Closes the architecture cleanup discussed in the PR thread — the
"deprecate create-plan" option from the audit, with the further step
of moving the now-load-time-cycle-only factories to their natural
home.
@whilo whilo merged commit 2ad28fd into main May 12, 2026
11 of 12 checks passed
@whilo whilo deleted the fix/rule-body-ir-pipeline branch May 12, 2026 22:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant