feat: alternate PRQL-native implementation of append by:name#6046
feat: alternate PRQL-native implementation of append by:name#6046kgutwin wants to merge 12 commits into
append by:name#6046Conversation
prql-bot
left a comment
There was a problem hiding this comment.
Nice work — the _append_by_name formulation in std.prql is a clever, fully dialect-independent approach, and the column-alignment via tuple_uniq reads cleanly.
CI is red because Clippy runs with -D warnings and flags two issues, both in this diff (see inline suggestions):
expr: &Box<pl::Expr>triggersclippy::borrowed_box—&pl::Exprworks via deref coercion at the call site.- The trailing
return Err(...)inresolve_ident_wildcardtriggersclippy::needless_return.
A couple of non-blocking notes for whenever you pick this back up (the PR description already flags docs/lineage as pending):
tuple_reverseis added tostd.prqland the resolver but, per the description, isn't used by anything. Adding it permanently expands the language surface speculatively — worth dropping it unless there's a concrete consumer, since it'd otherwise need docs/tests of its own.- Several
log::trace!/log::debug!calls and the newModule::all_nameshelper ("Useful for debugging") look like instrumentation left over from development (e.g. the per-ident... candidatesdump inexpr.rs, thefind_input_by_nametrace inmodule.rs, the tuple-intersection trace intypes.rs). They're trace-gated so harmless at runtime, but worth pruning the ones that were just for bring-up before merge.
|
@max-sixty This PR should be ready for your review -- thanks! |
|
hi @kgutwin ! I had a look, would like to spend more time looking, but also trust you so don't feel I need to understand every line. I did lean on an LLM to poke at parts, a few cases we found below. It's possible that we are missing something though! Let me know if so... 1. An unnamed column is silently dropped. compiles to (note SELECT x, NULL AS y FROM a
UNION ALL
SELECT x, y FROM bPlain 2. Unknown schemas silently fall back to a position-union. gives plain 3. One side defined and one not gives a confusing internal error. errors with All three trace to one gap: nothing checks that both relations have fully-defined, named columns before delegating to |
Retracting this approval. @max-sixty's review above documents three correctness bugs this happy-path read missed: an unnamed column silently dropped, by:name silently falling back to a position-union when schemas are unknown, and a confusing internal error when one side's columns are undefined. They all trace to nothing checking that both relations have fully-defined, named columns before delegating to _append_by_name. Happy to re-review once those are addressed.
|
Sounds fine, I added some more input validation to the |
prql-bot
left a comment
There was a problem hiding this comment.
The new validation cleanly closes all three cases from the earlier feedback: unnamed columns (x + 1), wildcard inputs (from b), and the asymmetric one-defined-one-not case now all produce clear, span-anchored errors instead of silent-wrong or internal-leak output. Nice.
One small inconsistency in the missing-lineage branch — see inline.
Co-authored-by: prql-bot <107324867+prql-bot@users.noreply.github.com>
codecov/patch went red after this approval — patch coverage is 82.56% (34/195 new lines uncovered), below the auto target. The feature's happy path (test_append_by_name) and error paths (append_by_name_wildcard / append_by_name_unnamed) are covered; the gap is concentrated in the new general wildcard-resolution paths — construct_wildcard_from_lineage in expr.rs (~60%) and the second-pass lookup in resolve_ident_wildcard in names.rs (~80%), whose defensive match arms aren't exercised, plus the leftover trace instrumentation the earlier review flagged for pruning. Not a hard merge gate (codecov isn't in check-ok-to-merge), but withdrawing the empty-body approval so it doesn't sit over the red check — the earlier review remains the active verdict.
See #5165 for background discussion.
This PR adds a PRQL-native implementation of
append by:namewhich is dialect-independent. This is a counterpart (but not exclusive to) PR #6037 which would add dialect-specific support forUNION ALL BY NAME.This PR has three main categories of changes:
std._append_by_namefunction was added, and the internal definition ofappendwas extended to parse theby:named argument. Whenby:nameis provided, it generates aFuncCallexpression to invokestd._append_by_name.semantic/resolver/expr.rsfor wildcard expansion of relations via their lineage information. This seemed to be necessary in order to implement_param.bottom.*within thestd._append_by_namefunction.tuple_uniqandtuple_reverse. I ended up not needing to usetuple_reversebut I left it in anyway in hopes that someone will find it useful.tuple_uniqtakes a tuple which may have duplicate entries (by alias or by Ident name) and returns either the earliest or latest seen instance of a given name, depending on the value of thetake:named argument.Updates to the documentation are still pending, and there's also a potential question related to lineage that I may work on exploring.