API: required nested fields within optional structs can produce null#13804
API: required nested fields within optional structs can produce null#13804nastra merged 12 commits intoapache:mainfrom
Conversation
|
@dejangvozdenac thanks for reporting and fixing the issue. I agree that can you add unit test in Spark module to cover the scenario you described? |
|
thanks for the review @stevenzwu, I appreciate it! I addressed all the comments, let me know if anything further is needed. |
stevenzwu
left a comment
There was a problem hiding this comment.
LGTM. We will need 2 or 3 more committers' approval, since this modifies critical api code.
awesome, thanks for the quick review @stevenzwu. what's the usually process here? should I tag 2-3 people in who have reviewed code under api or do you have suggestions? |
b489986 to
535c8a1
Compare
|
Hey @pvary , mind taking a look? @stevenzwu mentioned we need at least 3 reviewers on this. |
|
Sorry wrong tag, meant to tag @nastra |
|
@nastra might be busy. @pvary or @singhpk234 mind taking a look? |
|
@stevenzwu we have three approvals now, would this be ready to merge? |
|
hey @stevenzwu @pvary @nastra @huaxingao , gentle ping on this |
|
@nastra you want to take another look before we merge this? |
…ce nulls This partially reverts some changes around the `Accessor` API that were introduced by apache#13804 and uses a Schema visitor to detect whether any of the parent fields of a nested required field are optional. This info is then used when IS_NULL / NOT_NULL is evaluated
…ce nulls This partially reverts some changes around the `Accessor` API that were introduced by apache#13804 and uses a Schema visitor to detect whether any of the parent fields of a nested required field are optional. This info is then used when IS_NULL / NOT_NULL is evaluated
…ce nulls This partially reverts some changes around the `Accessor` API that were introduced by apache#13804 and uses a Schema visitor to detect whether any of the parent fields of a nested required field are optional. This info is then used when IS_NULL / NOT_NULL is evaluated
…ce nulls (#14270) * API: Detect whether required fields nested within optionals can produce nulls This partially reverts some changes around the `Accessor` API that were introduced by #13804 and uses a Schema visitor to detect whether any of the parent fields of a nested required field are optional. This info is then used when IS_NULL / NOT_NULL is evaluated * only check parent fields on IS_NULL/NOT_NULL
…pache#13804) (cherry picked from commit c05f6c9)
…ce nulls (apache#14270) * API: Detect whether required fields nested within optionals can produce nulls This partially reverts some changes around the `Accessor` API that were introduced by apache#13804 and uses a Schema visitor to detect whether any of the parent fields of a nested required field are optional. This info is then used when IS_NULL / NOT_NULL is evaluated * only check parent fields on IS_NULL/NOT_NULL (cherry picked from commit a473b1c)
…ce nulls (#14270) (#14512) * API: Detect whether required fields nested within optionals can produce nulls This partially reverts some changes around the `Accessor` API that were introduced by #13804 and uses a Schema visitor to detect whether any of the parent fields of a nested required field are optional. This info is then used when IS_NULL / NOT_NULL is evaluated * only check parent fields on IS_NULL/NOT_NULL (cherry picked from commit a473b1c) Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
|
@dejangvozdenac Thank you for fixing this issue! I've verified that this change resolves the correctness issue in Trino. A regression test is included in trinodb/trino#26640. |
…ce nulls (apache#14270) * API: Detect whether required fields nested within optionals can produce nulls This partially reverts some changes around the `Accessor` API that were introduced by apache#13804 and uses a Schema visitor to detect whether any of the parent fields of a nested required field are optional. This info is then used when IS_NULL / NOT_NULL is evaluated * only check parent fields on IS_NULL/NOT_NULL
…ce nulls (apache#14270) * API: Detect whether required fields nested within optionals can produce nulls This partially reverts some changes around the `Accessor` API that were introduced by apache#13804 and uses a Schema visitor to detect whether any of the parent fields of a nested required field are optional. This info is then used when IS_NULL / NOT_NULL is evaluated * only check parent fields on IS_NULL/NOT_NULL
producesNull check is used when binding isNull and notNull predicate. Currently, the logic states that a field can produce null if and only if that field is optional. This is not true, however, in the case of required fields nested within optional structs. The field itself can produce nulls if the parent struct is null despite it being required.
I'm able to reproduce this case in Trino and Spark by creating the following schema and adding rows to it:
address.street is null for row 0, but trino/spark using iceberg api disagree:
You can see that when Trino reads the entire file, it correctly determines the row:
This also leads to unexpected behavior where null or not null check behaves differently based on the binding order:
After this change, iceberg can find the null row:
Closes #13328 (and relatedly trinodb/trino#20511)