[SPARK-56638][SQL] Eliminate legacy error class for TABLESAMPLE fraction validation#55782
XdithyX wants to merge 3 commits into
Conversation
Hi @cloud-fan and @sarutak, could you please take a look? Thanks.
@XdithyX, thank you for fixing this issue. Please land this fix after #54972 merges, since the issue was found during that PR. To keep that already large PR focused, and to fix the problem everywhere (including code that is not part of #54972), I opened a JIRA item (SPARK-56638) to track a follow-up fix.
CI has now passed. Please let me know if anything else is required. |
The goal of this JIRA item is to eliminate the usage of `_LEGACY_ERROR_TEMP_0064` in the TABLESAMPLE code path, and likely in other code paths that depend on ParserUtils.scala. For example, starting from `test("SPARK-55978: TABLESAMPLE SYSTEM - fraction out of range")` in PlanParserSuite.scala, the call chain runs through AnalysisTest.scala -> AbstractSqlParser.scala -> AstBuilder.scala into the `validate()` function in ParserUtils.scala, which is shared by both the existing Bernoulli sampling and the new SYSTEM sampling. Fixing this will require changes to 10+ call sites and 20+ test cases — a superset of the TABLESAMPLE code path. Fixing them all at once is the best approach. I don't think your PR currently covers the full fix.
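The shared-helper problem described in that comment can be sketched in a few lines. This is a hypothetical, self-contained illustration of the pattern, not Spark's actual implementation: the names `ErrorClassSketch`, `SketchParseException`, and the epsilon value are stand-ins.

```scala
// Illustrative sketch (not Spark code): a generic validate() helper shared by
// many call sites forces every failure onto one legacy error class, while a
// dedicated error method lets the call site report exactly what went wrong.
object ErrorClassSketch {
  final case class SketchParseException(errorClass: String, message: String)
      extends RuntimeException(s"[$errorClass] $message")

  // Shared helper: any caller that fails here lands on the same legacy class.
  def validate(condition: => Boolean, message: String): Unit =
    if (!condition) throw SketchParseException("_LEGACY_ERROR_TEMP_0064", message)

  // Dedicated error: names the specific failure condition.
  def invalidTableSampleFraction(fraction: Double): Nothing =
    throw SketchParseException(
      "INVALID_TABLESAMPLE_FRACTION",
      s"Sampling fraction ($fraction) must be on interval [0, 1].")

  private val eps = 1e-11 // rounding epsilon, illustrative value

  // Stand-in for a TABLESAMPLE fraction check at a withSample-style call site.
  def withSample(fraction: Double): Double = {
    if (!(fraction >= 0.0 - eps && fraction <= 1.0 + eps)) {
      invalidTableSampleFraction(fraction)
    }
    fraction
  }
}
```

Rewriting the shared `validate()` itself would mislabel unrelated failures, which is why each call site needs its own named error — hence the 10+ call sites mentioned above.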
@stanyao Thanks for clarifying. I originally scoped this PR to the TABLESAMPLE fraction validation path only, but I understand now that the expected fix is to eliminate `_LEGACY_ERROR_TEMP_0064` everywhere it is thrown. I'll update the PR to cover the full set of `validate()` call sites and their tests/golden files.
What changes were proposed in this pull request?
This PR replaces the legacy parser error class used by invalid TABLESAMPLE fractions with a named error class (SPARK-56638). Specifically, this PR:

- Adds `INVALID_TABLESAMPLE_FRACTION` to `error-conditions.json`
- Adds `QueryParsingErrors.invalidTableSampleFractionError`
- Replaces the `ParserUtils.validate(...)` call in `AstBuilder.withSample(...)` for TABLESAMPLE fraction validation
- Updates `PlanParserSuite` to assert the new error class

The validation logic is unchanged: TABLESAMPLE fractions are still required to be in the [0, 1] interval, allowing the existing rounding epsilon. `PlanParserSuite` now covers both the existing TABLESAMPLE fraction validation and the TABLESAMPLE SYSTEM fraction validation added by #54972.
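For context, a minimal sketch of what the new `error-conditions.json` entry could look like. The message text and placeholder name here are illustrative, following only the general shape of existing entries in that file:

```json
{
  "INVALID_TABLESAMPLE_FRACTION": {
    "message": [
      "Sampling fraction (<fraction>) must be on interval [0, 1]."
    ],
    "sqlState": "22023"
  }
}
```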
Why are the changes needed?
`ParserUtils.validate(...)` always throws `_LEGACY_ERROR_TEMP_0064`, which is a generic legacy error class. For invalid TABLESAMPLE fractions, the parser already knows the specific failure: the computed sampling fraction is outside the allowed [0, 1] interval. Using a named error class makes the error condition explicit and continues the ongoing cleanup of legacy temporary error classes.

This PR does not update `ParserUtils.validate(...)` itself, because that helper is shared by unrelated parser validations. Making it throw `INVALID_TABLESAMPLE_FRACTION` would incorrectly label non-TABLESAMPLE parser failures as TABLESAMPLE fraction errors.

The new error uses SQLSTATE 22023 because the statement is syntactically valid but the supplied numeric value is invalid/out of range. This is consistent with existing Spark error conditions for invalid argument or range values, such as `INVALID_NUMERIC_LITERAL_RANGE` and other 22023 value-validation errors, rather than 42601, which is used for syntax errors.

Does this PR introduce any user-facing change?

Yes. For invalid TABLESAMPLE fractions, the error class changes from `_LEGACY_ERROR_TEMP_0064` to `INVALID_TABLESAMPLE_FRACTION`.

How was this patch tested?
Ran:

build/sbt 'catalyst / Test / testOnly org.apache.spark.sql.catalyst.parser.PlanParserSuite -- -z "sampled relations"'
build/sbt 'catalyst / Test / testOnly org.apache.spark.sql.catalyst.parser.PlanParserSuite -- -z "TABLESAMPLE SYSTEM - fraction out of range"'
Also regenerated and verified the affected SQL golden files:
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt 'sql / Test / testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "tablesample-negative.sql"'
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt 'sql / Test / testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "pipe-operators.sql"'
build/sbt 'sql / Test / testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "tablesample-negative.sql"' 'sql / Test / testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "pipe-operators.sql"'
Was this patch authored or co-authored using generative AI tooling?
Yes. Generated-by: OpenAI GPT-5.5 Codex