[SPARK-56638][SQL] Eliminate legacy error class for TABLESAMPLE fraction validation#55782
XdithyX wants to merge 3 commits into
Conversation
Hi @cloud-fan and @sarutak, could you please take a look? Thanks.
@XdithyX, thank you for fixing this issue. Please land this fix after #54972 merges, since the issue was found during that PR. To keep that already large PR focused, and to fix the problem everywhere (including code that is not part of #54972), I opened a JIRA item (SPARK-56638) to track a follow-up fix.
CI has now passed. Please let me know if anything else is required. |
The goal of this JIRA item is to eliminate the usage of `_LEGACY_ERROR_TEMP_0064` in the TABLESAMPLE code path, and likely in other code paths that depend on ParserUtils.scala. For example, starting from `test("SPARK-55978: TABLESAMPLE SYSTEM - fraction out of range")` in PlanParserSuite.scala, the call chain runs through AnalysisTest.scala -> AbstractSqlParser.scala -> AstBuilder.scala into the `validate()` function in ParserUtils.scala, which is shared by both the existing Bernoulli sampling and the new SYSTEM sampling. Fixing this will require changes to 10+ call sites and 20+ test cases — a superset of the TABLESAMPLE code path. Fixing them all at once is the best approach. I don't think your PR currently covers the full fix.
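The shared-helper problem described in that comment can be sketched in a few lines. This is a hypothetical, self-contained illustration of the pattern, not Spark's actual implementation: the names `ErrorClassSketch`, `SketchParseException`, and the epsilon value are stand-ins.

```scala
// Illustrative sketch (not Spark code): a generic validate() helper shared by
// many call sites forces every failure onto one legacy error class, while a
// dedicated error method lets the call site report exactly what went wrong.
object ErrorClassSketch {
  final case class SketchParseException(errorClass: String, message: String)
      extends RuntimeException(s"[$errorClass] $message")

  // Shared helper: any caller that fails here lands on the same legacy class.
  def validate(condition: => Boolean, message: String): Unit =
    if (!condition) throw SketchParseException("_LEGACY_ERROR_TEMP_0064", message)

  // Dedicated error: names the specific failure condition.
  def invalidTableSampleFraction(fraction: Double): Nothing =
    throw SketchParseException(
      "INVALID_TABLESAMPLE_FRACTION",
      s"Sampling fraction ($fraction) must be on interval [0, 1].")

  private val eps = 1e-11 // rounding epsilon, illustrative value

  // Stand-in for a TABLESAMPLE fraction check at a withSample-style call site.
  def withSample(fraction: Double): Double = {
    if (!(fraction >= 0.0 - eps && fraction <= 1.0 + eps)) {
      invalidTableSampleFraction(fraction)
    }
    fraction
  }
}
```

Rewriting the shared `validate()` itself would mislabel unrelated failures, which is why each call site needs its own named error — hence the 10+ call sites mentioned above.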
@stanyao Thanks for clarifying. I originally scoped this PR to the TABLESAMPLE fraction validation path only, but I understand now that the expected fix is to eliminate `_LEGACY_ERROR_TEMP_0064` everywhere it is thrown. I'll update the PR to cover the full set of `validate()` call sites and their tests/golden files.
What changes were proposed in this pull request?
This PR replaces the legacy parser error class used by invalid TABLESAMPLE fractions with a named error class (SPARK-56638). Specifically, this PR:

- Adds `INVALID_TABLESAMPLE_FRACTION` to `error-conditions.json`
- Adds `QueryParsingErrors.invalidTableSampleFractionError`
- Replaces the `ParserUtils.validate(...)` call in `AstBuilder.withSample(...)` for TABLESAMPLE fraction validation
- Updates `PlanParserSuite` to assert the new error class

The validation logic is unchanged: TABLESAMPLE fractions are still required to be in the [0, 1] interval, allowing the existing rounding epsilon. `PlanParserSuite` now covers both the existing TABLESAMPLE fraction validation and the TABLESAMPLE SYSTEM fraction validation added by #54972.
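For context, a minimal sketch of what the new `error-conditions.json` entry could look like. The message text and placeholder name here are illustrative, following only the general shape of existing entries in that file:

```json
{
  "INVALID_TABLESAMPLE_FRACTION": {
    "message": [
      "Sampling fraction (<fraction>) must be on interval [0, 1]."
    ],
    "sqlState": "22023"
  }
}
```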
Why are the changes needed?
`ParserUtils.validate(...)` always throws `_LEGACY_ERROR_TEMP_0064`, which is a generic legacy error class. For invalid TABLESAMPLE fractions, the parser already knows the specific failure: the computed sampling fraction is outside the allowed [0, 1] interval. Using a named error class makes the error condition explicit and continues the ongoing cleanup of legacy temporary error classes.

This PR does not update `ParserUtils.validate(...)` itself, because that helper is shared by unrelated parser validations. Making it throw `INVALID_TABLESAMPLE_FRACTION` would incorrectly label non-TABLESAMPLE parser failures as TABLESAMPLE fraction errors.

The new error uses SQLSTATE 22023 because the statement is syntactically valid but the supplied numeric value is invalid/out of range. This is consistent with existing Spark error conditions for invalid argument or range values, such as `INVALID_NUMERIC_LITERAL_RANGE` and other 22023 value-validation errors, rather than 42601, which is used for syntax errors.

Does this PR introduce any user-facing change?

Yes. For invalid TABLESAMPLE fractions, the error class changes from `_LEGACY_ERROR_TEMP_0064` to `INVALID_TABLESAMPLE_FRACTION`.

How was this patch tested?
Ran:

build/sbt 'catalyst / Test / testOnly org.apache.spark.sql.catalyst.parser.PlanParserSuite -- -z "sampled relations"'
build/sbt 'catalyst / Test / testOnly org.apache.spark.sql.catalyst.parser.PlanParserSuite -- -z "TABLESAMPLE SYSTEM - fraction out of range"'
Also regenerated and verified the affected SQL golden files:
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt 'sql / Test / testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "tablesample-negative.sql"'
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt 'sql / Test / testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "pipe-operators.sql"'
build/sbt 'sql / Test / testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "tablesample-negative.sql"' 'sql / Test / testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "pipe-operators.sql"'
Was this patch authored or co-authored using generative AI tooling?
Yes. Generated-by: OpenAI GPT-5.5 Codex