[SPARK-55372][SQL] Fix SHOW CREATE TABLE for tables / views with default collation #54159

Closed
ilicmarkodb wants to merge 1 commit into apache:master from ilicmarkodb:fix_show_create_table

@ilicmarkodb
Contributor

@ilicmarkodb ilicmarkodb commented Feb 5, 2026

What changes were proposed in this pull request?

Fixed `SHOW CREATE TABLE` for tables and views so that it correctly prints `DEFAULT COLLATION collationName`.
For example, for `CREATE TABLE t (c1 STRING) DEFAULT COLLATION UTF8_LCASE` it previously printed `COLLATE 'UTF8_LCASE'`, which produces a parsing error.
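
To illustrate the clause in question, here is a minimal sketch of a DDL renderer that emits a table-level `DEFAULT COLLATION` clause. `SketchTable`, `SketchColumn`, and `showCreateTable` are hypothetical names made up for this example; they are not Spark's catalog classes, and the column rendering is deliberately naive.

```scala
object ShowCreateSketch {
  // Hypothetical, minimal table description (NOT Spark's CatalogTable).
  final case class SketchColumn(name: String, sqlType: String)
  final case class SketchTable(
      name: String,
      columns: Seq[SketchColumn],
      defaultCollation: Option[String])

  // Render a CREATE TABLE statement. A table-level collation is emitted as
  // `DEFAULT COLLATION <name>` (unquoted), not as `COLLATE '<name>'`.
  def showCreateTable(t: SketchTable): String = {
    val cols = t.columns.map(c => s"${c.name} ${c.sqlType}").mkString(", ")
    val collationClause =
      t.defaultCollation.map(c => s" DEFAULT COLLATION $c").getOrElse("")
    s"CREATE TABLE ${t.name} ($cols)$collationClause"
  }

  def main(args: Array[String]): Unit = {
    val t = SketchTable("t", Seq(SketchColumn("c1", "STRING")), Some("UTF8_LCASE"))
    println(showCreateTable(t))
  }
}
```

The real command prints more than this (provider, properties, and so on); the sketch only shows where the collation clause goes.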

For `UTF8_BINARY` collated / non-collated columns (for example, `c1`), the output of `SHOW CREATE TABLE` should print `c1 STRING COLLATE UTF8_BINARY`, so that the column does not inherit the collation from the table or schema, if one is defined.

To achieve this, I changed `typeName` in `StringType` to print `COLLATE UTF8_BINARY` for explicitly collated `UTF8_BINARY` columns. For the non-collated `StringType` (the case object), `typeName` does not print `COLLATE UTF8_BINARY`, which matches the old behaviour.
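
The distinction between an explicitly collated `UTF8_BINARY` type and the default singleton can be sketched roughly as follows. `SketchStringType` is a hypothetical, simplified stand-in and not the real `StringType`; the exact strings that Spark's `typeName` returns are also an assumption here.

```scala
object TypeNameSketch {
  // Hypothetical stand-in for Spark's StringType (NOT the real class).
  class SketchStringType(val collationName: String) {
    // The default singleton is recognised by reference identity, so an
    // explicitly constructed UTF8_BINARY instance still prints its collation.
    def typeName: String =
      if (this eq SketchStringType) "STRING"
      else s"STRING COLLATE $collationName"
  }

  // Default, non-collated STRING: a singleton that happens to use UTF8_BINARY.
  object SketchStringType extends SketchStringType("UTF8_BINARY")

  def main(args: Array[String]): Unit = {
    println(SketchStringType.typeName)                      // STRING
    println(new SketchStringType("UTF8_BINARY").typeName)   // STRING COLLATE UTF8_BINARY
    println(new SketchStringType("UTF8_LCASE").typeName)    // STRING COLLATE UTF8_LCASE
  }
}
```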

Why are the changes needed?

Bug fix.

Does this PR introduce any user-facing change?

Yes, it corrects the output of the `SHOW CREATE TABLE` command.

How was this patch tested?

Covered by the `show-create-table.sql` golden file.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions

github-actions Bot commented Feb 5, 2026

JIRA Issue Information

=== Bug SPARK-55372 ===
Summary: Fix create table with default collation
Assignee: None
Status: Open
Affected: ["4.1.2"]


This comment was automatically generated by GitHub Actions

@github-actions github-actions Bot added the SQL label Feb 5, 2026
@ilicmarkodb ilicmarkodb force-pushed the fix_show_create_table branch 3 times, most recently from 761a801 to fad7a52 Compare February 5, 2026 18:01
@ilicmarkodb
Contributor Author

@dongjoon-hyun can you please take a look?
The error doesn't look related to my change.

@ilicmarkodb ilicmarkodb force-pushed the fix_show_create_table branch 7 times, most recently from e720672 to cc1d409 Compare February 10, 2026 15:46
@ilicmarkodb
Contributor Author

@cloud-fan can you please take a look?

@ilicmarkodb ilicmarkodb force-pushed the fix_show_create_table branch 6 times, most recently from 402e3c7 to ea8cd91 Compare February 11, 2026 10:44
Comment on lines +217 to +225
    + val stringBuilder = proto.DataType.String.newBuilder()
    + // Send collation only for explicit collations (including explicit UTF8_BINARY).
    + // Default STRING (case object) has no explicit collation and should omit it.
    + if (!s.eq(StringType)) {
    +   stringBuilder.setCollation(CollationFactory.fetchCollation(s.collationId).collationName)
    + }
      proto.DataType
        .newBuilder()
    -   .setString(
    -     proto.DataType.String
    -       .newBuilder()
    -       .setCollation(CollationFactory.fetchCollation(s.collationId).collationName)
    -       .build())
    +   .setString(stringBuilder.build())
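
As a rough illustration of why the diff uses `eq` (reference identity) rather than `==`, the sketch below assumes that two string types with the same collation compare equal structurally, so only identity can tell the default singleton apart. `SketchStringType`, `SketchProtoString`, and `toProto` are hypothetical names for this example, not Spark or Spark Connect classes.

```scala
object ProtoSketch {
  // Hypothetical stand-in for StringType with structural equality (an assumption).
  class SketchStringType(val collationName: String) {
    override def equals(other: Any): Boolean = other match {
      case s: SketchStringType => s.collationName == collationName
      case _ => false
    }
    override def hashCode(): Int = collationName.hashCode
  }

  // Default, non-collated STRING singleton.
  object SketchStringType extends SketchStringType("UTF8_BINARY")

  // Hypothetical stand-in for the proto message: the collation field is optional.
  final case class SketchProtoString(collation: Option[String])

  // Set the collation only for explicitly collated types, mirroring the `eq` check.
  def toProto(s: SketchStringType): SketchProtoString =
    if (s eq SketchStringType) SketchProtoString(None)
    else SketchProtoString(Some(s.collationName))

  def main(args: Array[String]): Unit = {
    val explicitBinary = new SketchStringType("UTF8_BINARY")
    println(explicitBinary == SketchStringType) // structurally equal
    println(toProto(SketchStringType))          // collation omitted
    println(toProto(explicitBinary))            // collation sent
  }
}
```

With `==` instead of `eq`, the explicit `UTF8_BINARY` instance in this sketch would be indistinguishable from the default and its collation would be dropped.
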
Contributor Author


Without this change, a user who uses JDBC, for example, and doesn't care about collations would suddenly get COLLATE UTF8_BINARY, as in this test case:

@ilicmarkodb ilicmarkodb force-pushed the fix_show_create_table branch 4 times, most recently from 618815f to bef4d19 Compare February 11, 2026 17:27
@ilicmarkodb ilicmarkodb force-pushed the fix_show_create_table branch from bef4d19 to d0da1e4 Compare February 11, 2026 20:43
@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 39fef73 Feb 16, 2026
@cloud-fan
Contributor

@ilicmarkodb can you open a backport PR for branch 4.1?

rpnkv pushed a commit to rpnkv/spark that referenced this pull request Feb 18, 2026
…fault collation

Closes apache#54159 from ilicmarkodb/fix_show_create_table.

Authored-by: ilicmarkodb <marko.ilic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@pan3793
Member

pan3793 commented May 9, 2026

@cloud-fan do you plan to backport this to 4.1? If so, I will close #55780 (move its target to 4.0).
