feat: Support Spark Expression Encode#4315
Conversation
|
Thanks @YutaLin. LGTM overall. Could you address feedback, then I'll kick off CI |
|
Hi @andygrove, thanks for the review! About "Spark accepts utf8 as an alias for UTF-8", spark only supports alias before 3.5, because it uses JDK https://spark.apache.org/docs/4.0.0/sql-migration-guide.html#upgrading-from-spark-sql-35-to-40
|
…atafusion-comet into 3183_support_spark_expression_encode
…expression_encode # Conflicts: # docs/source/user-guide/latest/expressions.md
coderfender
left a comment
There was a problem hiding this comment.
Left some minor comments but overall looks good @YutaLin
| binding: Boolean): Option[Expr] = { | ||
| charset match { | ||
| case Literal(str, DataTypes.StringType) | ||
| if str != null && str.toString.toLowerCase(Locale.ROOT) == "utf-8" => |
There was a problem hiding this comment.
This seems like a fragile check . Perhaps a better check would be to use StandardCharsets ?
| import org.apache.comet.serde.CommonStringExprs | ||
| import org.apache.comet.serde.ExprOuterClass.Expr | ||
|
|
||
| trait ShimCometExprs extends CommonStringExprs { |
There was a problem hiding this comment.
seems like the trait name seems to be different from Comet<> ?
| INSERT INTO test_encode_utf8 VALUES ('hello'), ('world'), (''), ('café'), (NULL) | ||
|
|
||
| query | ||
| SELECT encode(s, 'utf-8') FROM test_encode_utf8 |
There was a problem hiding this comment.
I believe spark accepts any variant of utf-8 (utf8 /UTF8 etc) . May be a good idea to add those tests ?
| BooleanType) => | ||
| val Seq(value, charset, _, _) = s.arguments | ||
| stringEncode(expr, charset, value, inputs, binding) | ||
|
|
There was a problem hiding this comment.
Seems like we missed removing this piece ?
Which issue does this PR close?
Closes #3183
Rationale for this change
Support expression Encode
What changes are included in this PR?
How are these changes tested?
Add encode.sql and run it in spark 3.4/3.5/4.0