feat: add from_utc_timestamp and to_utc_timestamp expressions #4308
andygrove wants to merge 3 commits
Conversation
Wires Spark FromUTCTimestamp to the upstream datafusion-spark SparkFromUtcTimestamp UDF. Adds a Scala serde with a documented incompatibility note for legacy timezone strings (GMT+1, UTC+1, PST) that arrow's Tz parser does not accept, adds a SQL file test covering IANA names, fixed offsets, both DST branches, and null handling under a session-timezone ConfigMatrix, and updates the expression support doc with dated audit notes for Spark 3.4.3, 3.5.8, and 4.0.1.
Wires Spark ToUTCTimestamp to the upstream datafusion-spark SparkToUtcTimestamp UDF. Shares the legacy-timezone-form incompatibility note with from_utc_timestamp via a private helper object. Mirrors the SQL file test for from_utc_timestamp covering IANA names, fixed offsets, both DST branches, nulls, and a session-timezone ConfigMatrix.
object CometFromUTCTimestamp extends CometExpressionSerde[FromUTCTimestamp] {
  override def getIncompatibleReasons(): Seq[String] =
Do we need to say getSupportLevel Incompatible?
Yes, we do. Thanks for catching that.
Done in 4afb755. Both serdes now override getSupportLevel to return Incompatible(Some(...)) with the same reason that goes to the Compatibility Guide. Updated both SQL tests to set spark.comet.expression.{From,To}UTCTimestamp.allowIncompatible=true so they still exercise the native path; both ConfigMatrix runs pass locally.
-- value, so it must not depend on the session timezone. Verify across two
-- representative session zones.
-- ConfigMatrix: spark.sql.session.timeZone=UTC,America/Los_Angeles
Do we need to add allow incompatible here?
Added in 4afb755 alongside the getSupportLevel change. Without the per-expression allowIncompatible flag the SQL tests would fall back to the JVM path now that the serde reports Incompatible.
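A hedged sketch of what the top of such a SQL file test could look like after that change. The `ConfigMatrix` comment syntax and the per-expression flag name are taken from the discussion above but have not been verified against the Comet test harness, so treat them as assumptions:

```sql
-- ConfigMatrix: spark.sql.session.timeZone=UTC,America/Los_Angeles

-- The serde now reports Incompatible, so opt in to the native path for
-- this expression (flag name assumed from the discussion above).
SET spark.comet.expression.FromUTCTimestamp.allowIncompatible=true;

-- Representative query: result must be identical under both session zones.
SELECT from_utc_timestamp(timestamp'2024-07-01 12:00:00', 'America/Los_Angeles');
```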
The serdes documented the timezone-parser divergence via getIncompatibleReasons() but left getSupportLevel at the default Compatible(), so users got the native path silently rather than opting in via spark.comet.expr.allowIncompatible. Override getSupportLevel to return Incompatible(...) using the same reason, and set the per-expr allowIncompatible flag in the SQL tests so they still exercise the native path.
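The shape of that fix can be sketched as follows. This is a minimal, self-contained illustration of the pattern only: `ExpressionSerde`, `SupportLevel`, `Compatible`, and `Incompatible` are simplified stand-ins for Comet's actual API, and the helper object mirrors the shared-reason design mentioned in the commit.

```scala
// Simplified stand-ins for Comet's support-level types (not the real API).
sealed trait SupportLevel
case object Compatible extends SupportLevel
final case class Incompatible(reason: Option[String]) extends SupportLevel

trait ExpressionSerde {
  // Default mirrors the pre-fix behavior: the expression is reported as
  // compatible, so the native path runs without any user opt-in.
  def getSupportLevel(): SupportLevel = Compatible
}

// One shared reason string keeps both serdes in sync
// (private in the real code).
object UTCTimestampSerde {
  val IncompatReason: String =
    "arrow's Tz parser does not accept Spark's legacy timezone forms " +
      "(GMT+1, UTC+1, PST); use IANA names or fixed +HH:MM offsets"
}

object CometFromUTCTimestamp extends ExpressionSerde {
  override def getSupportLevel(): SupportLevel =
    Incompatible(Some(UTCTimestampSerde.IncompatReason))
}

object CometToUTCTimestamp extends ExpressionSerde {
  override def getSupportLevel(): SupportLevel =
    Incompatible(Some(UTCTimestampSerde.IncompatReason))
}
```

With an override like this in place, the planner can refuse the native path unless the user sets the per-expression allowIncompatible flag, which is exactly what the updated SQL tests opt into.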
Which issue does this PR close?
Closes #2013.
Rationale for this change
`from_utc_timestamp` and `to_utc_timestamp` are commonly used Spark datetime functions. Both extend the same `UTCTimestamp` Catalyst trait and have matching `SparkFromUtcTimestamp`/`SparkToUtcTimestamp` implementations in the upstream `datafusion-spark` crate, so wiring them through gives Comet acceleration with minimal native code.

What changes are included in this PR?
- New `CometFromUTCTimestamp` and `CometToUTCTimestamp` serdes in `spark/src/main/scala/org/apache/comet/serde/datetime.scala`, registered in `QueryPlanSerde`'s `temporalExpressions` map. The shared incompat reason lives on a private `UTCTimestampSerde` helper.
- Registers the `SparkFromUtcTimestamp` and `SparkToUtcTimestamp` UDFs in `native/core/src/execution/jni_api.rs::register_datafusion_spark_function`.
- Implements `getIncompatibleReasons` on both serdes to document the one known divergence: arrow's `Tz` parser does not accept Spark's legacy timezone forms (`GMT+1`, `UTC+1`, `PST`, and similar). Such timezones surface a native parse error rather than a silent wrong result. IANA names and fixed offsets (`+HH:MM`) are fully supported.
- Adds SQL file tests `spark/src/test/resources/sql-tests/expressions/datetime/from_utc_timestamp.sql` and `to_utc_timestamp.sql`. Both cover column and literal arguments for the timestamp and timezone operands, IANA names, fixed offsets, summer and winter DST rows for `America/Los_Angeles`, and null handling, under a `ConfigMatrix: spark.sql.session.timeZone=UTC,America/Los_Angeles` to verify the result is independent of session timezone.
- Updates `docs/source/contributor-guide/spark_expressions_support.md` to mark both functions supported and record dated audit notes for Spark 3.4.3, 3.5.8, and 4.0.1.

Both expressions were scaffolded with the `implement-comet-expression` project skill, which also drove the `audit-comet-expression` follow-up that produced the test matrix above.

How are these changes tested?
- `./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite from_utc_timestamp" -Dtest=none` (passes both `ConfigMatrix` runs locally).
- `./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite to_utc_timestamp" -Dtest=none` (passes both `ConfigMatrix` runs locally).
- `cd native && cargo clippy --all-targets --workspace -- -D warnings` passes clean.