feat: support stateful CometUDFs #4345

Merged: mbutrovich merged 1 commit into apache:main from mbutrovich:cometudf_changes on May 15, 2026
@mbutrovich
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

Peeled off from #4267 to keep that PR scoped to codegen. The cache-shape change is independent of any consumer and benefits the codegen dispatcher (#4267), the regex CometUDF (#4239), and the JSON CometUDF (#4305) equally.

The current process-wide ConcurrentHashMap<String, CometUDF> requires every CometUDF to be strictly stateless: one shared instance services all tasks. A thread-local cache would not help because Tokio work-stealing on the scan-free execution path can move a Spark task's future between workers across batches, so state kept on one worker thread would be lost when a different worker polls the next batch. Keying by Spark task attempt ID gives continuity within a task and isolation across tasks regardless of which worker is polling.
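To make the thread-local argument concrete, here is a small simulation in plain Java. It is only an illustration under assumptions: two single-thread executors stand in for Tokio workers, the class name `WorkerMigrationDemo` and the counters are hypothetical, and none of this is Comet code. It shows one task whose second batch is handled by a different worker thread.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo: thread-local state vs. state keyed by task ID. */
final class WorkerMigrationDemo {
    // State attached to whichever thread happens to run the batch.
    static final ThreadLocal<AtomicInteger> PER_THREAD =
        ThreadLocal.withInitial(AtomicInteger::new);

    // State attached to the task, independent of the executing thread.
    static final ConcurrentHashMap<Long, AtomicInteger> PER_TASK =
        new ConcurrentHashMap<>();

    /** Returns {batches seen via thread-local, batches seen via task-keyed map}. */
    static int[] processBatch(long taskId) {
        int threadLocalCount = PER_THREAD.get().incrementAndGet();
        int perTaskCount =
            PER_TASK.computeIfAbsent(taskId, k -> new AtomicInteger()).incrementAndGet();
        return new int[] {threadLocalCount, perTaskCount};
    }

    /** Runs two batches of the same task on two different worker threads. */
    static int[][] run() {
        PER_TASK.clear();
        // Each single-thread executor stands in for one worker.
        ExecutorService workerA = Executors.newSingleThreadExecutor();
        ExecutorService workerB = Executors.newSingleThreadExecutor();
        try {
            long taskId = 7L;
            int[] first = workerA.submit(() -> processBatch(taskId)).get();
            // The same task's next batch is picked up by another worker.
            int[] second = workerB.submit(() -> processBatch(taskId)).get();
            return new int[][] {first, second};
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            workerA.shutdown();
            workerB.shutdown();
        }
    }
}
```

In this simulation the thread-local counter restarts at 1 on the second worker, while the task-keyed counter continues to 2, which is the continuity property the task attempt ID key is meant to provide.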

What changes are included in this PR?

  • CometUdfBridge.INSTANCES becomes ConcurrentHashMap<Long, ConcurrentHashMap<String, CometUDF>> keyed by (taskAttemptId, className).
  • A TaskCompletionListener registered on the first cache miss for a task evicts the per-task entry on task end.
  • NO_TASK_ID = -1L sentinel covers calls without a TaskContext (unit tests, direct native driver runs); that bucket is not evicted because no task-completion event fires.
  • CometUDF Scaladoc updates the contract to "may hold per-task state in fields" and documents the single-threaded-per-instance invariant (Spark runs one native future per partition, Tokio polls one future per worker at a time).
  • Defensive assertions on evaluate preconditions, the post-install TaskContext invariant, and the cache-side invariants (single listener registration, non-null cache, reflective-instantiate success).
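The cache shape in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Udf` and `FakeTask` are hypothetical stand-ins for `CometUDF` and Spark's `TaskContext`, and a `Supplier` replaces the reflective instantiation so the sketch stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Illustrative sketch of a per-task UDF cache; not the actual CometUdfBridge. */
final class PerTaskUdfCache {
    /** Sentinel bucket for callers that have no task context. */
    static final long NO_TASK_ID = -1L;

    /** Hypothetical stand-in for CometUDF. */
    interface Udf {}

    /** Hypothetical stand-in for a task context with completion callbacks. */
    static final class FakeTask {
        final long attemptId;
        private final List<Runnable> listeners = new ArrayList<>();

        FakeTask(long attemptId) { this.attemptId = attemptId; }

        void onCompletion(Runnable listener) { listeners.add(listener); }

        void complete() { listeners.forEach(Runnable::run); }

        int listenerCount() { return listeners.size(); }
    }

    /** Outer key: task attempt ID. Inner key: UDF class name. */
    static final ConcurrentHashMap<Long, ConcurrentHashMap<String, Udf>> INSTANCES =
        new ConcurrentHashMap<>();

    static Udf getOrCreate(FakeTask task, String className, Supplier<Udf> factory) {
        long id = (task == null) ? NO_TASK_ID : task.attemptId;
        ConcurrentHashMap<String, Udf> perTask = INSTANCES.computeIfAbsent(id, k -> {
            // First cache miss for this task: register the eviction hook once.
            // The sentinel bucket gets no hook, so it is never evicted.
            if (task != null) {
                task.onCompletion(() -> INSTANCES.remove(k));
            }
            return new ConcurrentHashMap<>();
        });
        // One instance per (task, class name) pair.
        return perTask.computeIfAbsent(className, k -> factory.get());
    }
}
```

Because `computeIfAbsent` runs its mapping function at most once per key, the eviction hook in this sketch is registered a single time per task, and repeated lookups within the same task return the same instance.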

How are these changes tested?

No new tests in this PR for the same reason as #4306: the Arrow shading boundary in common/ blocks unit tests that subclass CometUDF. End-to-end coverage lands with each consumer (#4267, #4239, #4305) when it drives the bridge.

@mbutrovich mbutrovich changed the title feat: support stateful CometUDFs feat: support stateful CometUDFs via task ID map May 15, 2026
@mbutrovich mbutrovich changed the title feat: support stateful CometUDFs via task ID map feat: support stateful CometUDFs May 15, 2026
@mbutrovich mbutrovich requested a review from andygrove May 15, 2026 14:15
@mbutrovich mbutrovich self-assigned this May 15, 2026
@mbutrovich mbutrovich merged commit 9c76e87 into apache:main May 15, 2026
126 of 127 checks passed