Add retroactive OTel execution lifecycle tracing #252
Branch updated from e9fe6e0 to 059c47b.
def _ns(*, dt: datetime.datetime) -> int:
    """Return *dt* as nanoseconds since the Unix epoch (required by OTel SDK).

    Uses integer arithmetic on timedelta components to avoid float64 precision
I'm not sure timedelta.total_seconds induces loss of precision... Can you elaborate?
But not a blocker.
:robot_face: timedelta.total_seconds() returns a float64. At current Unix timestamps (~1.75 × 10⁹ seconds), float64's precision at that magnitude is only ~238ns — adding a sub-second fractional component to a number that large loses anything finer than that. The integer decomposition avoids the float entirely, giving exact results to the microsecond.
Example: at this magnitude, adjacent float64 values are ~238 ns apart, so a timestamp of T + 1 µs is stored as a whole number of those steps (for example, T + ~954 ns) instead of exactly T + 1000 ns, which skews span boundaries by a few hundred nanoseconds.
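The ~238 ns figure is easy to check with math.ulp (Python 3.9+), which returns the distance from a float to the next representable value. A minimal sketch; 1.75e9 is an illustrative "current" Unix time, not a value taken from the PR.

```python
import math

# Illustrative Unix time in seconds (mid-2025); not a value from the PR
now_s = 1.75e9

# Gap between adjacent float64 values at this magnitude, in nanoseconds
spacing_ns = math.ulp(now_s) * 1e9
print(f"float64 spacing near {now_s:.3g} s: {spacing_ns:.1f} ns")

# Add 1 microsecond in float seconds and measure the gap that was actually stored
t = now_s
t_plus_1us = now_s + 1e-6
stored_ns = (t_plus_1us - t) * 1e9
print(f"requested +1000 ns, stored +{stored_ns:.1f} ns")
```

With this particular base value, the stored gap works out to four steps of the spacing (about 953.7 ns) rather than 1000 ns.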
Note from Morgan: a warning in Jaeger about a negative span duration is what led to addressing precision loss in the first place.
:robot_face: That said, it's not a practical concern here: consecutive status history entries are at least milliseconds apart (the orchestrator sweeps every second), so ~238 ns of imprecision is nearly four orders of magnitude below a millisecond, and more than six below the one-second sweep interval. Simplifying to int(dt.timestamp() * 1_000_000_000) is fine.
Re-verified: no warnings with the simpler solution.
    root.end(end_time=_ns(dt=last_time))
except Exception:
    _logger.warning(
It's OK to use _logger.exception here.
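For reference, the standard library's Logger.exception(msg) logs at ERROR level and attaches the traceback of the exception being handled; Logger.warning(msg, exc_info=True) attaches the same traceback at WARNING level. A minimal sketch of the best-effort pattern; emit_trace, try_emit, and the logger name are hypothetical stand-ins, not the PR's code.

```python
import io
import logging

_logger = logging.getLogger("execution_tracing_sketch")  # hypothetical logger name


def emit_trace() -> None:
    """Hypothetical stand-in for the span-emitting code; fails on purpose."""
    raise RuntimeError("exporter unavailable")


def try_emit() -> bool:
    """Best-effort wrapper: tracing failures are logged, never propagated."""
    try:
        emit_trace()
        return True
    except Exception:
        # ERROR level, plus the full traceback of the caught exception
        _logger.exception("Failed to emit execution trace")
        return False


# Capture log output in memory so the result is easy to inspect
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
_logger.addHandler(handler)
_logger.propagate = False

ok = try_emit()
output = stream.getvalue()
print(ok)
print(output)
```

Either call keeps the traceback; the choice is only about the severity the failure should carry.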
Emits a root 'execution' span and one 'execution.status' child span per status history entry when an ExecutionNode reaches a terminal state. All span timestamps are derived from the existing status history, so durations reflect actual time spent, not when this code ran.

- New module: cloud_pipelines_backend/instrumentation/execution_tracing.py
- Hook: metrics._handle_before_commit calls try_emit_execution_trace
- Orchestrator: otel.setup_providers() so the exporter is active
- Tests: InMemorySpanExporter-backed suite in tests/instrumentation/
Branch updated from 059c47b to ac197e1.

Summary
Emits a root 'execution' span and one 'execution.status' child span per status history entry when an ExecutionNode reaches a terminal state. All span timestamps are derived from the existing status history, so durations reflect actual time spent, not when this code ran.
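The core idea, timing spans from recorded history rather than from when the code runs, can be sketched without the OTel SDK as a pure function that turns status history into (name, start_ns, end_ns) intervals. Those values are what a tracer's start_span(..., start_time=...) and span.end(end_time=...) calls would receive. Assumptions, not taken from the PR: history is a time-sorted list of (status, timestamp) pairs, the root starts at the first entry, each status lasts until the next entry, and the terminal entry gets a zero-length span. SpanInterval and plan_spans are hypothetical names.

```python
import datetime
from typing import NamedTuple, Optional


class SpanInterval(NamedTuple):
    """Hypothetical plain-data stand-in for a span to be emitted."""
    name: str
    status: Optional[str]
    start_ns: int
    end_ns: int


def _ns(*, dt: datetime.datetime) -> int:
    # Simplified conversion settled on in review
    return int(dt.timestamp() * 1_000_000_000)


def plan_spans(history: list[tuple[str, datetime.datetime]]) -> list[SpanInterval]:
    """Root 'execution' interval plus one 'execution.status' interval per entry."""
    if not history:
        return []
    first_time = history[0][1]
    last_time = history[-1][1]
    spans = [SpanInterval("execution", None, _ns(dt=first_time), _ns(dt=last_time))]
    for i, (status, started) in enumerate(history):
        # Assumption: a status lasts until the next entry; the terminal one is zero-length
        ended = history[i + 1][1] if i + 1 < len(history) else last_time
        spans.append(
            SpanInterval("execution.status", status, _ns(dt=started), _ns(dt=ended))
        )
    return spans


utc = datetime.timezone.utc
history = [  # hypothetical status names and times
    ("QUEUED", datetime.datetime(2025, 6, 1, 12, 0, 0, tzinfo=utc)),
    ("RUNNING", datetime.datetime(2025, 6, 1, 12, 0, 5, tzinfo=utc)),
    ("SUCCEEDED", datetime.datetime(2025, 6, 1, 12, 1, 5, tzinfo=utc)),
]
for span in plan_spans(history):
    print(span.name, span.status, (span.end_ns - span.start_ns) / 1e9)
```

In an SDK-backed version, each child would presumably be started with a context from opentelemetry.trace.set_span_in_context(root) so it is parented under the root span.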
- New module: cloud_pipelines_backend/instrumentation/execution_tracing.py
- Hook: metrics._handle_before_commit calls try_emit_execution_trace
- Orchestrator: otel.setup_providers() so the exporter is active
- Tests: InMemorySpanExporter-backed suite in tests/instrumentation/

Screenshots