Skip to content

Add retroactive OTel execution lifecycle tracing#252

Merged
morgan-wowk merged 1 commit into
masterfrom
execution-tracing-core
May 27, 2026
Merged

Add retroactive OTel execution lifecycle tracing#252
morgan-wowk merged 1 commit into
masterfrom
execution-tracing-core

Conversation

@morgan-wowk
Copy link
Copy Markdown
Collaborator

@morgan-wowk morgan-wowk commented May 22, 2026

Summary

Emits a root execution span and one execution.status child span per
status history entry when an ExecutionNode reaches a terminal state. All
span timestamps are derived from the existing status history so durations
reflect actual time spent, not when this code ran.

  • New module: cloud_pipelines_backend/instrumentation/execution_tracing.py
  • Hook: metrics._handle_before_commit calls try_emit_execution_trace
  • Orchestrator: otel.setup_providers() so the exporter is active
  • Tests: InMemorySpanExporter-backed suite in tests/instrumentation/

Screenshots

Screenshot 2026-05-22 at 8.09.01 PM.png

Screenshot 2026-05-22 at 8.13.33 PM.png

Screenshot 2026-05-22 at 8.14.49 PM.png

def _ns(*, dt: datetime.datetime) -> int:
"""Return *dt* as nanoseconds since the Unix epoch (required by OTel SDK).

Uses integer arithmetic on timedelta components to avoid float64 precision
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure timedelta.total_seconds induces loss of precision... Can you elaborate?

But not a blocker.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:robot_face: timedelta.total_seconds() returns a float64. At current Unix timestamps (~1.75 × 10⁹ seconds), float64's precision at that magnitude is only ~238ns — adding a sub-second fractional component to a number that large loses anything finer than that. The integer decomposition avoids the float entirely, giving exact results to the microsecond.

Example: A timestamp of T + 1µs could round to T + 0ns or T + 476ns under the float path instead of the correct T + 1000ns, which would collapse two distinct status entries into the same nanosecond and produce a phantom zero-duration span.

Note from Morgan: Seeing a warning in Jaeger on a negative span duration is what lead to solving for precision lose to begin with.

:robot_face: That said, it's not a practical concern here — consecutive status history entries are at minimum milliseconds apart (the orchestrator sweeps every second), so 238ns of imprecision is seven orders of magnitude below the smallest real time difference we'd ever see. Simplifying to int(dt.timestamp() * 1_000_000_000) is fine.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-verified and no warnings with the simpler solution.


root.end(end_time=_ns(dt=last_time))
except Exception:
_logger.warning(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK to do _logger.exception

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

Emits a root 'execution' span and one 'execution.status' child span per
status history entry when an ExecutionNode reaches a terminal state. All
span timestamps are derived from the existing status history so durations
reflect actual time spent, not when this code ran.

- New module: cloud_pipelines_backend/instrumentation/execution_tracing.py
- Hook: metrics._handle_before_commit calls try_emit_execution_trace
- Orchestrator: otel.setup_providers() so the exporter is active
- Tests: InMemorySpanExporter-backed suite in tests/instrumentation/
@morgan-wowk morgan-wowk force-pushed the execution-tracing-core branch from 059c47b to ac197e1 Compare May 26, 2026 23:16
@morgan-wowk morgan-wowk merged commit fa70d6d into master May 27, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants