Add native trace exporter#5690
Conversation
Introduce the `Datadog::Tracing::Transport::Native::TracerSpan` class backed by libdatadog's `ddog_TracerSpan` FFI. `TracerSpan._native_from_span(span)` reads instance variables directly from a `Datadog::Tracing::Span` (`@name`, `@service`, `@meta`, etc.) and constructs a Rust-heap-allocated span via the C FFI, including: - Scalar fields (name, service, resource, type, IDs, timestamps) - 128-bit trace ID split via `trace_id_t` struct into (low, high) 64-bit halves - String tags (meta) and numeric tags (metrics) via safe hash iteration that stashes errors instead of raising inside `rb_hash_foreach` This is the first part of the trace exporter C extension; `TraceExporter` and `Response` classes will follow.
Introduce the `Datadog::Tracing::Transport::Native::TraceExporter` class backed by libdatadog's `ddog_TraceExporter` FFI. `TraceExporter._native_new(url, tracer_version, language, ...)` builds a Rust `TraceExporter` through the config/build pattern: - Validates argument types upfront (before allocating Rust resources) - Creates a `ddog_TraceExporterConfig`, applies each setting via `SET_CONFIG` macro with cleanup-on-error - Builds the exporter and wraps it in Ruby `TypedData` The `TraceExporter` is freed via `ddog_trace_exporter_free` on GC. `Response` and `_native_send_traces` will follow.
Introduce `Datadog::Tracing::Transport::Native::Response` with the full set of predicate methods expected by the `Writer`: - `ok?`, `internal_error?`, `server_error?`, `client_error?`, `not_found?`, `unsupported?` - `trace_count` and `payload` (agent response body for sampling rate feedback) Two C helpers build responses from the native side: - `create_ok_response`: success with optional payload - `create_error_response`: classifies `ddog_TraceExporterErrorCode` into client/server/internal error categories These will be used by `_native_send_traces` in the next step.
|
✨ Fix all issues with BitsAI or with Cursor
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d8e863833d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Implement `TraceExporter#_native_send_traces(traces)` which takes an `Array<Array<Span>>`, converts each span to Rust via the C FFI, groups them into trace chunks, and sends them to the Datadog Agent. Key safety patterns: - `rb_ensure` guarantees Rust-allocated trace chunks are freed if a Ruby exception fires during span conversion - `rb_thread_call_without_gvl2` releases the GVL during the blocking network send so other Ruby threads can run - The "gvl2" variant is used (not plain `gvl`) to prevent automatic interrupt raising before Rust response/error objects are inspected and freed - A retry loop handles the case where an interrupt causes `gvl2` to return before the send function actually runs The agent's response body (containing `rate_by_service` for sampling feedback) is extracted and surfaced via `Response#payload`. This completes the C extension for the native trace transport.
d8e8638 to
6a597db
Compare
ivoanjo
left a comment
There was a problem hiding this comment.
I did a pass on the C code! Didn't have time to go through the tests
| static inline void check_exporter_error(const char *context, | ||
| ddog_TraceExporterError *err) { | ||
| if (err == NULL) return; | ||
|
|
||
| char buf[MAX_RAISE_MESSAGE_SIZE]; | ||
| if (err->msg != NULL) { | ||
| snprintf(buf, sizeof(buf), "%s: %s", context, err->msg); | ||
| } else { | ||
| snprintf(buf, sizeof(buf), "%s: (unknown error)", context); | ||
| } | ||
| ddog_trace_exporter_error_free(err); | ||
| raise_error(rb_eRuntimeError, "%s", buf); | ||
| } |
There was a problem hiding this comment.
Minor: I'm curious why trace exporter doesn't use ddog_Error? We even already have nice helpers for dealing with these -- e.g. get_error_details_and_drop and read_ddogerr_string_and_drop
| const uint8_t *body_ptr = | ||
| ddog_trace_exporter_response_get_body(response, &body_len); |
There was a problem hiding this comment.
This looks like it could use a ddog_ByteSlice to avoid the awkward splitting between pointer and length.
| if (pending_exception) { | ||
| if (send_err != NULL) ddog_trace_exporter_error_free(send_err); | ||
| rb_jump_tag(pending_exception); | ||
| } | ||
|
|
||
| if (send_err != NULL) { | ||
| ddog_TraceExporterErrorCode code = send_err->code; | ||
| ddog_trace_exporter_error_free(send_err); |
There was a problem hiding this comment.
It looks like we only want the send_err for the code -- maybe doing that conversion inside send_chunks_without_gvl would simplify the logic here? (Nothing to free anymore...)
| VALUE err_resp = create_error_response(code, ctx->trace_count); | ||
| return rb_ary_new_from_args(1, err_resp); | ||
| } | ||
|
|
||
| VALUE ok_resp = create_ok_response(ctx->trace_count, payload); | ||
| return rb_ary_new_from_args(1, ok_resp); |
There was a problem hiding this comment.
Wait, is this correct? Shouldn't the result of this method be an array of responses (As documented in a comment below? Array[Response])
| if (push_err != NULL) { | ||
| ddog_trace_exporter_error_free(push_err); | ||
| } |
There was a problem hiding this comment.
Should this somehow signal back an issue?
| static void *send_chunks_without_gvl(void *data) { | ||
| send_chunks_args_t *a = (send_chunks_args_t *)data; | ||
| a->error = ddog_trace_exporter_send_trace_chunks( | ||
| a->exporter, a->chunks, a->response_out); | ||
| a->send_ran = true; | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
Minor: Suggest using args? The single-letter-variable name that AI likes to use to me at least is super-weird to follow
Classes created with `rb_define_class_under` are attached to their parent module's constant table, making them reachable by GC. Marking them as GC roots with `rb_global_variable` is unnecessary.
Delegate to the existing helper instead of duplicating the `RSTRING_PTR`/`RSTRING_LEN` construction and type check inline.
Use a typed function pointer and a `static inline` function instead of a preprocessor macro. Same codegen, easier to debug and type-check.
Adopt the same pattern used by the profiling extension: a `process_pending_interruptions` callback paired with a `check_if_pending_exception` wrapper that returns the pending state as an integer instead of requiring manual `rb_protect` at the call site.
Replace the `ddog_TraceExporterResponse **response_out` double pointer with a `ddog_TraceExporterResponse *response` field and pass its address to the FFI call. This removes one level of indirection and keeps all send-related state within the args struct.
Use a descriptive variable name for the cast data pointer.
Move error handling into `send_chunks_without_gvl`: extract the error code enum and free the `ddog_TraceExporterError` before returning. After reacquiring the GVL only the plain `error_code` and `failed` flag remain, eliminating the need to free Rust-allocated error objects on every code path.
Accept keyword arguments via `rb_scan_args(argc, argv, "0:", ...)` instead of 9 positional VALUE parameters. This makes call sites self-documenting and prevents silent argument ordering mistakes. The Ruby call becomes: TraceExporter._native_new(url:, tracer_version:, language:, ...)
Define Response in Ruby with keyword-argument `initialize` and `attr_reader` accessors. The C extension loads the class at init time via `rb_require` and calls `Response.new(ok:, ...)` through `rb_funcallv_kw` instead of manually setting instance variables. This gives Ruby's JIT better visibility into the object shape and reduces the amount of C code.
Count entries skipped due to unexpected types during hash iteration and log a warning after the loop completes. Previously these entries were silently dropped, making it difficult to diagnose missing tags.
Replace two `rb_funcall` calls (`to_i` + `nsec`) with a single `rb_time_timespec` call that extracts the underlying `struct timespec` directly from the Ruby Time object. Zero method dispatch overhead.
Replace two `rb_funcall` calls (`&` and `>>`) with direct extraction via `rb_big_pack`, which copies the Bignum magnitude into a word buffer without method dispatch or temporary object allocation. A Fixnum fast path handles the common case of 64-bit trace IDs with a simple `FIX2LONG` cast.
Construct a `ddog_TracerSpanFields` struct with designated initializers and pass it by reference to `ddog_tracer_span_new`. This matches the updated libdatadog FFI signature and is self-documenting at the call site.
Hoist `span_count` before `ddog_tracer_trace_chunks_begin_chunk` and pass it as the capacity argument to pre-allocate the inner span vector.
The error path is unreachable in practice (the handle is always valid), but the return value must still be freed to avoid leaking the Rust-allocated error object.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d5fa393cc9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| /* 2. Convert scalars */ | ||
| ddog_CharSlice name_s = char_slice_from_ruby_string(rb_name); | ||
| ddog_CharSlice service_s = nullable_char_slice(rb_service); | ||
| ddog_CharSlice resource_s = char_slice_from_ruby_string(rb_resource); |
There was a problem hiding this comment.
Preserve nil resources when converting spans
If a span is created with resource: nil (the public Span initializer explicitly allows nil as a placeholder, and the existing Ruby serializer emits it), this conversion raises a TypeError before the batch can be sent because char_slice_from_ruby_string requires a String. The native exporter should treat @resource as nullable or otherwise mirror the current serialization path so valid spans do not fail export.
Useful? React with 👍 / 👎.
| # Implements the same predicate interface as the HTTP transport's | ||
| # response so callers can treat both uniformly. | ||
| class Response | ||
| attr_reader :trace_count, :payload |
There was a problem hiding this comment.
Expose service_rates on native responses
When these responses are wired into the tracing writer, the existing sampler callback in lib/datadog/tracing/component.rb calls response.service_rates for every non-internal response. Native responses only expose trace_count and payload, so a successful native send with the agent's rate_by_service body will raise NoMethodError in that callback instead of updating sampling rates; this response needs to match the trace response interface and provide parsed service rates.
Useful? React with 👍 / 👎.
| end | ||
|
|
||
| def ok? | ||
| @ok |
There was a problem hiding this comment.
does it make sense to cast to boolean here, just because based on the function name one might expect to receive a boolean?
| end | ||
|
|
||
| def ok? | ||
| @ok |
There was a problem hiding this comment.
same for other methods in this class
| resp.instance_variable_set(:@ok, ok) | ||
| resp.instance_variable_set(:@internal_error, internal_error) |
| VALUE rb_name = rb_ivar_get(span, at_name_id); | ||
| VALUE rb_service = rb_ivar_get(span, at_service_id); | ||
| VALUE rb_resource = rb_ivar_get(span, at_resource_id); | ||
| VALUE rb_type = rb_ivar_get(span, at_type_id); | ||
| VALUE rb_span_id = rb_ivar_get(span, at_id_id); | ||
| VALUE rb_parent_id = rb_ivar_get(span, at_parent_id_id); | ||
| VALUE rb_trace_id = rb_ivar_get(span, at_trace_id_id); | ||
| VALUE rb_status = rb_ivar_get(span, at_status_id); |
There was a problem hiding this comment.
From a performance standpoint rb_ivar_get() is slower than reading the ivar in Ruby and passing it to C, because Ruby gets an inline cache (caching the Shape) for reading/writing ivars, but C has no cache at all, so it needs to walk the Shape tree which is basically a singly-linked list from current Shape to root Shape so O(#ivars).
So another way could be to pass N arguments to this function and read these ivars from Ruby.
Might not matter too much here, I haven't measured.
There was a problem hiding this comment.
Ivo made a similar remark about it, I have attempted to do the change and pass something more structured, but it's a big-ish one so I've isolated it to its own task downstream of this PR.
| /* ---------------------------------------------------------------- | ||
| * Cache Ruby intern IDs | ||
| * ---------------------------------------------------------------- */ | ||
|
|
||
| /* Span ivars */ | ||
| at_name_id = rb_intern("@name"); | ||
| at_service_id = rb_intern("@service"); | ||
| at_resource_id = rb_intern("@resource"); | ||
| at_type_id = rb_intern("@type"); | ||
| at_id_id = rb_intern("@id"); | ||
| at_parent_id_id = rb_intern("@parent_id"); | ||
| at_trace_id_id = rb_intern("@trace_id"); | ||
| at_start_time_id = rb_intern("@start_time"); | ||
| at_duration_id = rb_intern("@duration"); | ||
| at_status_id = rb_intern("@status"); | ||
| at_meta_id = rb_intern("@meta"); | ||
| at_metrics_id = rb_intern("@metrics"); | ||
|
|
||
| /* Methods */ | ||
| id_duration_method = rb_intern("duration"); | ||
|
|
||
| /* Response.new */ | ||
| id_new = rb_intern("new"); |
There was a problem hiding this comment.
There is no need for this indirection because that's cached anyway, see https://github.com/ruby/ruby/blob/d6fa8e3e0f876114bf4ab9c8961a8e10e32ac9db/include/ruby/internal/symbol.h#L330-L341
There was a problem hiding this comment.
That was my thought as well, but I failed to find a source for it; thanks for confirming it with a reference.
# What does this PR do? Add span-building and trace-sending FFI functions to `libdd-data-pipeline-ffi`, enabling language tracers to construct spans field-by-field through the C API instead of passing pre-serialized msgpack. New types: - `TracerSpan`: opaque handle wrapping `Span<BytesData>` - `TracerTraceChunks`: opaque handle wrapping `Vec<Vec<SpanBytes>>` New functions: - `ddog_tracer_span_new`, `_free`, `_set_meta`, `_set_metric` - `ddog_tracer_trace_chunks_new`, `_free`, `_begin_chunk`, `_push_span` - `ddog_trace_exporter_send_trace_chunks` # Motivation The Ruby tracer needs a native trace export path that bypasses Ruby-side msgpack serialization. This provides the FFI surface for it; the corresponding C extension in `dd-trace-rb` calls these functions. Continuation of the prototype in #1661, reworked to stay close to `main`: the generic `TraceExporter<H>` and `SharedRuntime` are untouched; this is purely additive. # Additional Notes Spans are consumed on push, chunks are consumed on send — single-ownership enforced at the API level. `libdd-trace-utils` is promoted from dev-dependency to regular dependency since the FFI crate now needs `SpanBytes` at build time. Sibling PR for Ruby is at: DataDog/dd-trace-rb#5690 # How to test the change? `cargo test -p libdd-data-pipeline-ffi --lib` — 39 tests, all passing. The existing `trace_exporter` tests are unaffected. Co-authored-by: edmund.kump <edmund.kump@datadoghq.com>
What does this PR do?
Add the
Datadog::Tracing::Transport::NativeC extension module with three classes:TracerSpan: converts a RubySpanto a Rust-heap-allocatedddog_TracerSpanby reading instance variables directly (@name,@service,@meta, etc.)TraceExporter: wraps the Rustddog_TraceExporterwith config/build lifecycleResponse: ivar-backed response matching theWriter's expected interface (ok?,internal_error?,server_error?,client_error?,payload, etc.)TraceExporter#_native_send_traces(traces)takesArray<Array<Span>>, converts spans, groups them into trace chunks, and sends them to the Datadog Agent. GVL is released during the network call;rb_ensureguarantees Rust-allocated chunks are freed on exception.Motivation:
First step towards replacing the pure-Ruby msgpack +
Net::HTTPtrace export path with the Rust data pipeline. This provides the C extension; a Ruby transport class wiring it intoWriterwill follow.Companion to DataDog/libdatadog#1952 which adds the FFI surface this extension calls.
Change log entry
None (not yet wired into the tracer; no user-visible change).
Additional Notes:
Built on top of review feedback from the prototype PRs (#5422 / DataDog/libdatadog#1661):
trace_id_tstruct instead of two bareuint64_tout-paramshash_iter_ctxfor safe error stashing duringrb_hash_foreachrb_thread_call_without_gvl2(not plaingvl) to prevent interrupt-before-cleanupclient_error?/server_error?/internal_error?classification fromddog_TraceExporterErrorCodepayloadsurfaces the agent body for sampling rate feedbackAI was used to accelerate implementation; all code was reviewed and understood.
How to test the change?
48 RSpec examples under
spec/datadog/tracing/transport/native/:Requires
libdatadoggem built from the companion PR's branch (viagem "libdatadog", path: "../libdatadog-rb"in Gemfile).