Skip to content

Add native trace exporter#5690

Open
lloeki wants to merge 19 commits into
masterfrom
lloeki/native-trace-exporter
Open

Add native trace exporter#5690
lloeki wants to merge 19 commits into
masterfrom
lloeki/native-trace-exporter

Conversation

@lloeki
Copy link
Copy Markdown
Member

@lloeki lloeki commented May 6, 2026

What does this PR do?

Add the Datadog::Tracing::Transport::Native C extension module with three classes:

  • TracerSpan: converts a Ruby Span to a Rust-heap-allocated ddog_TracerSpan by reading instance variables directly (@name, @service, @meta, etc.)
  • TraceExporter: wraps the Rust ddog_TraceExporter with config/build lifecycle
  • Response: ivar-backed response matching the Writer's expected interface (ok?, internal_error?, server_error?, client_error?, payload, etc.)

TraceExporter#_native_send_traces(traces) takes Array<Array<Span>>, converts spans, groups them into trace chunks, and sends them to the Datadog Agent. GVL is released during the network call; rb_ensure guarantees Rust-allocated chunks are freed on exception.

Motivation:

First step towards replacing the pure-Ruby msgpack + Net::HTTP trace export path with the Rust data pipeline. This provides the C extension; a Ruby transport class wiring it into Writer will follow.

Companion to DataDog/libdatadog#1952 which adds the FFI surface this extension calls.

Change log entry

None (not yet wired into the tracer; no user-visible change).

Additional Notes:

Built on top of review feedback from the prototype PRs (#5422 / DataDog/libdatadog#1661):

  • trace_id_t struct instead of two bare uint64_t out-params
  • hash_iter_ctx for safe error stashing during rb_hash_foreach
  • rb_thread_call_without_gvl2 (not plain gvl) to prevent interrupt-before-cleanup
  • Proper client_error?/server_error?/internal_error? classification from ddog_TraceExporterErrorCode
  • Response payload surfaces the agent body for sampling rate feedback

AI was used to accelerate implementation; all code was reviewed and understood.

How to test the change?

48 RSpec examples under spec/datadog/tracing/transport/native/:

bundle exec rake clean compile
bundle exec rspec spec/datadog/tracing/transport/native/

Requires libdatadog gem built from the companion PR's branch (via gem "libdatadog", path: "../libdatadog-rb" in Gemfile).

lloeki added 3 commits May 6, 2026 18:24
Introduce the `Datadog::Tracing::Transport::Native::TracerSpan` class
backed by libdatadog's `ddog_TracerSpan` FFI.

`TracerSpan._native_from_span(span)` reads instance variables directly
from a `Datadog::Tracing::Span` (`@name`, `@service`, `@meta`, etc.)
and constructs a Rust-heap-allocated span via the C FFI, including:

- Scalar fields (name, service, resource, type, IDs, timestamps)
- 128-bit trace ID split via `trace_id_t` struct into (low, high)
  64-bit halves
- String tags (meta) and numeric tags (metrics) via safe hash
  iteration that stashes errors instead of raising inside
  `rb_hash_foreach`

This is the first part of the trace exporter C extension; `TraceExporter`
and `Response` classes will follow.
Introduce the `Datadog::Tracing::Transport::Native::TraceExporter`
class backed by libdatadog's `ddog_TraceExporter` FFI.

`TraceExporter._native_new(url, tracer_version, language, ...)` builds
a Rust `TraceExporter` through the config/build pattern:

- Validates argument types upfront (before allocating Rust resources)
- Creates a `ddog_TraceExporterConfig`, applies each setting via
  `SET_CONFIG` macro with cleanup-on-error
- Builds the exporter and wraps it in Ruby `TypedData`

The `TraceExporter` is freed via `ddog_trace_exporter_free` on GC.
`Response` and `_native_send_traces` will follow.
Introduce `Datadog::Tracing::Transport::Native::Response` with the
full set of predicate methods expected by the `Writer`:

- `ok?`, `internal_error?`, `server_error?`, `client_error?`,
  `not_found?`, `unsupported?`
- `trace_count` and `payload` (agent response body for sampling
  rate feedback)

Two C helpers build responses from the native side:

- `create_ok_response`: success with optional payload
- `create_error_response`: classifies `ddog_TraceExporterErrorCode`
  into client/server/internal error categories

These will be used by `_native_send_traces` in the next step.
@lloeki lloeki changed the title Lloeki/native trace exporter Add native trace exporter May 6, 2026
@datadog-prod-us1-3
Copy link
Copy Markdown

datadog-prod-us1-3 Bot commented May 6, 2026

Tests

Fix all issues with BitsAI or with Cursor

⚠️ Warnings

🧪 37 Tests failed

Datadog::Tracing::Transport::Native::Response client error response  from rspec   View in Datadog   (Fix with Cursor)
Libdatadog does not seem to be available: LoadError: cannot load such file -- libdatadog_api.3.2_x86_64-linux. Try running \`bundle exec rake compile\` before running this test.

Failure/Error:
  raise "Libdatadog does not seem to be available: #{Datadog::Core::LIBDATADOG_API_FAILURE}. " \
    "Try running \`bundle exec rake compile\` before running this test."

RuntimeError:
  Libdatadog does not seem to be available: LoadError: cannot load such file -- libdatadog_api.3.2_x86_64-linux. Try running \`bundle exec rake compile\` before running this test.
./spec/support/core_helpers.rb:83:in \`skip_if_libdatadog_not_supported'
./spec/datadog/tracing/transport/native/response_spec.rb:7:in \`block (2 levels) in <top (required)>'
...
Datadog::Tracing::Transport::Native::Response internal error response  from rspec   View in Datadog   (Fix with Cursor)
Libdatadog does not seem to be available: LoadError: cannot load such file -- libdatadog_api.3.2_x86_64-linux. Try running \`bundle exec rake compile\` before running this test.

Failure/Error:
  raise "Libdatadog does not seem to be available: #{Datadog::Core::LIBDATADOG_API_FAILURE}. " \
    "Try running \`bundle exec rake compile\` before running this test."

RuntimeError:
  Libdatadog does not seem to be available: LoadError: cannot load such file -- libdatadog_api.3.2_x86_64-linux. Try running \`bundle exec rake compile\` before running this test.
./spec/support/core_helpers.rb:83:in \`skip_if_libdatadog_not_supported'
./spec/datadog/tracing/transport/native/response_spec.rb:7:in \`block (2 levels) in <top (required)>'
...
Datadog::Tracing::Transport::Native::Response nil payload  from rspec   View in Datadog   (Fix with Cursor)
Libdatadog does not seem to be available: LoadError: cannot load such file -- libdatadog_api.3.2_x86_64-linux. Try running \`bundle exec rake compile\` before running this test.

Failure/Error:
  raise "Libdatadog does not seem to be available: #{Datadog::Core::LIBDATADOG_API_FAILURE}. " \
    "Try running \`bundle exec rake compile\` before running this test."

RuntimeError:
  Libdatadog does not seem to be available: LoadError: cannot load such file -- libdatadog_api.3.2_x86_64-linux. Try running \`bundle exec rake compile\` before running this test.
./spec/support/core_helpers.rb:83:in \`skip_if_libdatadog_not_supported'
./spec/datadog/tracing/transport/native/response_spec.rb:7:in \`block (2 levels) in <top (required)>'
...
View all

ℹ️ Info

No other issues found (see more)

❄️ No new flaky tests detected

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: d5fa393 | Docs | Datadog PR Page | Give us feedback!

@lloeki lloeki marked this pull request as ready for review May 7, 2026 08:21
@lloeki lloeki requested review from a team as code owners May 7, 2026 08:21
@lloeki lloeki requested a review from vpellan May 7, 2026 08:21
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d8e863833d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread ext/libdatadog_api/trace_exporter.c Outdated
Comment thread ext/libdatadog_api/trace_exporter.c
@lloeki lloeki marked this pull request as draft May 7, 2026 08:27
Implement `TraceExporter#_native_send_traces(traces)` which takes an
`Array<Array<Span>>`, converts each span to Rust via the C FFI,
groups them into trace chunks, and sends them to the Datadog Agent.

Key safety patterns:

- `rb_ensure` guarantees Rust-allocated trace chunks are freed if a
  Ruby exception fires during span conversion
- `rb_thread_call_without_gvl2` releases the GVL during the blocking
  network send so other Ruby threads can run
- The "gvl2" variant is used (not plain `gvl`) to prevent automatic
  interrupt raising before Rust response/error objects are inspected
  and freed
- A retry loop handles the case where an interrupt causes `gvl2` to
  return before the send function actually runs

The agent's response body (containing `rate_by_service` for sampling
feedback) is extracted and surfaced via `Response#payload`.

This completes the C extension for the native trace transport.
@lloeki lloeki force-pushed the lloeki/native-trace-exporter branch from d8e8638 to 6a597db Compare May 7, 2026 09:28
Copy link
Copy Markdown
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a pass on the C code! Didn't have time to go through the tests

Comment thread ext/libdatadog_api/trace_exporter.c Outdated
Comment thread ext/libdatadog_api/trace_exporter.c Outdated
Comment on lines +131 to +143
static inline void check_exporter_error(const char *context,
ddog_TraceExporterError *err) {
if (err == NULL) return;

char buf[MAX_RAISE_MESSAGE_SIZE];
if (err->msg != NULL) {
snprintf(buf, sizeof(buf), "%s: %s", context, err->msg);
} else {
snprintf(buf, sizeof(buf), "%s: (unknown error)", context);
}
ddog_trace_exporter_error_free(err);
raise_error(rb_eRuntimeError, "%s", buf);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I'm curious why trace exporter doesn't use ddog_Error? We even already have nice helpers for dealing with these -- e.g. get_error_details_and_drop and read_ddogerr_string_and_drop

Comment thread ext/libdatadog_api/trace_exporter.c Outdated
Comment thread ext/libdatadog_api/trace_exporter.c
Comment thread ext/libdatadog_api/trace_exporter.c Outdated
Comment on lines +639 to +640
const uint8_t *body_ptr =
ddog_trace_exporter_response_get_body(response, &body_len);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it could use a ddog_ByteSlice to avoid the awkward splitting between pointer and length.

Comment thread ext/libdatadog_api/trace_exporter.c Outdated
Comment on lines +648 to +655
if (pending_exception) {
if (send_err != NULL) ddog_trace_exporter_error_free(send_err);
rb_jump_tag(pending_exception);
}

if (send_err != NULL) {
ddog_TraceExporterErrorCode code = send_err->code;
ddog_trace_exporter_error_free(send_err);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we only want the send_err for the code -- maybe doing that conversion inside send_chunks_without_gvl would simplify the logic here? (Nothing to free anymore...)

Comment thread ext/libdatadog_api/trace_exporter.c Outdated
Comment on lines +657 to +662
VALUE err_resp = create_error_response(code, ctx->trace_count);
return rb_ary_new_from_args(1, err_resp);
}

VALUE ok_resp = create_ok_response(ctx->trace_count, payload);
return rb_ary_new_from_args(1, ok_resp);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, is this correct? Shouldn't the result of this method be an array of responses (As documented in a comment below? Array[Response])

Comment on lines +590 to +592
if (push_err != NULL) {
ddog_trace_exporter_error_free(push_err);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this somehow signal back an issue?

Comment on lines +524 to +530
static void *send_chunks_without_gvl(void *data) {
send_chunks_args_t *a = (send_chunks_args_t *)data;
a->error = ddog_trace_exporter_send_trace_chunks(
a->exporter, a->chunks, a->response_out);
a->send_ran = true;
return NULL;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Suggest using args? The single-letter-variable name that AI likes to use to me at least is super-weird to follow

lloeki added 12 commits May 7, 2026 15:14
Classes created with `rb_define_class_under` are attached to their
parent module's constant table, making them reachable by GC.
Marking them as GC roots with `rb_global_variable` is unnecessary.
Delegate to the existing helper instead of duplicating the
`RSTRING_PTR`/`RSTRING_LEN` construction and type check inline.
Use a typed function pointer and a `static inline` function instead of
a preprocessor macro. Same codegen, easier to debug and type-check.
Adopt the same pattern used by the profiling extension: a
`process_pending_interruptions` callback paired with a
`check_if_pending_exception` wrapper that returns the pending state
as an integer instead of requiring manual `rb_protect` at the call
site.
Replace the `ddog_TraceExporterResponse **response_out` double pointer
with a `ddog_TraceExporterResponse *response` field and pass its
address to the FFI call. This removes one level of indirection and
keeps all send-related state within the args struct.
Use a descriptive variable name for the cast data pointer.
Move error handling into `send_chunks_without_gvl`: extract the error
code enum and free the `ddog_TraceExporterError` before returning.
After reacquiring the GVL only the plain `error_code` and `failed`
flag remain, eliminating the need to free Rust-allocated error objects
on every code path.
Accept keyword arguments via `rb_scan_args(argc, argv, "0:", ...)`
instead of 9 positional VALUE parameters. This makes call sites
self-documenting and prevents silent argument ordering mistakes.

The Ruby call becomes:
  TraceExporter._native_new(url:, tracer_version:, language:, ...)
Define Response in Ruby with keyword-argument `initialize` and
`attr_reader` accessors. The C extension loads the class at init time
via `rb_require` and calls `Response.new(ok:, ...)` through
`rb_funcallv_kw` instead of manually setting instance variables.

This gives Ruby's JIT better visibility into the object shape and
reduces the amount of C code.
Count entries skipped due to unexpected types during hash iteration
and log a warning after the loop completes. Previously these entries
were silently dropped, making it difficult to diagnose missing tags.
Replace two `rb_funcall` calls (`to_i` + `nsec`) with a single
`rb_time_timespec` call that extracts the underlying `struct timespec`
directly from the Ruby Time object. Zero method dispatch overhead.
Replace two `rb_funcall` calls (`&` and `>>`) with direct extraction
via `rb_big_pack`, which copies the Bignum magnitude into a word
buffer without method dispatch or temporary object allocation.

A Fixnum fast path handles the common case of 64-bit trace IDs with
a simple `FIX2LONG` cast.
@dd-octo-sts dd-octo-sts Bot added the tracing label May 7, 2026
lloeki added 3 commits May 13, 2026 13:19
Construct a `ddog_TracerSpanFields` struct with designated initializers
and pass it by reference to `ddog_tracer_span_new`. This matches the
updated libdatadog FFI signature and is self-documenting at the call
site.
Hoist `span_count` before `ddog_tracer_trace_chunks_begin_chunk` and
pass it as the capacity argument to pre-allocate the inner span
vector.
The error path is unreachable in practice (the handle is always
valid), but the return value must still be freed to avoid leaking
the Rust-allocated error object.
@lloeki lloeki marked this pull request as ready for review May 26, 2026 09:27
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d5fa393cc9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

/* 2. Convert scalars */
ddog_CharSlice name_s = char_slice_from_ruby_string(rb_name);
ddog_CharSlice service_s = nullable_char_slice(rb_service);
ddog_CharSlice resource_s = char_slice_from_ruby_string(rb_resource);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve nil resources when converting spans

If a span is created with resource: nil (the public Span initializer explicitly allows nil as a placeholder, and the existing Ruby serializer emits it), this conversion raises a TypeError before the batch can be sent because char_slice_from_ruby_string requires a String. The native exporter should treat @resource as nullable or otherwise mirror the current serialization path so valid spans do not fail export.

Useful? React with 👍 / 👎.

# Implements the same predicate interface as the HTTP transport's
# response so callers can treat both uniformly.
class Response
attr_reader :trace_count, :payload
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Expose service_rates on native responses

When these responses are wired into the tracing writer, the existing sampler callback in lib/datadog/tracing/component.rb calls response.service_rates for every non-internal response. Native responses only expose trace_count and payload, so a successful native send with the agent's rate_by_service body will raise NoMethodError in that callback instead of updating sampling rates; this response needs to match the trace response interface and provide parsed service rates.

Useful? React with 👍 / 👎.

end

def ok?
@ok
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to cast to boolean here, just because based on the function name one might expect to receive a boolean?

end

def ok?
@ok
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for other methods in this class

Comment on lines +22 to +23
resp.instance_variable_set(:@ok, ok)
resp.instance_variable_set(:@internal_error, internal_error)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the formatting is off here

Comment on lines +263 to +270
VALUE rb_name = rb_ivar_get(span, at_name_id);
VALUE rb_service = rb_ivar_get(span, at_service_id);
VALUE rb_resource = rb_ivar_get(span, at_resource_id);
VALUE rb_type = rb_ivar_get(span, at_type_id);
VALUE rb_span_id = rb_ivar_get(span, at_id_id);
VALUE rb_parent_id = rb_ivar_get(span, at_parent_id_id);
VALUE rb_trace_id = rb_ivar_get(span, at_trace_id_id);
VALUE rb_status = rb_ivar_get(span, at_status_id);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a performance standpoint rb_ivar_get() is slower than reading the ivar in Ruby and passing it to C, because Ruby gets an inline cache (caching the Shape) for reading/writing ivars, but C has no cache at all, so it needs to walk the Shape tree which is basically a singly-linked list from current Shape to root Shape so O(#ivars).
So another way could be to pass N arguments to this function and read these ivars from Ruby.

Might not matter too much here, I haven't measured.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ivo made a similar remark about it, I have attempted to do the change and pass something more structured, but it's a big-ish one so I've isolated it to its own task downstream of this PR.

Comment on lines +751 to +773
/* ----------------------------------------------------------------
* Cache Ruby intern IDs
* ---------------------------------------------------------------- */

/* Span ivars */
at_name_id = rb_intern("@name");
at_service_id = rb_intern("@service");
at_resource_id = rb_intern("@resource");
at_type_id = rb_intern("@type");
at_id_id = rb_intern("@id");
at_parent_id_id = rb_intern("@parent_id");
at_trace_id_id = rb_intern("@trace_id");
at_start_time_id = rb_intern("@start_time");
at_duration_id = rb_intern("@duration");
at_status_id = rb_intern("@status");
at_meta_id = rb_intern("@meta");
at_metrics_id = rb_intern("@metrics");

/* Methods */
id_duration_method = rb_intern("duration");

/* Response.new */
id_new = rb_intern("new");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my thought as well, but I failed to find a source for it; thanks for confirming it with a reference.

gh-worker-dd-mergequeue-cf854d Bot pushed a commit to DataDog/libdatadog that referenced this pull request May 26, 2026
# What does this PR do?

Add span-building and trace-sending FFI functions to `libdd-data-pipeline-ffi`, enabling language tracers to construct spans field-by-field through the C API instead of passing pre-serialized msgpack.

New types:
- `TracerSpan`: opaque handle wrapping `Span<BytesData>`
- `TracerTraceChunks`: opaque handle wrapping `Vec<Vec<SpanBytes>>`

New functions:
- `ddog_tracer_span_new`, `_free`, `_set_meta`, `_set_metric`
- `ddog_tracer_trace_chunks_new`, `_free`, `_begin_chunk`, `_push_span`
- `ddog_trace_exporter_send_trace_chunks`

# Motivation

The Ruby tracer needs a native trace export path that bypasses Ruby-side msgpack serialization. This provides the FFI surface for it; the corresponding C extension in `dd-trace-rb` calls these functions.

Continuation of the prototype in #1661, reworked to stay close to `main`: the generic `TraceExporter<H>` and `SharedRuntime` are untouched; this is purely additive.

# Additional Notes

Spans are consumed on push, chunks are consumed on send — single-ownership enforced at the API level.

`libdd-trace-utils` is promoted from dev-dependency to regular dependency since the FFI crate now needs `SpanBytes` at build time.

Sibling PR for Ruby is at: DataDog/dd-trace-rb#5690

# How to test the change?

`cargo test -p libdd-data-pipeline-ffi --lib` — 39 tests, all passing. The existing `trace_exporter` tests are unaffected.


Co-authored-by: edmund.kump <edmund.kump@datadoghq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants