Skip to content

fix(rc): accept RC list-shape tags and honor tracing_sampling_rate#227

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 28 commits into
mainfrom
worktree-brian.marks+adaptive-sampling-fixes
Jun 1, 2026
Merged

fix(rc): accept RC list-shape tags and honor tracing_sampling_rate#227
gh-worker-dd-mergequeue-cf854d[bot] merged 28 commits into
mainfrom
worktree-brian.marks+adaptive-sampling-fixes

Conversation

@bm1549
Copy link
Copy Markdown
Contributor

@bm1549 bm1549 commented May 15, 2026

What does this PR do?

Fixes adaptive-sampling Remote Config in datadog-opentelemetry, rebuilt on top of PR #154. Also adds DD_TRACE_SAMPLE_RATE support.

Before this PR, four things were broken:

  1. tracing_sampling_rate from RC was ignored. The handler only acted on tracing_sampling_rules; a rate-only payload installed nothing.
  2. RC list-shape tags were rejected. RC encodes tags as [{"key": "env", "value_glob": "prod"}], but libdd_sampling::SamplingRuleConfig::tags only accepted the map shape, so the parse errored and the whole update was dropped.
  3. Env DD_TRACE_SAMPLING_RULES were wiped on every RC update. update_sampling_rules_from_remote does a full override, so any RC change replaced env rules, even when RC only sent a global rate.
  4. DD_TRACE_SAMPLE_RATE had no effect. No binding existed.

Motivation

End-to-end adaptive sampling didn't work.

What changed

Composition. ApmTracingHandler::process_config now follows the multi-source precedence model:

env rules env DD_TRACE_SAMPLE_RATE RC tracing_sampling_rules RC tracing_sampling_rate Effective rule chain
present any absent / null absent / null env_rules
present unset absent / null present env_rules + catch_all(rc_rate)
present set absent / null absent / null env_rules + catch_all(env_rate)
present set absent / null present env_rules + catch_all(rc_rate)
any any non-empty array absent / null rc_rules + catch_all(env_rate) if env_rate set
any any non-empty array present rc_rules + catch_all(rc_rate)

The synthetic catch-all uses libdatadog's default provenance, mapping to DM -3 (LOCAL_USER). See the "legacy behavior" comment in test_trace_sampling_rules_override_rate.

DD_TRACE_SAMPLE_RATE. New Config::trace_sample_rate(): Option<f64>. When set, the sampler installs an implicit catch-all so DD_TRACE_RATE_LIMIT applies. Unset means no catch-all (libdatadog's no-rule path samples at 100%).

Tag normalization. RC encodes tags as the list shape [{"key", "value_glob"}]. This is parsed natively by libdd-sampling ≥ 2.1.0 (DataDog/libdatadog#2033), so this PR bumps libdd-sampling 1.0.0 → 2.1.0 (pulling libdd-common → 4.2.0) and no in-tracer normalization is needed. An earlier revision carried a normalize_rc_tags shim for this; it has been removed now that the upstream release is available. Regression coverage: test_handler_rc_rules_with_list_tags_applied (list-shape tags apply, tags preserved as a map) and test_handler_malformed_tags_rejects_update (malformed list entries still rejected wholesale, not silently broadened).

Fail-closed behavior. When libdatadog rejects an update (malformed tags, out-of-range rate), process_config returns Err so the RC dispatcher reports apply_state=3 and the prior policy survives. Out-of-range RC tracing_sampling_rate (outside [0.0, 1.0]) and non-numeric values are rejected up-front.

Env/code rate validation. DD_TRACE_SAMPLE_RATE (and the programmatic set_trace_sample_rate) get the same range check: only finite values in [0.0, 1.0] are honored; out-of-range values are logged and treated as unset rather than installed as a catch-all rule that libdatadog would clamp (negative ⇒ drop all, > 1.0 ⇒ keep all).

Target check. A config's service_target is honored before applying: a payload whose specific (non-*) service/env doesn't match this tracer — primary service or an advertised extra service, compared case-insensitively — is ignored, so a mistargeted RC delivery can never change this service's sampling. Mirrors dd-trace-py/go.

Additional Notes

@bm1549 bm1549 marked this pull request as ready for review May 15, 2026 18:17
@bm1549 bm1549 requested a review from a team as a code owner May 15, 2026 18:17
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 140f46d3ec

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread datadog-opentelemetry/src/core/configuration/remote_config.rs
@bm1549 bm1549 requested a review from genesor May 15, 2026 18:37
@bm1549 bm1549 marked this pull request as draft May 18, 2026 15:11
bm1549 and others added 7 commits May 21, 2026 10:15
Tracks presence vs. explicit null via `missing_field_and_null_value`,
mirroring the pattern already used for `tracing_sampling_rules`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When the Remote Config payload contains tracing_sampling_rate (alone or
alongside tracing_sampling_rules), append a wildcard catch-all rule
with provenance=dynamic so libdd-sampling tags traces it samples as
REMOTE_DYNAMIC_TRACE_SAMPLING_RULE (-12) instead of LOCAL_USER (-3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Remote Config encodes sampling rule tags as a list of
{key, value_glob} objects, but libdd-sampling's SamplingRuleConfig
only deserializes map-shape tags. Walk the RC payload and normalize
each rule's tags to map-shape before serializing for libdatadog.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Review feedback: the previous wrap-as-Value / unwrap-via-unreachable!()
dance violated the CONTRIBUTING guideline against panics in non-test
code. Have normalize_rc_tags accept &mut [serde_json::Value] directly
and call it on the rules Vec in place.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two correctness fixes surfaced by adversarial review:

1. normalize_rc_tags previously dropped malformed list entries silently.
   A rule with bad tags could lose constraints and become broader than
   intended (worst case: a 0% drop rule becomes a service-wide or
   wildcard drop). Now: if any list entry is malformed, leave the rule's
   tags in their original (rejected) list shape so libdatadog's parse
   rejects the entire update.

2. tracing_sampling_rate of a non-null, non-number type (e.g. string)
   previously fell through to None and, with any_field_present=true,
   silently cleared the active remote override. Now: reject with an
   anyhow::Error so the RC dispatcher reports apply_state=3 and the
   prior policy survives.

Regression tests cover both: prior override is preserved when a new
update is malformed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@bm1549 bm1549 force-pushed the worktree-brian.marks+adaptive-sampling-fixes branch from eb525e4 to e159a46 Compare May 21, 2026 14:32
@datadog-official
Copy link
Copy Markdown

datadog-official Bot commented May 21, 2026

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 699192f | Docs | Datadog PR Page | Give us feedback!

Copy link
Copy Markdown
Collaborator

@iunanua iunanua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I guess we can merge it before #222 because it will take a little while before we fully resolve it

BTW I have tested it with a real agent but I'm not able to see RC payloads when I enable adaptive sampling in the backend 🤷

bm1549 and others added 9 commits May 22, 2026 09:30
Adds Config::local_trace_sampling_rules() which returns the env/code/
default value of trace_sampling_rules, ignoring any Remote Config
override. The RC handler needs this to compose env rules with the
synthetic catch-all rule built from RC's tracing_sampling_rate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Generated entry only — the Config field, accessor, and sampler wiring
land in the next commits. DD_TRACE_SAMPLE_RATE is the env-var
counterpart of the RC tracing_sampling_rate: a global rate applied
when no explicit rule matches a span.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds Config::trace_sample_rate() returning the env-configured global
trace sample rate. Defaults to 1.0 when unset. The next commit wires
this into the sampler as a catch-all rule so it actually takes effect
on every span that doesn't match an explicit rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When DD_TRACE_SAMPLE_RATE is set (and is not the default 1.0), the
sampler now sees an implicit catch-all rule appended after the
locally-configured DD_TRACE_SAMPLING_RULES. The catch-all carries
"local" provenance so spans it samples get DM "-3"
(LOCAL_USER_TRACE_SAMPLING_RULE).

The same composition is used when Remote Config clears its override,
restoring the env-rate behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The global RC tracing_sampling_rate is a local-user fallback in the
RC schema (DM -3), not a remote-dynamic rule (DM -12). The system-
tests parametric scenario asserts this with the "legacy behavior"
comment in test_trace_sampling_rules_override_rate.

Omit the provenance field from the synthetic catch-all JSON;
libdd-sampling's serde default fills it as "default", which maps
to LOCAL_USER_TRACE_SAMPLING_RULE (DM -3) per
libdd-sampling/src/datadog_sampler.rs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rewrites the rule/rate composition in ApmTracingHandler::process_config
to match the documented precedence model:

- RC explicit rules replace env rules (existing behavior, unchanged).
- RC rate-only preserves env rules: [env_rules..., catch_all(rc_rate)].
- The catch-all rate is rc_rate when present, otherwise the env-side
  DD_TRACE_SAMPLE_RATE if it's non-default.

Fixes test_trace_sampling_rate_with_sampling_rules,
test_trace_sampling_rules_override_rate, and
test_trace_sampling_rules_with_tags in the parametric scenario.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Align env-rate catch-all provenance to "default" (was "local"). Both
  values map to DM -3 in libdd-sampling today, but "default" is the
  documented libdatadog default (default_provenance()) and matches the
  provenance the RC-rate catch-all path produces via serde omission.
- Replace silent if-let fallthrough on serialized env_rules with a
  hard error. serde_json::to_value of a slice always produces
  Value::Array, but defending against a future serializer change with
  a silent no-op is fragile — fail loudly instead.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…unset DD_TRACE_SAMPLE_RATE

Addresses three Codex adversarial-review findings:

- [HIGH] ApmTracingHandler::process_config previously swallowed errors
  from update_sampling_rules_from_remote, returning Ok(()) even when
  libdatadog rejected the payload. The RC dispatcher then recorded
  apply_state=2 (success) and cached the bad target. Propagate the
  error so apply_state=3 is reported and the target is not cached.

- [MEDIUM] Distinguish unset DD_TRACE_SAMPLE_RATE from explicit 1.0.
  Field type changes from ConfigItem<f64> (default 1.0) to
  ConfigItem<Option<f64>> (default None). The 1.0-EPSILON skip in
  effective_initial_rules is replaced with a plain is_some() check
  so explicit DD_TRACE_SAMPLE_RATE=1.0 installs a catch-all rule
  (and DD_TRACE_RATE_LIMIT then applies as expected).

- [MEDIUM] Reject tracing_sampling_rate values outside [0.0, 1.0]
  rather than appending them as catch-all rules with libdatadog
  silently clamping. Out-of-range now produces apply_state=3.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…istry v2

The Feature Parity Dashboard registers DD_TRACE_SAMPLE_RATE under
multiple version keys per language. Version B (type: decimal,
default: null) matches dd-trace-rs's implementation: a decimal sample
rate with no default value (Option<f64>::None). The previous entry
used version A (type: null, default: "undefined"), which is the
slot used by older Node.js / Go implementations and caused the
validator to flag dd-trace-rs as missing a matching v2 record.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The tracer was only advertising APM_TRACING_SAMPLE_RULES (bit 29) in
its RC capabilities bitfield. Datadog's APM Sampling UI checks for
APM_TRACING_SAMPLE_RATE (bit 12) independently when deciding whether a
service is RC-capable for the rate-only configuration form. Without
bit 12 the UI shows "No remotely configurable tracer detected" for
this service even though the tracer's logic supports the rate-only
case end-to-end after PR #227's process_config rewrite.

Add bit 12 to ClientCapabilities::new(). The unit test now asserts
both bits are advertised.

This also unblocks system-tests
test_capability_tracing_sample_rules and
test_capability_tracing_sampling_rate
which had been left as missing_feature in manifests/rust.yml.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
bm1549 added a commit to DataDog/libdatadog that referenced this pull request May 29, 2026
# What does this PR do?

Makes `libdd_sampling::SamplingRuleConfig::tags` accept two wire shapes:

- Map shape: `{"env": "prod"}` (current).
- Remote Config list shape: `[{"key": "env", "value_glob": "prod"}]`
(new).

Internally both normalize to `HashMap<String, String>`. List entries
missing `key` or `value_glob` produce a deserialization error. We don't
silently drop bad entries because doing so could broaden a
tag-constrained sampling rule.

# Motivation

Remote Config delivers `tracing_sampling_rules` entries with tags in the
list shape. The previous `#[serde(default)]` on `tags` only accepted the
map shape, so every tracer adopting `libdd-sampling` has to normalize at
the tracer ↔ libdatadog boundary. dd-trace-rs ships a
`normalize_rc_tags` helper for this. With this change, that workaround
is no longer needed in any tracer.

# Additional Notes

- No public API changes. The `tags` field type stays `HashMap<String,
String>`; only the deserializer is broader.
- The helper uses `serde::Deserializer::deserialize_any`, which is safe
for self-describing formats (JSON). Non-self-describing formats
(bincode/postcard) would fail at deserialization time with a clear serde
error.
- A follow-up PR in DataDog/dd-trace-rs (tracked alongside #227) removes
the `normalize_rc_tags` workaround once this lands in a libdatadog
release.

# How to test the change?

Four unit tests in `libdd-sampling/src/sampling_rule_config.rs`:

- `test_sampling_rule_config_tags_accepts_map_shape` (regression).
- `test_sampling_rule_config_tags_accepts_rc_list_shape`.
- `test_sampling_rule_config_tags_list_with_malformed_entry_rejects`.
- `test_sampling_rule_config_tags_absent_defaults_to_empty`.

Run: `cargo test -p libdd-sampling sampling_rule_config_tags`.

Integration verified locally against DataDog/dd-trace-rs#227: pointing
dd-trace-rs at this branch and removing its `normalize_rc_tags`
workaround keeps all 352 dd-trace-rs tests green, including the
handler-level tag tests that now traverse the upstream deserializer
end-to-end.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@bm1549 bm1549 marked this pull request as ready for review May 29, 2026 18:23
bm1549 and others added 8 commits May 29, 2026 14:23
libdd-sampling 2.1.0 deserializes Remote Config sampling-rule tags in
their wire shape (`[{key, value_glob}]`) natively, so the in-tracer
list->map normalization shim (`normalize_rc_tags`) added earlier in this
PR is no longer needed. Removes the shim, its single call site, and its
3 direct unit tests; bumps the workspace pin (pulls libdd-common
4.1.0 -> 4.2.0) and regenerates both lockfiles (root + instrumentation,
which depends on datadog-opentelemetry by path).

Behavior is unchanged from the shim version:
- test_handler_rc_rules_with_list_tags_applied: list-shape tags still
  apply with tags preserved as a map (now via libdd-sampling).
- test_handler_malformed_tags_rejects_update: the fail-closed invariant
  still holds — libdd-sampling 2.1.0 rejects a rule with a malformed
  list-shape tag entry rather than silently broadening it.

The major bump's breaking surface (owned `v04_span` fields) is behind a
feature the tracer does not enable (`default-features = false`), so it
does not reach this crate. MSRV unchanged (2.1.0 + libdd-common 4.2.0
both require Rust 1.87.0, matching the workspace).

cargo test --workspace --lib: 314 passed. clippy + nightly fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A finite but out-of-range DD_TRACE_SAMPLE_RATE (e.g. -0.5 or 1.5) was
stored verbatim and installed as a catch-all sampling rule. libdd-sampling
clamps the decision rate internally, so a negative value silently dropped
all unmatched traffic and a value > 1.0 kept all of it, while still
enabling the rate limiter — worse than ignoring the bad config.

Adds `validate_trace_sample_rate`: only finite values in [0.0, 1.0] are
kept; anything else is logged and treated as unset (no catch-all). Applied
to both the env-var ingestion path and the programmatic `set_trace_sample_rate`
setter, mirroring the range check already enforced on RC's
`tracing_sampling_rate`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ApmTracingConfig dropped the payload's `service_target`, so process_config
applied any delivered config unconditionally. A stale or mistargeted RC
payload could install another service's or env's sampling policy on this
tracer.

Deserializes `service_target` and, before mutating any sampler state,
skips the config when a specific (non-`*`) service or env does not match
this tracer's `config.service()` / `config.env()`. A wildcard or absent
target still applies. The backend RC predicate already scopes delivery;
this is defense-in-depth, matching dd-trace-py and dd-trace-go.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`Baggage::get` takes `impl Into<Key>`, so `&Key::new(key)` triggers
clippy::needless_borrows_for_generic_args on rust >= 1.95. Pass the owned
key directly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The first service_target guard compared only the primary service with
exact case. Because dd-trace-rs advertises extra_services to the backend,
a config legitimately scoped to an advertised extra service — or one whose
service/env differs only by case (the UI warns case can differ) — would be
skipped and silently acknowledged, leaving sampling unchanged.

Make the guard conservative so it only skips definitely-foreign configs:
- service matches the primary service OR any advertised extra service
  (new read-only `ExtraServicesTracker::contains_service`, no queue drain)
- service and env are compared case-insensitively

Adds tests for the case-only difference and the extra-service cases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
libdd-sampling 2.1.0 / libdd-common 4.2.0 dropped the rustls TLS stack
from the dependency tree, so those entries (hyper-rustls, ring, rustls*,
schannel, security-framework*, tokio-rustls, untrusted, zeroize, etc.) are
no longer present. Regenerated with dd-rust-license-tool 1.0.6 (the version
the Lint workflow uses) so the "Valid LICENSE-3rdparty.csv" check passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
for key in expected_keys {
assert!(
baggage.get(&Key::new(key)).is_some(),
baggage.get(Key::new(key)).is_some(),
Copy link
Copy Markdown
Contributor Author

@bm1549 bm1549 Jun 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unrelated clippy fix on Rust 1.95.0 for clippy::needless_borrows_for_generic_args

This repo writes CHANGELOG sections at release time (in the version-bump
commit), not per feature PR; there is no running "Unreleased" section.
Remove the entries added earlier in this PR so they're captured in the
repo's usual style when the next release is cut.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread datadog-opentelemetry/src/core/configuration/configuration.rs Outdated
Comment thread Cargo.toml Outdated
bm1549 and others added 2 commits June 1, 2026 11:21
Co-authored-by: Igor Unanua <igor.unanua@datadoghq.com>
Addresses review feedback (@iunanua): the comment claimed the catch-all
is skipped for "the default of 1.0", but DD_TRACE_SAMPLE_RATE's default
is now None (unset), and an explicit 1.0 *does* install the catch-all so
DD_TRACE_RATE_LIMIT applies (see test_explicit_dd_trace_sample_rate_one_
installs_catch_all). Rewrite the comment to match: append only when the
rate is explicitly set; unset => no catch-all (libdatadog's 100% fallback).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit bbecfd5 into main Jun 1, 2026
35 of 36 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot deleted the worktree-brian.marks+adaptive-sampling-fixes branch June 1, 2026 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants