fix(rc): accept RC list-shape tags and honor tracing_sampling_rate#227
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 140f46d3ec
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Tracks presence vs. explicit null via `missing_field_and_null_value`, mirroring the pattern already used for `tracing_sampling_rules`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When the Remote Config payload contains tracing_sampling_rate (alone or alongside tracing_sampling_rules), append a wildcard catch-all rule with provenance=dynamic so libdd-sampling tags traces it samples as REMOTE_DYNAMIC_TRACE_SAMPLING_RULE (-12) instead of LOCAL_USER (-3). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Remote Config encodes sampling rule tags as a list of
{key, value_glob} objects, but libdd-sampling's SamplingRuleConfig
only deserializes map-shape tags. Walk the RC payload and normalize
each rule's tags to map-shape before serializing for libdatadog.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Review feedback: the previous wrap-as-Value / unwrap-via-unreachable!() dance violated the CONTRIBUTING guideline against panics in non-test code. Have normalize_rc_tags accept &mut [serde_json::Value] directly and call it on the rules Vec in place. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two correctness fixes surfaced by adversarial review: 1. normalize_rc_tags previously dropped malformed list entries silently. A rule with bad tags could lose constraints and become broader than intended (worst case: a 0% drop rule becomes a service-wide or wildcard drop). Now: if any list entry is malformed, leave the rule's tags in their original (rejected) list shape so libdatadog's parse rejects the entire update. 2. tracing_sampling_rate of a non-null, non-number type (e.g. string) previously fell through to None and, with any_field_present=true, silently cleared the active remote override. Now: reject with an anyhow::Error so the RC dispatcher reports apply_state=3 and the prior policy survives. Regression tests cover both: prior override is preserved when a new update is malformed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
eb525e4 to
e159a46
Compare
🎉 All green!🧪 All tests passed 🔗 Commit SHA: 699192f | Docs | Datadog PR Page | Give us feedback! |
iunanua
left a comment
There was a problem hiding this comment.
LGTM
I guess we can merge it before #222 because it will take a little while before we fully resolve it
BTW I have tested it with a real agent but I'm not able to see RC payloads when I enable adaptive sampling in the backend 🤷
Adds Config::local_trace_sampling_rules() which returns the env/code/ default value of trace_sampling_rules, ignoring any Remote Config override. The RC handler needs this to compose env rules with the synthetic catch-all rule built from RC's tracing_sampling_rate. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Generated entry only — the Config field, accessor, and sampler wiring land in the next commits. DD_TRACE_SAMPLE_RATE is the env-var counterpart of the RC tracing_sampling_rate: a global rate applied when no explicit rule matches a span. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds Config::trace_sample_rate() returning the env-configured global trace sample rate. Defaults to 1.0 when unset. The next commit wires this into the sampler as a catch-all rule so it actually takes effect on every span that doesn't match an explicit rule. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When DD_TRACE_SAMPLE_RATE is set (and is not the default 1.0), the sampler now sees an implicit catch-all rule appended after the locally-configured DD_TRACE_SAMPLING_RULES. The catch-all carries "local" provenance so spans it samples get DM "-3" (LOCAL_USER_TRACE_SAMPLING_RULE). The same composition is used when Remote Config clears its override, restoring the env-rate behavior. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The global RC tracing_sampling_rate is a local-user fallback in the RC schema (DM -3), not a remote-dynamic rule (DM -12). The system- tests parametric scenario asserts this with the "legacy behavior" comment in test_trace_sampling_rules_override_rate. Omit the provenance field from the synthetic catch-all JSON; libdd-sampling's serde default fills it as "default", which maps to LOCAL_USER_TRACE_SAMPLING_RULE (DM -3) per libdd-sampling/src/datadog_sampler.rs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rewrites the rule/rate composition in ApmTracingHandler::process_config to match the documented precedence model: - RC explicit rules replace env rules (existing behavior, unchanged). - RC rate-only preserves env rules: [env_rules..., catch_all(rc_rate)]. - The catch-all rate is rc_rate when present, otherwise the env-side DD_TRACE_SAMPLE_RATE if it's non-default. Fixes test_trace_sampling_rate_with_sampling_rules, test_trace_sampling_rules_override_rate, and test_trace_sampling_rules_with_tags in the parametric scenario. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Align env-rate catch-all provenance to "default" (was "local"). Both values map to DM -3 in libdd-sampling today, but "default" is the documented libdatadog default (default_provenance()) and matches the provenance the RC-rate catch-all path produces via serde omission. - Replace silent if-let fallthrough on serialized env_rules with a hard error. serde_json::to_value of a slice always produces Value::Array, but defending against a future serializer change with a silent no-op is fragile — fail loudly instead. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…unset DD_TRACE_SAMPLE_RATE Addresses three Codex adversarial-review findings: - [HIGH] ApmTracingHandler::process_config previously swallowed errors from update_sampling_rules_from_remote, returning Ok(()) even when libdatadog rejected the payload. The RC dispatcher then recorded apply_state=2 (success) and cached the bad target. Propagate the error so apply_state=3 is reported and the target is not cached. - [MEDIUM] Distinguish unset DD_TRACE_SAMPLE_RATE from explicit 1.0. Field type changes from ConfigItem<f64> (default 1.0) to ConfigItem<Option<f64>> (default None). The 1.0-EPSILON skip in effective_initial_rules is replaced with a plain is_some() check so explicit DD_TRACE_SAMPLE_RATE=1.0 installs a catch-all rule (and DD_TRACE_RATE_LIMIT then applies as expected). - [MEDIUM] Reject tracing_sampling_rate values outside [0.0, 1.0] rather than appending them as catch-all rules with libdatadog silently clamping. Out-of-range now produces apply_state=3. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…istry v2 The Feature Parity Dashboard registers DD_TRACE_SAMPLE_RATE under multiple version keys per language. Version B (type: decimal, default: null) matches dd-trace-rs's implementation: a decimal sample rate with no default value (Option<f64>::None). The previous entry used version A (type: null, default: "undefined"), which is the slot used by older Node.js / Go implementations and caused the validator to flag dd-trace-rs as missing a matching v2 record. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The tracer was only advertising APM_TRACING_SAMPLE_RULES (bit 29) in its RC capabilities bitfield. Datadog's APM Sampling UI checks for APM_TRACING_SAMPLE_RATE (bit 12) independently when deciding whether a service is RC-capable for the rate-only configuration form. Without bit 12 the UI shows "No remotely configurable tracer detected" for this service even though the tracer's logic supports the rate-only case end-to-end after PR #227's process_config rewrite. Add bit 12 to ClientCapabilities::new(). The unit test now asserts both bits are advertised. This also unblocks system-tests test_capability_tracing_sample_rules and test_capability_tracing_sampling_rate which had been left as missing_feature in manifests/rust.yml. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
# What does this PR do?
Makes `libdd_sampling::SamplingRuleConfig::tags` accept two wire shapes:
- Map shape: `{"env": "prod"}` (current).
- Remote Config list shape: `[{"key": "env", "value_glob": "prod"}]`
(new).
Internally both normalize to `HashMap<String, String>`. List entries
missing `key` or `value_glob` produce a deserialization error. We don't
silently drop bad entries because doing so could broaden a
tag-constrained sampling rule.
# Motivation
Remote Config delivers `tracing_sampling_rules` entries with tags in the
list shape. The previous `#[serde(default)]` on `tags` only accepted the
map shape, so every tracer adopting `libdd-sampling` has to normalize at
the tracer ↔ libdatadog boundary. dd-trace-rs ships a
`normalize_rc_tags` helper for this. With this change, that workaround
is no longer needed in any tracer.
# Additional Notes
- No public API changes. The `tags` field type stays `HashMap<String,
String>`; only the deserializer is broader.
- The helper uses `serde::Deserializer::deserialize_any`, which is safe
for self-describing formats (JSON). Non-self-describing formats
(bincode/postcard) would fail at deserialization time with a clear serde
error.
- A follow-up PR in DataDog/dd-trace-rs (tracked alongside #227) removes
the `normalize_rc_tags` workaround once this lands in a libdatadog
release.
# How to test the change?
Four unit tests in `libdd-sampling/src/sampling_rule_config.rs`:
- `test_sampling_rule_config_tags_accepts_map_shape` (regression).
- `test_sampling_rule_config_tags_accepts_rc_list_shape`.
- `test_sampling_rule_config_tags_list_with_malformed_entry_rejects`.
- `test_sampling_rule_config_tags_absent_defaults_to_empty`.
Run: `cargo test -p libdd-sampling sampling_rule_config_tags`.
Integration verified locally against DataDog/dd-trace-rs#227: pointing
dd-trace-rs at this branch and removing its `normalize_rc_tags`
workaround keeps all 352 dd-trace-rs tests green, including the
handler-level tag tests that now traverse the upstream deserializer
end-to-end.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
libdd-sampling 2.1.0 deserializes Remote Config sampling-rule tags in
their wire shape (`[{key, value_glob}]`) natively, so the in-tracer
list->map normalization shim (`normalize_rc_tags`) added earlier in this
PR is no longer needed. Removes the shim, its single call site, and its
3 direct unit tests; bumps the workspace pin (pulls libdd-common
4.1.0 -> 4.2.0) and regenerates both lockfiles (root + instrumentation,
which depends on datadog-opentelemetry by path).
Behavior is unchanged from the shim version:
- test_handler_rc_rules_with_list_tags_applied: list-shape tags still
apply with tags preserved as a map (now via libdd-sampling).
- test_handler_malformed_tags_rejects_update: the fail-closed invariant
still holds — libdd-sampling 2.1.0 rejects a rule with a malformed
list-shape tag entry rather than silently broadening it.
The major bump's breaking surface (owned `v04_span` fields) is behind a
feature the tracer does not enable (`default-features = false`), so it
does not reach this crate. MSRV unchanged (2.1.0 + libdd-common 4.2.0
both require Rust 1.87.0, matching the workspace).
cargo test --workspace --lib: 314 passed. clippy + nightly fmt clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A finite but out-of-range DD_TRACE_SAMPLE_RATE (e.g. -0.5 or 1.5) was stored verbatim and installed as a catch-all sampling rule. libdd-sampling clamps the decision rate internally, so a negative value silently dropped all unmatched traffic and a value > 1.0 kept all of it, while still enabling the rate limiter — worse than ignoring the bad config. Adds `validate_trace_sample_rate`: only finite values in [0.0, 1.0] are kept; anything else is logged and treated as unset (no catch-all). Applied to both the env-var ingestion path and the programmatic `set_trace_sample_rate` setter, mirroring the range check already enforced on RC's `tracing_sampling_rate`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ApmTracingConfig dropped the payload's `service_target`, so process_config applied any delivered config unconditionally. A stale or mistargeted RC payload could install another service's or env's sampling policy on this tracer. Deserializes `service_target` and, before mutating any sampler state, skips the config when a specific (non-`*`) service or env does not match this tracer's `config.service()` / `config.env()`. A wildcard or absent target still applies. The backend RC predicate already scopes delivery; this is defense-in-depth, matching dd-trace-py and dd-trace-go. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`Baggage::get` takes `impl Into<Key>`, so `&Key::new(key)` triggers clippy::needless_borrows_for_generic_args on rust >= 1.95. Pass the owned key directly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The first service_target guard compared only the primary service with exact case. Because dd-trace-rs advertises extra_services to the backend, a config legitimately scoped to an advertised extra service — or one whose service/env differs only by case (the UI warns case can differ) — would be skipped and silently acknowledged, leaving sampling unchanged. Make the guard conservative so it only skips definitely-foreign configs: - service matches the primary service OR any advertised extra service (new read-only `ExtraServicesTracker::contains_service`, no queue drain) - service and env are compared case-insensitively Adds tests for the case-only difference and the extra-service cases. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
libdd-sampling 2.1.0 / libdd-common 4.2.0 dropped the rustls TLS stack from the dependency tree, so those entries (hyper-rustls, ring, rustls*, schannel, security-framework*, tokio-rustls, untrusted, zeroize, etc.) are no longer present. Regenerated with dd-rust-license-tool 1.0.6 (the version the Lint workflow uses) so the "Valid LICENSE-3rdparty.csv" check passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| for key in expected_keys { | ||
| assert!( | ||
| baggage.get(&Key::new(key)).is_some(), | ||
| baggage.get(Key::new(key)).is_some(), |
There was a problem hiding this comment.
This is an unrelated clippy fix on Rust 1.95.0 for clippy::needless_borrows_for_generic_args
This repo writes CHANGELOG sections at release time (in the version-bump commit), not per feature PR; there is no running "Unreleased" section. Remove the entries added earlier in this PR so they're captured in the repo's usual style when the next release is cut. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Igor Unanua <igor.unanua@datadoghq.com>
Addresses review feedback (@iunanua): the comment claimed the catch-all is skipped for "the default of 1.0", but DD_TRACE_SAMPLE_RATE's default is now None (unset), and an explicit 1.0 *does* install the catch-all so DD_TRACE_RATE_LIMIT applies (see test_explicit_dd_trace_sample_rate_one_ installs_catch_all). Rewrite the comment to match: append only when the rate is explicitly set; unset => no catch-all (libdatadog's 100% fallback). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
bbecfd5
into
main
What does this PR do?
Fixes adaptive-sampling Remote Config in
datadog-opentelemetry, rebuilt on top of PR #154. Also addsDD_TRACE_SAMPLE_RATEsupport.Before this PR, four things were broken:
tracing_sampling_ratefrom RC was ignored. The handler only acted ontracing_sampling_rules; a rate-only payload installed nothing.tagswere rejected. RC encodestagsas[{"key": "env", "value_glob": "prod"}], butlibdd_sampling::SamplingRuleConfig::tagsonly accepted the map shape, so the parse errored and the whole update was dropped.DD_TRACE_SAMPLING_RULESwere wiped on every RC update.update_sampling_rules_from_remotedoes a full override, so any RC change replaced env rules, even when RC only sent a global rate.DD_TRACE_SAMPLE_RATEhad no effect. No binding existed.Motivation
End-to-end adaptive sampling didn't work.
What changed
Composition.
ApmTracingHandler::process_confignow follows the multi-source precedence model:DD_TRACE_SAMPLE_RATEtracing_sampling_rulestracing_sampling_rateenv_rulesenv_rules + catch_all(rc_rate)env_rules + catch_all(env_rate)env_rules + catch_all(rc_rate)rc_rules + catch_all(env_rate)if env_rate setrc_rules + catch_all(rc_rate)The synthetic catch-all uses libdatadog's default provenance, mapping to DM
-3(LOCAL_USER). See the "legacy behavior" comment in test_trace_sampling_rules_override_rate.DD_TRACE_SAMPLE_RATE. NewConfig::trace_sample_rate(): Option<f64>. When set, the sampler installs an implicit catch-all soDD_TRACE_RATE_LIMITapplies. Unset means no catch-all (libdatadog's no-rule path samples at 100%).Tag normalization. RC encodes
tagsas the list shape[{"key", "value_glob"}]. This is parsed natively bylibdd-sampling≥ 2.1.0 (DataDog/libdatadog#2033), so this PR bumpslibdd-sampling1.0.0 → 2.1.0 (pullinglibdd-common→ 4.2.0) and no in-tracer normalization is needed. An earlier revision carried anormalize_rc_tagsshim for this; it has been removed now that the upstream release is available. Regression coverage:test_handler_rc_rules_with_list_tags_applied(list-shape tags apply, tags preserved as a map) andtest_handler_malformed_tags_rejects_update(malformed list entries still rejected wholesale, not silently broadened).Fail-closed behavior. When libdatadog rejects an update (malformed tags, out-of-range rate),
process_configreturnsErrso the RC dispatcher reportsapply_state=3and the prior policy survives. Out-of-range RCtracing_sampling_rate(outside[0.0, 1.0]) and non-numeric values are rejected up-front.Env/code rate validation.
DD_TRACE_SAMPLE_RATE(and the programmaticset_trace_sample_rate) get the same range check: only finite values in[0.0, 1.0]are honored; out-of-range values are logged and treated as unset rather than installed as a catch-all rule that libdatadog would clamp (negative ⇒ drop all, > 1.0 ⇒ keep all).Target check. A config's
service_targetis honored before applying: a payload whose specific (non-*)service/envdoesn't match this tracer — primary service or an advertised extra service, compared case-insensitively — is ignored, so a mistargeted RC delivery can never change this service's sampling. Mirrors dd-trace-py/go.Additional Notes