Skip to content

fix(routes_mgr): log WARN on DDS QoS mismatch for existing routes#689

Open
aki1770-del wants to merge 1 commit intoeclipse-zenoh:mainfrom
aki1770-del:fix/qos-mismatch-warn-435
Open

fix(routes_mgr): log WARN on DDS QoS mismatch for existing routes#689
aki1770-del wants to merge 1 commit intoeclipse-zenoh:mainfrom
aki1770-del:fix/qos-mismatch-warn-435

Conversation

@aki1770-del
Copy link
Copy Markdown

Summary

When a ROS2 node is (re-)discovered for a topic that already has an active bridge route, get_or_create_route_{publisher,subscriber} returned the existing route without checking whether the new entity's QoS is compatible with the QoS the route's DDS Reader/Writer was originally created with.

An incompatible Reliability or Durability QoS causes DDS to silently refuse endpoint matching. The bridge holds stale route handles with no error propagated to the Zenoh layer. The only visible symptom is a Timeout(5s) warning on the querying side — insufficient to diagnose the root cause.

Closes #435

Fix

  • qos_helpers: add qos_is_compatible(existing, incoming) checking Reliability × Durability per DDS spec §7.1.3
  • qos_helpers: add qos_summary() for human-readable WARN messages
  • route_publisher: expose reader_qos() (the existing _reader_qos field was already stored but unused)
  • route_subscriber: store _writer_qos at route creation; expose writer_qos()
  • routes_mgr: in get_or_create_route_{publisher,subscriber} Occupied branch, call qos_is_compatible() and emit tracing::warn! on mismatch with both QoS summaries

This converts a silent DDS endpoint-matching failure into an observable log entry. Route teardown and recreation on QoS change are left as follow-up work once the detection pattern is confirmed by field testing.

Existing behavior audit (OPS-RULE-016)

  • routes_mgr.rs:591 — Occupied branch returns entry.into_mut() with no QoS comparison
  • route_publisher.rs:100_reader_qos: Qos field already stored but never exposed or compared
  • route_subscriber.rs — no QoS field stored at all; added _writer_qos in this PR
  • qos_helpers.rsis_reliable() and is_transient_local() existed; qos_is_compatible() is new

Scope classification (OPS-RULE-017)

(a) Bug fix — silent failure made observable; no change to route lifecycle behavior.

Prior art consulted (OPS-RULE-018)

  1. Issue #435 — root cause description with multi-host reproduce steps (~90 topics, node cycling)
  2. RouteStatus::_QoSConflict variant (routes_mgr.rs:62) — already defined but unused; confirms QoS conflict detection was an intended future feature
  3. adapt_writer_qos_for_reader / adapt_reader_qos_for_writer in qos_helpers.rs — existing QoS adaptation patterns that this PR extends with a compatibility check

Test evidence

This plugin requires a live DDS + Zenoh environment to test — unit test coverage for the new qos_is_compatible function can be added in a follow-up if the project has a mock DDS harness. The change has been verified to compile cleanly (cargo check).

AI-assisted — authored with Claude, reviewed by Komada.

…lipse-zenoh#435)

When a ROS2 node is (re-)discovered for a topic route that already
exists, the bridge reused the existing route without checking whether
the new entity's QoS is compatible with the QoS the route's DDS
Reader/Writer was originally created with. An incompatible Reliability
or Durability QoS causes DDS to silently refuse endpoint matching —
the bridge holds stale route handles with no error propagated to the
Zenoh layer. The only visible symptom was a Timeout(5s) warning on
the querying side, which was insufficient to diagnose the root cause.

Changes:
- `qos_helpers`: add `qos_is_compatible(existing, incoming)` checking
  Reliability × Durability compatibility per DDS spec §7.1.3
- `qos_helpers`: add `qos_summary()` for human-readable WARN messages
- `route_publisher`: expose `reader_qos()` (existing `_reader_qos` field)
- `route_subscriber`: store `_writer_qos` at creation; expose `writer_qos()`
- `routes_mgr`: in `get_or_create_route_{publisher,subscriber}` Occupied
  branch, call `qos_is_compatible()` and emit tracing::warn! on mismatch,
  including both QoS summaries for easy diagnosis

This converts a silent DDS failure into an observable log entry. Route
teardown and recreation on QoS change are left as follow-up work.

Fixes eclipse-zenoh#435

Co-Authored-By: Claude and aki1770-del <aki1770@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Freezes and does not recover - no error

1 participant