(2+3+4) concurrency / dpdk#1542
Conversation
d560e77 to
aa509ce
Compare
03fec49 to
8878b3c
Compare
aa509ce to
cdf3507
Compare
9632629 to
2be0637
Compare
1932e47 to
1ce8d26
Compare
c87ff7f to
e292f12
Compare
8d86891 to
0364eb1
Compare
mvachhar
left a comment
There was a problem hiding this comment.
You need to resolve merge conflicts here so while you are doing that, you should also clean up the commits a bit. There are some changes related to the move of QSBR into concurrency in the second commit when they should be in the first. I think AI should be able to fix that pretty easily so it is worth doing. If it takes too long, it isn't worht it and we can merge as is.
Err, sorry, #1539 (absorb) is already merged, the QSBR-move-related hunks that leaked into commit 2 are stuck here. There's no commit 1 left on this branch to fold them into |
Introduce a `parking_lot`-shaped facade over `Mutex` / `RwLock`, ship
the four backends together (std / parking_lot / shuttle / loom), and
rewire the data-path crates through it in a single atomic step.
The four pieces are bundled because a partial sweep leaves the
workspace in a state where the facade and its consumers disagree on
whether `Mutex::lock` returns `LockResult<Guard>` or a naked guard.
The shuttle/loom wrappers strip `LockResult` in exactly the same
poison-as-panic shape as the std wrapper, so splitting them re-
introduces that disagreement under the test/shuttle and test/loom CI
jobs.
## std::sync wrapper (poison-as-panic)
New `concurrency::sync` module exposing `Mutex` / `RwLock` with a
`parking_lot`-shaped naked-guard surface backed by a poison-as-panic
wrapper around `std::sync`. Workspace policy treats a crashed thread
as a crashed process: poison surfacing inside a lock acquire panics
through `concurrency::sync::poisoned()` rather than handing back a
possibly-torn `LockResult`.
* One cold `#[inline(never)] fn poisoned() -> !` keeps the hot path
branch-free; the wrapper is a single `match` per acquire.
* `RwLock::upgradable_read` is implemented as exclusive `write()`
for now -- sound but lossy. Documented at the backend module
level.
* `Arc`, `Weak`, atomics, `Condvar`, `mpsc`, `OnceLock`, etc. are
re-exported from `std::sync` unchanged.
## parking_lot as the production default
Adds a `parking_lot` Cargo feature (enabled by default) and
`concurrency/src/sync/parking_lot_backend.rs` -- a zero-cost
re-export of `parking_lot::{Mutex, RwLock}` plus the std re-exports
for everything `parking_lot` doesn't ship.
* Backend routing in `concurrency/src/sync/mod.rs` picks
`parking_lot_backend` when `feature = "parking_lot"` is on, falls
back to `std_backend` otherwise.
* `_strict_provenance` feature overrides `parking_lot` and routes
through `std_backend`: `parking_lot_core::word_lock` uses
integer-to-pointer casts that `-Zmiri-strict-provenance` rejects,
and the CI miri job (`--features=_strict_provenance`) exercises
the fallback slot, which needs `std::sync` underneath.
* Workspace `Cargo.toml` gets a `tokio` parking_lot override entry
so the `tokio` parking_lot feature is enabled workspace-wide.
## shuttle backend
`concurrency/src/sync/shuttle_backend.rs` mirrors `std_backend.rs`:
a thin poison-as-panic wrap around `shuttle::sync::{Mutex, RwLock}`
returning naked guards, plus re-exports for `Arc` / `Weak` / atomics
/ `Condvar` / `mpsc` that pass through unchanged. `OnceLock` falls
back to `std::sync` because shuttle does not ship one (it is
pure-std machinery and does not need model-checker integration).
Three feature flags share one `dep:shuttle`, arranged as a chain
(`shuttle_dfs` -> `shuttle_pct` -> `shuttle`):
* `shuttle` -- random scheduler (default first choice).
* `shuttle_pct` -- PCT, biases toward rare interleavings.
* `shuttle_dfs` -- DFS, exhaustive small-state exploration.
Same backend, scheduler chosen at runtime (by `concurrency::stress`,
added in the follow-on PR). The chain means a single `feature =
"shuttle"` cfg check is true under every variant, so the routing in
`sync/mod.rs` and the macro gates collapse to single-feature checks
(no `any(shuttle, shuttle_pct, shuttle_dfs)` chains).
## loom backend
`concurrency/src/sync/loom_backend.rs` mirrors the std and shuttle
wrappers: poison-as-panic around `loom::sync::{Mutex, RwLock}`
returning naked guards, with `LockResult` / `TryLockError` /
`OnceLock` etc. re-exported from `std::sync` because loom does not
ship them.
Loom 0.7 has no `Weak<T>` and no `Arc::downgrade`. The backend ships
a local `Arc<T>` wrapper that adds `downgrade` plus a `Weak<T>` shim
that holds a *strong* clone of the inner `loom::sync::Arc` until the
`Weak` itself drops. Documented quirks (`Weak::upgrade` always
succeeds after a `downgrade`, `Arc::weak_count` panics rather than
silently returning 0, last `Arc` drop does not free when a `Weak`
is live) are explicit at the module level -- they bound what loom
can model, and silent fallbacks would let assertions pass for the
wrong reason on every backend.
`sync/mod.rs` routes the loom feature ahead of every other backend.
## Workspace transition
Sweep the data-path crates (`flow-entry`, `flow-filter`, `nat`,
`net`, `pipeline`, `routing`, `stats`) to consume
`dataplane_concurrency::sync::{Arc, Mutex, RwLock, atomic, ...}`
instead of `std::sync::*` / `parking_lot::*` directly.
* Call sites drop the `.unwrap()` / `.expect("poisoned")` noise the
`LockResult`-shaped surface forced, since the facade returns
naked guards. Same shape as the existing `parking_lot::Mutex`
consumers.
* `concurrency_mode(std)` annotations at QSBR-touching test sites
in `nat::stateful::apalloc::test_alloc`,
`routing::fib::test`, and `flow-entry::flow_table::table::tests`
keep the existing std-only feature gating intact.
* `nat::portfw::flow_state::get_packet_port_fw_state`
simplification: the explicit `drop(guard)` dance before `debug!`
calls is removed (per the recorded author intent that it isn't
worth the code complexity); the guard now drops implicitly at
the end of scope.
Mechanical-by-design -- the sweep is mostly `use` rewrites and
`.unwrap()` removal. The semantic change is one step: every
consumer now gets the facade-routed primitives, so flipping the
loom/shuttle features at the top of the workspace is now
meaningful. The follow-on PR adds `#[concurrency::test]`, the
`stress` dispatcher, and the `thread::scope` shim that let a
single test source file run under any backend.
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
0364eb1 to
fb3e8a6
Compare
There was a problem hiding this comment.
Pull request overview
This PR introduces a parking_lot-shaped synchronization facade in dataplane-concurrency (concurrency::sync) with selectable backends (std, parking_lot default, shuttle, loom), then performs a workspace-wide migration of datapath crates to consume the facade (and the new Slot/SlotOption publication types) so the workspace builds consistently across backends without LockResult/poison-handling noise.
Changes:
- Add
concurrency::syncwith backend routing and poison-as-panic wrappers (std/shuttle/loom) plus a zero-costparking_lotbackend. - Add/extend
concurrency::slot::{Slot, SlotOption}and migratearc-swapcall sites to the facade where needed. - Sweep datapath crates (nat/net/pipeline/routing/stats/flow-*) to use
concurrency::sync::{Arc, Mutex, RwLock, atomic, thread}and remove.unwrap()/.expect("poisoned")from lock acquisitions.
Reviewed changes
Copilot reviewed 52 out of 53 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| stats/src/vpc_stats.rs | Drop LockResult unwraps on locks |
| routing/src/fib/test.rs | Reduce miri workload; gate stdout |
| routing/src/fib/fibtable.rs | Switch Arc to facade type |
| routing/src/atable/resolver.rs | Switch atomics/thread/Arc to facade |
| pipeline/src/static_nf.rs | Switch Arc to facade type |
| pipeline/src/sample_nfs.rs | Replace ArcSwapOption with SlotOption |
| pipeline/src/pipeline.rs | Switch Arc/atomics to facade type |
| pipeline/Cargo.toml | Replace arc-swap dep with concurrency |
| net/src/packet/stats.rs | Switch atomics to facade re-export |
| net/src/flows/flow_info.rs | Switch atomics to facade re-export; drop unwraps |
| net/src/flows/display.rs | Assume naked-guard locks; use facade Weak |
| net/src/flows/atomic_instant.rs | Switch Ordering import to facade |
| nat/src/stateful/test.rs | Disable traced_test under miri; drop unwraps |
| nat/src/stateful/nf.rs | Drop lock unwraps; simplify write path |
| nat/src/stateful/icmp_handling.rs | Drop lock unwraps on reads |
| nat/src/stateful/flows.rs | Drop lock unwraps; simplify guards |
| nat/src/stateful/apalloc/test_alloc.rs | Route shuttle test via facade sync/thread |
| nat/src/stateful/apalloc/port_alloc.rs | Drop lock unwraps across bitmap/map ops |
| nat/src/stateful/apalloc/mod.rs | Remove unreachable unwrap on idle timeout |
| nat/src/stateful/apalloc/display.rs | Update display to new read() signature |
| nat/src/stateful/apalloc/alloc.rs | Change read() to return guard; idle_timeout non-Option |
| nat/src/stateful/allocator_writer.rs | Replace ArcSwapOption with SlotOption |
| nat/src/portfw/test.rs | Drop lock unwraps in tests |
| nat/src/portfw/portfwtable/portforwarder.rs | Switch Arc to facade type (incl tests) |
| nat/src/portfw/portfwtable/objects.rs | Switch Arc/atomics to facade types |
| nat/src/portfw/nf.rs | Switch Arc/Weak; drop lock unwraps |
| nat/src/portfw/icmp_handling.rs | Drop lock unwraps on reads |
| nat/src/portfw/flow_state.rs | Switch Arc/Weak; simplify write guards |
| nat/src/icmp_handler/nf.rs | Switch Arc to facade type; drop unwrap |
| nat/src/common/mod.rs | Switch Arc/atomics to facade types |
| nat/Cargo.toml | Add shuttle scheduler feature chain |
| flow-filter/src/tests.rs | Switch Arc to facade; drop unwraps |
| flow-filter/src/lib.rs | Switch Arc to facade; drop unwraps |
| flow-filter/Cargo.toml | Add concurrency dependency |
| flow-entry/src/flow_table/table.rs | Use facade OnceLock/locks; simplify try_read loop |
| flow-entry/src/flow_table/nf_lookup.rs | Disable traced_test under miri |
| flow-entry/src/flow_table/display.rs | Adjust to try_read -> Option facade |
| flow-entry/Cargo.toml | Add shuttle scheduler feature chain |
| concurrency/tests/quiescent_protocol.rs | Update docs/imports; use facade sync/thread |
| concurrency/tests/quiescent_properties.rs | Update docs/imports; use facade sync/atomics |
| concurrency/src/sync/std_backend.rs | Add std poison-as-panic wrapper backend |
| concurrency/src/sync/shuttle_backend.rs | Add shuttle poison-as-panic wrapper backend |
| concurrency/src/sync/parking_lot_backend.rs | Add zero-cost parking_lot backend re-export |
| concurrency/src/sync/mod.rs | Add backend routing logic/priorities |
| concurrency/src/sync/loom_backend.rs | Add loom wrapper + Arc/Weak shim |
| concurrency/src/slot.rs | Add SlotOption; unify slot APIs across backends |
| concurrency/src/quiescent.rs | Migrate QSBR internals to naked-guard locks |
| concurrency/src/lib.rs | Expose sync module; remove old pub use ...::sync |
| concurrency/Cargo.toml | Default to parking_lot; add shuttle feature chain |
| common/src/cliprovider.rs | Add Slot/SlotOption provider impls |
| common/Cargo.toml | Add concurrency dependency |
| Cargo.toml | Workspace deps + tokio parking_lot exception docs |
| Cargo.lock | Lockfile updates for new deps/features |
Ship the test-time infrastructure that lets a single source file run
under every backend that the facade (in the prior PR) routes to.
Three pieces:
* `concurrency::stress(body)` -- runtime dispatcher selecting
`loom::model` / `shuttle::check_random` / `_pct` / `_dfs` /
direct call based on the active feature.
* `#[concurrency::test]` -- attribute proc-macro expanding to
`#[test]` + `stress(|| body)`. One source file, five execution
strategies, no `loom::model(|| { ... })` boilerplate at call
sites.
* `concurrency::thread::scope` -- works on every backend. Std and
shuttle re-export theirs; loom 0.7 has no `scope`, so this PR
ships a local shim built on `loom::spawn` plus an
`Arc<Mutex<Option<T>>>` keepalive that preserves the same
drop-affinity guarantee `std::thread::scope` gives.
## stress + #[concurrency::test]
`concurrency::stress(body)`:
* default -- `body()` direct, no exploration
* `loom` -- `loom::model(body)`
* `shuttle` -- `shuttle::check_random(body, 16)`
* `shuttle_pct` -- `shuttle::check_pct(body, 16, 3)`
* `shuttle_dfs` -- `shuttle::check_dfs(body, Some(16))`
The dispatch carries a defensive `_ => compile_error!` arm: if a
new model-checker feature is added to the exclusion list but no
matching dispatch arm is wired up, the build fails loudly rather
than silently skipping exploration.
`#[concurrency::test]` is a new attribute proc-macro in
`concurrency-macros`. Under model-checker backends it emits a
nested `mod <fn_name> { mod concurrency_model { #[test] fn
<backend>() { ... } } }` so test reports / JUnit show the active
backend as the leaf name (e.g.
`snapshot_observes::concurrency_model::loom`). Default backend
stays flat. `#[should_panic]` / `#[ignore]` thread through;
`async fn` and `fn(args)` are rejected at parse time.
The crate path is resolved at expansion time via
`proc-macro-crate`, so consumers depending on
`dataplane-concurrency` under any alias work without setup;
integration tests inside this crate use `extern crate
dataplane_concurrency as concurrency;` since cargo rejects
self-deps.
`stress` is re-exported at the crate root for the macro expansion
but hidden from rustdoc (`#[doc(hidden)]`) -- users land on the
macro.
## thread::scope shim
`concurrency::thread` becomes a module (was a `pub use` alias) and
re-exports the active backend's `thread::*`. Under loom it also
includes a local `thread::scope` shim built on `loom::spawn` plus
an `Arc<Mutex<Option<T>>>` keepalive pattern. A narrow
`mem::transmute` lifts the spawned closure's `'scope` to
`'static` (loom 0.7 has no `spawn_unchecked`); the keepalive
trait object keeps its honest `'scope` and `ScopeInner<'scope>`
lives in `ManuallyDrop` so dropck does not constrain `'scope`
past `scope()`'s body. Per-site SAFETY comments at both the
transmute and the `ManuallyDrop::drop`.
Drop-affinity is enforced by an explicit `take_payload` on main
during teardown, not by an Arc-count assertion. Loom's
`JoinHandle::join` synchronises only on `notify`, which can fire
before the spawned thread's wrapper has finished dropping its
captures -- so the count-based check fires spuriously in some
schedules. Manual `take` on main gives the same guarantee
unconditionally.
## quiescent.rs / slot.rs cleanup
With every backend now returning naked guards from `.lock()` (the
prior PR), the remaining `LockResult` unwrap dispatch in
`quiescent.rs` disappears; `.lock()` calls reduce to plain
naked-guard form across all backends.
## Tests
* `tests/quiescent_model.rs` replaces `tests/quiescent_loom.rs`.
Each test is now a plain function annotated with
`#[concurrency::test]`; the body uses `thread::scope` instead
of the `Box::into_raw` + `&'static` lifetime workaround the
old loom-only file needed. Single source runs under any
backend, including the four-way schedule exploration shuttle
offers.
* `tests/thread_scope.rs` -- contract tests for `thread::scope`
(single/multi-spawn join, lifetime borrows, drop affinity,
nested-scope re-entry). Pins shim regressions at the right
layer.
* `tests/scope_property.rs` -- bolero x shuttle property test
generating random spawn-count / per-spawn-ops plans; pins the
"scope conservation" invariant (counter at scope return
equals the sum of generated increments) across many shapes
and interleavings.
* `tests/stress_dispatch.rs` -- coarse dispatch-routing check:
the default backend invokes the body exactly once; the
model-check backends invoke it more than once. Plus
`#[should_panic]` / `#[ignore]` attribute pass-through.
* `tests/arc_weak.rs` -- contract tests for `Arc` / `Weak` that
pass under every backend, plus loom-gated tests pinning the
documented shim quirks. File-level
`#![cfg(not(feature = "shuttle_pct"))]` -- PCT cannot run
single-threaded protocol checks.
## Documentation
`lib.rs` gains a "Compiles under loom != exhaustively checked
under loom" section listing each documented shim limitation (Weak
strong-clone semantics, `weak_count` panic, RwLock
upgradable_read, OnceLock ordering invisibility, Mutex::new const
divergence) with the workspace consequence -- most notably calling
out NAT's `Weak::upgrade().is_none()` patterns and why NAT is not
in the loom matrix today.
`sync/mod.rs` gains a "Portability footguns the facade does not
paper over" section.
`justfile` nextest filter: empty under loom (the archived
binaries are already feature-filtered), `shuttle` substring under
shuttle (matches both `quiescent_shuttle` and the new
`concurrency_model::shuttle` leaves).
Verified with `cargo nextest run -p dataplane-concurrency` under
each of: default features, `--features shuttle`, `--features
shuttle_pct`, `--features shuttle_dfs`, and `--features loom
--release`. Also verified clippy passes under each backend and
under `--all-features` (the silence_clippy escape hatch routes
through `std` and the new thread / stress modules play nice with
that).
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Replace the two-entry `features` matrix (loom + shuttle, single
schedule each) with a single `concurrency` job whose four steps each
exercise a different scheduler:
* shuttle -- random
* shuttle_pct -- PCT (rare interleavings)
* shuttle_dfs -- DFS (exhaustive small-state)
* loom -- the loom backend; scoped to `concurrency` only
because workspace feature unification under
`feature = "loom"` breaks crates that consume
`Weak<T>` / `Arc::downgrade` from
`concurrency::sync` (loom 0.7 doesn't ship
`Weak`).
Named steps mean GitHub's check list shows the failing scheduler
directly (`concurrency / shuttle_pct`) instead of one collapsed
`features/release/shuttle` line per backend. One runner does the
nix-setup once and reuses the cargo cache across all four steps;
the matrix shape spun up two runners and paid that cost twice.
`needs: check` is added so model-checker time is only burned after
clippy / build is green on the same SHA.
Summary job's stale `needs.features` reference is renamed to
`needs.concurrency` (without the rename it always emitted
`::error:: Some features job(s) failed` because `needs.features`
resolved to `null`).
The orphan `&release-build` YAML anchor (its only consumer was the
deleted matrix) is dropped; the matrix entry it labelled stays.
The new `concurrency` job needs its own `env:` rather than reusing
the `*check-env` anchor, because that anchor's `JUST_VARS`
references `matrix.build.*` and this job has no matrix. Each step
inlines the full `JUST_VARS` (with `docker_sock` / `debug_justfile`
/ `oci_repo`) -- the previous shape would have silently dropped
those settings via the step-level `JUST_VARS` override, falling
back to justfile defaults that point at local docker / a localhost
registry.
`justfile`'s nextest filter widens from exact-match
(`features == "shuttle"`) to a regex match (`features =~ "^shuttle"`)
so the `shuttle_pct` and `shuttle_dfs` steps also apply the
`shuttle` substring filter. Without it, every workspace test runs
under the shuttle feature; non-shuttle tests panic with
"ExecutionState NotSet" because `concurrency::sync` types ARE
shuttle primitives once the feature is on, and touching them
outside a `shuttle::check_*`-wrapped body trips the executor.
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
rte_eal_init treats argv[0] as the program name and skips it during option parsing. The wrapper was passing the caller's args verbatim, so the first flag was silently swallowed on every call -- including in production callers and in any test that hands eal::init a list of flags. Fix by prepending an owned `c"dataplane".to_owned()` CString as the first entry in the argv vector. Backing the placeholder with heap-allocated, writable storage (rather than pointing into .rodata) keeps the wrapper sound against any future DPDK that follows `rte_eal_init`'s public contract and modifies argv strings in place (e.g. for setproctitle-style program-name manipulation or getopt-style optarg rewrites). The `cast_mut` on `as_ptr` is sound because every CString in the vector owns its own writable allocation for the duration of the call. Also reserves one slot in `ValidatedEalArgs`'s `c_int::MAX` check so the argv array can hold the placeholder plus the validated user args without overflowing the `argc` cast. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Enable DPDK's ACL library in the nix build, add `rte_acl` to the dpdk-sys link list, and include `<rte_acl.h>` in the bindgen wrapper so that downstream Rust crates can call the rte_acl_* API. Prepares the ground for the safe rte_acl wrapper that follows; this commit on its own only widens what dpdk-sys exposes and does not yet add any Rust consumers. Signed-off-by: Daniel Noland <daniel@githedgehog.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Adds an empty acl module to the dpdk crate, the build-dependencies it will need (concurrency facade, bolero/nix/id for tests), and the field-layout primitives that the rest of the wrapper depends on: - `FieldDef`: a validated DPDK `rte_acl_field_def` (type / size / field_index / input_index / offset). - `FieldType`: Mask / Range / Bitmask, matching the C enum. - `FieldSize`: One / Two / Four-byte field widths. These types carry no DPDK runtime state and have no cross-module dependencies, so they can land independently. Subsequent commits will introduce error types, rules, build configuration, the context typestate, and the SIMD classify algorithm enum on top. Signed-off-by: Daniel Noland <daniel@githedgehog.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Each fallible ACL operation gets its own dedicated error type, with variants that match DPDK's documented error codes plus the Rust-side validation failures the wrapper catches up front: - `InvalidAclName`: not-ASCII / too-long / empty / interior NUL. - `AclCreateError`: invalid name, duplicate name (AlreadyExists), EINVAL, ENOMEM, Unknown. - `AclAddRulesError`: too many rules, per-rule `InvalidRule` (carries the rule index in the caller's slice), DPDK ENOMEM/EINVAL/Unknown. - `InvalidRule`: per-rule wrapper-side validation -- mask-prefix-length out of range (prevents UB in DPDK's `RTE_ACL_MASKLEN_TO_BITMASK`), reversed range bounds, category_mask bits beyond `num_categories`. - `AclBuildError`: ENOMEM, invalid config, exceeded max_size (ERANGE), Unknown. - `AclClassifyError`: invalid args, NotSupported (ENOTSUP from the per-algorithm path), Unknown. - `AclSetAlgorithmError`: invalid params, NotSupported, Unknown. No DPDK runtime is required to construct these; they are pure value types. Other ACL sub-modules will consume them in subsequent commits. Signed-off-by: Daniel Noland <daniel@githedgehog.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Wraps DPDK's `rte_acl_classify_alg` enum as a strongly-typed Rust enum [`ClassifyAlgorithm`], with `From<ClassifyAlgorithm> for u32` and a fallible `TryFrom<u32>`/`from_u32` round-trip for parsing values the caller did not produce. Variants mirror DPDK 1:1: Default, Scalar, NEON, SSE, AVX2, AVX512X16, AVX512X32, Altivec. Lives in its own module because the rest of the wrapper does not need to know about SIMD selection, and because the round-trip property is worth a dedicated bolero test. Subsequent commits wire the enum into `AclContext::classify_with_algorithm` and `AclContext::set_default_algorithm`. Signed-off-by: Daniel Noland <daniel@githedgehog.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>
944c2fb to
15bd118
Compare
The heart of the safe rte_acl wrapper. These three modules land together because they have unavoidable mutual dependencies at the type level: - `config::AclCreateParams<N>::new` calls `Rule::<N>::RULE_SIZE` (compile-time stride for `rte_acl_add_rules`). - `Rule::validate` takes `&AclBuildConfig<N>` and uses it to enforce the per-rule invariants (prefix length, category mask, range bounds). - `context::AclContext<N, State>` owns both an `AclCreateParams<N>` and an `AclBuildConfig<N>` and drives them through DPDK's lifecycle. Highlights: - `Rule<N>` / `AclField` / `Priority` / `CategoryMask` / `RuleData` enforce non-zero invariants on userdata / rule size / max_rule_num and validate fields against the declared `FieldSize` so the validator is endian-safe. - `AclBuildConfig<N>` validates the input-index grouping rules DPDK imposes (first field 1 byte, others in 4-byte groups; contiguous; offset-ordered). `min_input_size` is computed from the group offsets so the `classify` safety contract is exact. - `AclContext<N, Configuring<N>>` accepts rule mutations through `&mut self`; `AclContext<N, Built<N>>` exposes `classify` through `&self`, matching DPDK's documented thread-safety guarantees. - Process-wide `ACL_CREATE_LOCK` (`OnceLock<Mutex<()>>` through the concurrency facade) serialises any operation that touches DPDK's global ACL registry: `new`, `Drop`, and `dump_all_contexts`. - `AclBuildFailure<N>` carries the original `Configuring<N>` context back to the caller after a failed `build`, preserving the rule list for retry. - bolero property tests for `AclBuildConfig::new` validators (num_categories, name, field_defs) and the `AclField` upper-byte invariants. Signed-off-by: Daniel Noland <daniel@githedgehog.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Finishes the safe rte_acl wrapper by promoting the most commonly used types out of their sub-modules into the `acl` namespace and adding the end-to-end lifecycle tests that exercise the full create / add_rules / build / classify / reset pipeline against a live EAL. The integration tests live in `acl::tests` (gated on `#[cfg(test)]`) and funnel every test through a shared `OnceLock<Eal>` so the mandatory once-per-process `rte_eal_init` is invoked at most once under any nextest configuration. Coverage includes: - `classify_smoke`: build a tiny ACL, classify two buffers, verify match / no-match. - `reset_round_trip`: build, classify, reset back to `Configuring`, add a new rule, rebuild, re-classify -- and assert the build config survives across the transition. - `add_rules_rejects_out_of_range_prefix_length`, `add_rules_rejects_category_mask_beyond_num_categories`: confirm wrapper-side `Rule::validate` rejects soundness-critical input before it reaches DPDK. - `classify_categories_validated_before_ffi`: classifier rejects out-of-range categories before forwarding to DPDK. - `set_default_algorithm_then_classify`, `classify_with_algorithm_scalar`: exercise the algorithm-selection paths. - `duplicate_name_rejected`: the process-wide create lock catches the duplicate-name TOCTOU. - `add_rules_after_overflow_failure`, `build_failure_returns_usable_context`: confirm the recoverable failure paths really do recover. - `classify_concurrent_arc_shared`: shows that the per-state Sync impl on `Built<N>` is not vacuous -- four threads classify concurrently through `Arc<AclContext>` and observe consistent results. Run with `cargo nextest run -p dataplane-dpdk acl::tests` after `just setup-roots` and re-entering nix-shell so that `DATAPLANE_SYSROOT` picks up the rte_acl-enabled sysroot. Signed-off-by: Daniel Noland <daniel@githedgehog.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>
15bd118 to
f527d81
Compare
| //! | ||
| //! In production this is `arc_swap::ArcSwap` -- lock-free read fast path, | ||
| //! which is what makes [`Subscriber::snapshot`] cheap on the data-plane. | ||
| //! In production these are `arc_swap::ArcSwap` / `ArcSwapOption` -- |
There was a problem hiding this comment.
What is the reason to call it slot / slotoption instead of the original ?
Summary
Introduce a
parking_lot-shaped facade overMutex/RwLock, ship the four backends together (std / parking_lot / shuttle / loom), and rewire the data-path crates through it in a single atomic step.The four pieces are bundled because a partial sweep leaves the workspace in a state where the facade and its consumers disagree on whether
Mutex::lockreturnsLockResult<Guard>or a naked guard. The shuttle/loom wrappers stripLockResultin exactly the same poison-as-panic shape as the std wrapper, so splitting them re-introduces that disagreement under thetest/shuttleandtest/loomCI jobs.std::sync wrapper (poison-as-panic)
concurrency::syncmodule exposingMutex/RwLockwith aparking_lot-shaped naked-guard surface backed by a poison-as-panic wrapper aroundstd::sync. Workspace policy treats a crashed thread as a crashed process: poison surfacing inside a lock acquire panics throughconcurrency::sync::poisoned()rather than handing back a possibly-tornLockResult.#[inline(never)] fn poisoned() -> !keeps the hot path branch-free; the wrapper is a singlematchper acquire.RwLock::upgradable_readis implemented as exclusivewrite()for now -- sound but lossy. Documented at the backend module level.Arc,Weak, atomics,Condvar,mpsc,OnceLock, etc. are re-exported fromstd::syncunchanged.parking_lot as the production default
parking_lotCargo feature (enabled by default) andconcurrency/src/sync/parking_lot_backend.rs-- a zero-cost re-export ofparking_lot::{Mutex, RwLock}plus the std re-exports for everythingparking_lotdoesn't ship.concurrency/src/sync/mod.rspicksparking_lot_backendwhenfeature = "parking_lot"is on, falls back tostd_backendotherwise._strict_provenancefeature overridesparking_lotand routes throughstd_backend:parking_lot_core::word_lockuses integer-to-pointer casts that-Zmiri-strict-provenancerejects, and the CI miri job (--features=_strict_provenance) exercises the fallback slot, which needsstd::syncunderneath.Cargo.tomlgets atokioparking_lot override entry so thetokioparking_lot feature is enabled workspace-wide.shuttle backend
concurrency/src/sync/shuttle_backend.rsmirrorsstd_backend.rs: a thin poison-as-panic wrap aroundshuttle::sync::{Mutex, RwLock}returning naked guards, plus re-exports forArc/Weak/ atomics /Condvar/mpscthat pass through unchanged.OnceLockfalls back tostd::syncbecause shuttle doesn't ship one.dep:shuttle, arranged as a chain (shuttle_dfs->shuttle_pct->shuttle):shuttle-- random scheduler (default first choice).shuttle_pct-- PCT, biases toward rare interleavings.shuttle_dfs-- DFS, exhaustive small-state exploration.Same backend, scheduler chosen at runtime (by
concurrency::stress, added in the follow-on PR). The chain means a singlefeature = "shuttle"cfg check is true under every variant, so the routing insync/mod.rscollapses to single-feature checks (noany(shuttle, shuttle_pct, shuttle_dfs)chains).loom backend
concurrency/src/sync/loom_backend.rsmirrors the std and shuttle wrappers: poison-as-panic aroundloom::sync::{Mutex, RwLock}returning naked guards, withLockResult/TryLockError/OnceLocketc. re-exported fromstd::syncbecause loom doesn't ship them.Weak<T>and noArc::downgrade. The backend adds a localArc<T>wrapper that addsdowngradeplus aWeak<T>shim that holds a strong clone of the innerloom::sync::Arcuntil theWeakitself drops. Documented quirks (Weak::upgradealways succeeds after adowngrade,Arc::weak_countpanics rather than silently returning 0, lastArcdrop does not free when aWeakis live) are explicit at the module level -- silent fallbacks would let assertions pass for the wrong reason on every backend.sync/mod.rsroutes the loom feature ahead of every other backend.Workspace transition
flow-entry,flow-filter,nat,net,pipeline,routing,stats) to consumedataplane_concurrency::sync::{Arc, Mutex, RwLock, atomic, ...}instead ofstd::sync::*/parking_lot::*directly..unwrap()/.expect("poisoned")noise theLockResult-shaped surface forced, since the facade returns naked guards. Same shape as the existingparking_lot::Mutexconsumers.concurrency_mode(std)annotations at QSBR-touching test sites innat::stateful::apalloc::test_alloc,routing::fib::test, andflow-entry::flow_table::table::testskeep the existing std-only feature gating intact.nat::portfw::flow_state::get_packet_port_fw_statesimplification: the explicitdrop(guard)dance beforedebug!calls is removed (per the recorded author intent that it isn't worth the code complexity); the guard now drops implicitly at the end of scope.The sweep is mostly mechanical (
userewrites and.unwrap()removal). The semantic change is one step: every consumer now gets the facade-routed primitives, so flipping the loom/shuttle features at the top of the workspace is now meaningful.PR 2 of 3 in the concurrency facade stack (the original 4-PR plan collapsed the shuttle-only and loom-only PRs into this one because the wrappers, the workspace sweep, and the consumer changes can't be independently merged without leaving CI red). Depends on #1539 -- absorb-quiescent. The follow-on PR (#1544) adds
#[concurrency::test], thestressdispatcher, and thethread::scopeshim.