(3) #[concurrency::test] macro and thread::scope shim#1544
Conversation
1ce8d26 to
8d86891
Compare
d89cf27 to
6939124
Compare
8d86891 to
0364eb1
Compare
6939124 to
732e640
Compare
5e2b328 to
08e23f9
Compare
mvachhar
left a comment
There was a problem hiding this comment.
One question here, if ::stress enables loom everywhere (it currently doesn't because of Weak), aren't there many cases where the model checker will fail to terminate or consume too much time/memory? Again, no change needed if this is not currently and issue, but it is a concern.
Ship the test-time infrastructure that lets a single source file run
under every backend that the facade (in the prior PR) routes to.
Three pieces:
* `concurrency::stress(body)` -- runtime dispatcher selecting
`loom::model` / `shuttle::check_random` / `_pct` / `_dfs` /
direct call based on the active feature.
* `#[concurrency::test]` -- attribute proc-macro expanding to
`#[test]` + `stress(|| body)`. One source file, five execution
strategies, no `loom::model(|| { ... })` boilerplate at call
sites.
* `concurrency::thread::scope` -- works on every backend. Std and
shuttle re-export theirs; loom 0.7 has no `scope`, so this PR
ships a local shim built on `loom::spawn` plus an
`Arc<Mutex<Option<T>>>` keepalive that preserves the same
drop-affinity guarantee `std::thread::scope` gives.
## stress + #[concurrency::test]
`concurrency::stress(body)`:
* default -- `body()` direct, no exploration
* `loom` -- `loom::model(body)`
* `shuttle` -- `shuttle::check_random(body, 16)`
* `shuttle_pct` -- `shuttle::check_pct(body, 16, 3)`
* `shuttle_dfs` -- `shuttle::check_dfs(body, Some(16))`
The dispatch carries a defensive `_ => compile_error!` arm: if a
new model-checker feature is added to the exclusion list but no
matching dispatch arm is wired up, the build fails loudly rather
than silently skipping exploration.
`#[concurrency::test]` is a new attribute proc-macro in
`concurrency-macros`. Under model-checker backends it emits a
nested `mod <fn_name> { mod concurrency_model { #[test] fn
<backend>() { ... } } }` so test reports / JUnit show the active
backend as the leaf name (e.g.
`snapshot_observes::concurrency_model::loom`). Default backend
stays flat. `#[should_panic]` / `#[ignore]` thread through;
`async fn` and `fn(args)` are rejected at parse time.
The crate path is resolved at expansion time via
`proc-macro-crate`, so consumers depending on
`dataplane-concurrency` under any alias work without setup;
integration tests inside this crate use `extern crate
dataplane_concurrency as concurrency;` since cargo rejects
self-deps.
`stress` is re-exported at the crate root for the macro expansion
but hidden from rustdoc (`#[doc(hidden)]`) -- users land on the
macro.
## thread::scope shim
`concurrency::thread` becomes a module (was a `pub use` alias) and
re-exports the active backend's `thread::*`. Under loom it also
includes a local `thread::scope` shim built on `loom::spawn` plus
an `Arc<Mutex<Option<T>>>` keepalive pattern. A narrow
`mem::transmute` lifts the spawned closure's `'scope` to
`'static` (loom 0.7 has no `spawn_unchecked`); the keepalive
trait object keeps its honest `'scope` and `ScopeInner<'scope>`
lives in `ManuallyDrop` so dropck does not constrain `'scope`
past `scope()`'s body. Per-site SAFETY comments at both the
transmute and the `ManuallyDrop::drop`.
Drop-affinity is enforced by an explicit `take_payload` on main
during teardown, not by an Arc-count assertion. Loom's
`JoinHandle::join` synchronises only on `notify`, which can fire
before the spawned thread's wrapper has finished dropping its
captures -- so the count-based check fires spuriously in some
schedules. Manual `take` on main gives the same guarantee
unconditionally.
## quiescent.rs / slot.rs cleanup
With every backend now returning naked guards from `.lock()` (the
prior PR), the remaining `LockResult` unwrap dispatch in
`quiescent.rs` disappears; `.lock()` calls reduce to plain
naked-guard form across all backends.
## Tests
* `tests/quiescent_model.rs` replaces `tests/quiescent_loom.rs`.
Each test is now a plain function annotated with
`#[concurrency::test]`; the body uses `thread::scope` instead
of the `Box::into_raw` + `&'static` lifetime workaround the
old loom-only file needed. Single source runs under any
backend, including the four-way schedule exploration shuttle
offers.
* `tests/thread_scope.rs` -- contract tests for `thread::scope`
(single/multi-spawn join, lifetime borrows, drop affinity,
nested-scope re-entry). Pins shim regressions at the right
layer.
* `tests/scope_property.rs` -- bolero x shuttle property test
generating random spawn-count / per-spawn-ops plans; pins the
"scope conservation" invariant (counter at scope return
equals the sum of generated increments) across many shapes
and interleavings.
* `tests/stress_dispatch.rs` -- coarse dispatch-routing check:
the default backend invokes the body exactly once; the
model-check backends invoke it more than once. Plus
`#[should_panic]` / `#[ignore]` attribute pass-through.
* `tests/arc_weak.rs` -- contract tests for `Arc` / `Weak` that
pass under every backend, plus loom-gated tests pinning the
documented shim quirks. File-level
`#![cfg(not(feature = "shuttle_pct"))]` -- PCT cannot run
single-threaded protocol checks.
## Documentation
`lib.rs` gains a "Compiles under loom != exhaustively checked
under loom" section listing each documented shim limitation (Weak
strong-clone semantics, `weak_count` panic, RwLock
upgradable_read, OnceLock ordering invisibility, Mutex::new const
divergence) with the workspace consequence -- most notably calling
out NAT's `Weak::upgrade().is_none()` patterns and why NAT is not
in the loom matrix today.
`sync/mod.rs` gains a "Portability footguns the facade does not
paper over" section.
`justfile` nextest filter: empty under loom (the archived
binaries are already feature-filtered), `shuttle` substring under
shuttle (matches both `quiescent_shuttle` and the new
`concurrency_model::shuttle` leaves).
Verified with `cargo nextest run -p dataplane-concurrency` under
each of: default features, `--features shuttle`, `--features
shuttle_pct`, `--features shuttle_dfs`, and `--features loom
--release`. Also verified clippy passes under each backend and
under `--all-features` (the silence_clippy escape hatch routes
through `std` and the new thread / stress modules play nice with
that).
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Replace the two-entry `features` matrix (loom + shuttle, single
schedule each) with a single `concurrency` job whose four steps each
exercise a different scheduler:
* shuttle -- random
* shuttle_pct -- PCT (rare interleavings)
* shuttle_dfs -- DFS (exhaustive small-state)
* loom -- the loom backend; scoped to `concurrency` only
because workspace feature unification under
`feature = "loom"` breaks crates that consume
`Weak<T>` / `Arc::downgrade` from
`concurrency::sync` (loom 0.7 doesn't ship
`Weak`).
Named steps mean GitHub's check list shows the failing scheduler
directly (`concurrency / shuttle_pct`) instead of one collapsed
`features/release/shuttle` line per backend. One runner does the
nix-setup once and reuses the cargo cache across all four steps;
the matrix shape spun up two runners and paid that cost twice.
`needs: check` is added so model-checker time is only burned after
clippy / build is green on the same SHA.
Summary job's stale `needs.features` reference is renamed to
`needs.concurrency` (without the rename it always emitted
`::error:: Some features job(s) failed` because `needs.features`
resolved to `null`).
The orphan `&release-build` YAML anchor (its only consumer was the
deleted matrix) is dropped; the matrix entry it labelled stays.
The new `concurrency` job needs its own `env:` rather than reusing
the `*check-env` anchor, because that anchor's `JUST_VARS`
references `matrix.build.*` and this job has no matrix. Each step
inlines the full `JUST_VARS` (with `docker_sock` / `debug_justfile`
/ `oci_repo`) -- the previous shape would have silently dropped
those settings via the step-level `JUST_VARS` override, falling
back to justfile defaults that point at local docker / a localhost
registry.
`justfile`'s nextest filter widens from exact-match
(`features == "shuttle"`) to a regex match (`features =~ "^shuttle"`)
so the `shuttle_pct` and `shuttle_dfs` steps also apply the
`shuttle` substring filter. Without it, every workspace test runs
under the shuttle feature; non-shuttle tests panic with
"ExecutionState NotSet" because `concurrency::sync` types ARE
shuttle primitives once the feature is on, and touching them
outside a `shuttle::check_*`-wrapped body trips the executor.
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
I really have not seen that so far. The more common issue is just failing to compile under loom. If you give loom an appropriate domain then it finishes very quickly |
0364eb1 to
fb3e8a6
Compare
08e23f9 to
09fab17
Compare
rte_eal_init treats argv[0] as the program name and skips it during option parsing. The wrapper was passing the caller's args verbatim, so the first flag was silently swallowed on every call -- including in production callers and in any test that hands eal::init a list of flags. Fix by prepending an owned `c"dataplane".to_owned()` CString as the first entry in the argv vector. Backing the placeholder with heap-allocated, writable storage (rather than pointing into .rodata) keeps the wrapper sound against any future DPDK that follows `rte_eal_init`'s public contract and modifies argv strings in place (e.g. for setproctitle-style program-name manipulation or getopt-style optarg rewrites). The `cast_mut` on `as_ptr` is sound because every CString in the vector owns its own writable allocation for the duration of the call. Also reserves one slot in `ValidatedEalArgs`'s `c_int::MAX` check so the argv array can hold the placeholder plus the validated user args without overflowing the `argc` cast. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Enable DPDK's ACL library in the nix build, add `rte_acl` to the dpdk-sys link list, and include `<rte_acl.h>` in the bindgen wrapper so that downstream Rust crates can call the rte_acl_* API. Prepares the ground for the safe rte_acl wrapper that follows; this commit on its own only widens what dpdk-sys exposes and does not yet add any Rust consumers. Signed-off-by: Daniel Noland <daniel@githedgehog.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Adds an empty acl module to the dpdk crate, the build-dependencies it will need (concurrency facade, bolero/nix/id for tests), and the field-layout primitives that the rest of the wrapper depends on: - `FieldDef`: a validated DPDK `rte_acl_field_def` (type / size / field_index / input_index / offset). - `FieldType`: Mask / Range / Bitmask, matching the C enum. - `FieldSize`: One / Two / Four-byte field widths. These types carry no DPDK runtime state and have no cross-module dependencies, so they can land independently. Subsequent commits will introduce error types, rules, build configuration, the context typestate, and the SIMD classify algorithm enum on top. Signed-off-by: Daniel Noland <daniel@githedgehog.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Each fallible ACL operation gets its own dedicated error type, with variants that match DPDK's documented error codes plus the Rust-side validation failures the wrapper catches up front: - `InvalidAclName`: not-ASCII / too-long / empty / interior NUL. - `AclCreateError`: invalid name, duplicate name (AlreadyExists), EINVAL, ENOMEM, Unknown. - `AclAddRulesError`: too many rules, per-rule `InvalidRule` (carries the rule index in the caller's slice), DPDK ENOMEM/EINVAL/Unknown. - `InvalidRule`: per-rule wrapper-side validation -- mask-prefix-length out of range (prevents UB in DPDK's `RTE_ACL_MASKLEN_TO_BITMASK`), reversed range bounds, category_mask bits beyond `num_categories`. - `AclBuildError`: ENOMEM, invalid config, exceeded max_size (ERANGE), Unknown. - `AclClassifyError`: invalid args, NotSupported (ENOTSUP from the per-algorithm path), Unknown. - `AclSetAlgorithmError`: invalid params, NotSupported, Unknown. No DPDK runtime is required to construct these; they are pure value types. Other ACL sub-modules will consume them in subsequent commits. Signed-off-by: Daniel Noland <daniel@githedgehog.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Wraps DPDK's `rte_acl_classify_alg` enum as a strongly-typed Rust enum [`ClassifyAlgorithm`], with `From<ClassifyAlgorithm> for u32` and a fallible `TryFrom<u32>`/`from_u32` round-trip for parsing values the caller did not produce. Variants mirror DPDK 1:1: Default, Scalar, NEON, SSE, AVX2, AVX512X16, AVX512X32, Altivec. Lives in its own module because the rest of the wrapper does not need to know about SIMD selection, and because the round-trip property is worth a dedicated bolero test. Subsequent commits wire the enum into `AclContext::classify_with_algorithm` and `AclContext::set_default_algorithm`. Signed-off-by: Daniel Noland <daniel@githedgehog.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>
The heart of the safe rte_acl wrapper. These three modules land together because they have unavoidable mutual dependencies at the type level: - `config::AclCreateParams<N>::new` calls `Rule::<N>::RULE_SIZE` (compile-time stride for `rte_acl_add_rules`). - `Rule::validate` takes `&AclBuildConfig<N>` and uses it to enforce the per-rule invariants (prefix length, category mask, range bounds). - `context::AclContext<N, State>` owns both an `AclCreateParams<N>` and an `AclBuildConfig<N>` and drives them through DPDK's lifecycle. Highlights: - `Rule<N>` / `AclField` / `Priority` / `CategoryMask` / `RuleData` enforce non-zero invariants on userdata / rule size / max_rule_num and validate fields against the declared `FieldSize` so the validator is endian-safe. - `AclBuildConfig<N>` validates the input-index grouping rules DPDK imposes (first field 1 byte, others in 4-byte groups; contiguous; offset-ordered). `min_input_size` is computed from the group offsets so the `classify` safety contract is exact. - `AclContext<N, Configuring<N>>` accepts rule mutations through `&mut self`; `AclContext<N, Built<N>>` exposes `classify` through `&self`, matching DPDK's documented thread-safety guarantees. - Process-wide `ACL_CREATE_LOCK` (`OnceLock<Mutex<()>>` through the concurrency facade) serialises any operation that touches DPDK's global ACL registry: `new`, `Drop`, and `dump_all_contexts`. - `AclBuildFailure<N>` carries the original `Configuring<N>` context back to the caller after a failed `build`, preserving the rule list for retry. - bolero property tests for `AclBuildConfig::new` validators (num_categories, name, field_defs) and the `AclField` upper-byte invariants. Signed-off-by: Daniel Noland <daniel@githedgehog.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Finishes the safe rte_acl wrapper by promoting the most commonly used types out of their sub-modules into the `acl` namespace and adding the end-to-end lifecycle tests that exercise the full create / add_rules / build / classify / reset pipeline against a live EAL. The integration tests live in `acl::tests` (gated on `#[cfg(test)]`) and funnel every test through a shared `OnceLock<Eal>` so the mandatory once-per-process `rte_eal_init` is invoked at most once under any nextest configuration. Coverage includes: - `classify_smoke`: build a tiny ACL, classify two buffers, verify match / no-match. - `reset_round_trip`: build, classify, reset back to `Configuring`, add a new rule, rebuild, re-classify -- and assert the build config survives across the transition. - `add_rules_rejects_out_of_range_prefix_length`, `add_rules_rejects_category_mask_beyond_num_categories`: confirm wrapper-side `Rule::validate` rejects soundness-critical input before it reaches DPDK. - `classify_categories_validated_before_ffi`: classifier rejects out-of-range categories before forwarding to DPDK. - `set_default_algorithm_then_classify`, `classify_with_algorithm_scalar`: exercise the algorithm-selection paths. - `duplicate_name_rejected`: the process-wide create lock catches the duplicate-name TOCTOU. - `add_rules_after_overflow_failure`, `build_failure_returns_usable_context`: confirm the recoverable failure paths really do recover. - `classify_concurrent_arc_shared`: shows that the per-state Sync impl on `Built<N>` is not vacuous -- four threads classify concurrently through `Arc<AclContext>` and observe consistent results. Run with `cargo nextest run -p dataplane-dpdk acl::tests` after `just setup-roots` and re-entering nix-shell so that `DATAPLANE_SYSROOT` picks up the rte_acl-enabled sysroot. Signed-off-by: Daniel Noland <daniel@githedgehog.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>
944c2fb
into
pr/daniel-noland/concurrency-stack/4-workspace-transition
Summary
The capstone PR. Ship the test-time infrastructure that lets a single source file run under every backend the facade (in #1542) routes to.
Three pieces, plus the test suite that exercises them.
concurrency::stress+#[concurrency::test]concurrency::stress(body)dispatches toloom::model/shuttle::check_random/_pct/_dfs/ direct call based on the active feature. The dispatch has a defensive_ => compile_error!arm: if a new model-checker feature is added to the exclusion list but no matching dispatch arm is wired up, the build fails loudly rather than silently skipping exploration.#[concurrency::test]proc-macro expands to#[test]plusstress(|| body). Under model-checker backends it emits a nestedmod <fn_name> { mod concurrency_model { #[test] fn <backend>() { ... } } }so test reports / JUnit show the active backend as the leaf name (e.g.snapshot_observes::concurrency_model::loom). Default backend stays flat.#[should_panic]/#[ignore]thread through;async fnandfn(args)are rejected at parse time.proc-macro-crate, so consumers under any alias work without setup; integration tests inside this crate useextern crate dataplane_concurrency as concurrency;since cargo rejects self-deps.stressis re-exported at the crate root for the macro expansion but hidden from rustdoc (#[doc(hidden)]) -- users land on the macro.thread::scopeshimconcurrency::thread::scopeworks on every backend. Std and shuttle re-export theirs; loom 0.7 doesn't havescope, so we shipconcurrency/src/thread/loom_scope.rs: built onloom::spawnplus anArc<Mutex<Option<T>>>keepalive that preserves the same drop-affinity guaranteestd::thread::scopegives.take_payloadon main during teardown, not by an Arc-count assertion. Loom'sJoinHandle::joinsynchronises only onnotify, which can fire before the spawned thread's wrapper has finished dropping its captures -- so the count-based check fires spuriously in some schedules. Manualtakeon main gives the same guarantee unconditionally.mem::transmuteto lift the spawned closure's'scopeto'static(loom 0.7 has nospawn_unchecked); the keepalive trait object keeps its honest'scopeandScopeInner<'scope>lives inManuallyDropso dropck doesn't constrain'scopepastscope()'s body. Per-site SAFETY comments at both the transmute and theManuallyDrop::drop.Tests
tests/quiescent_model.rsreplaces the loom-onlyquiescent_loom.rs. One source, every backend, via#[concurrency::test]+thread::scope.tests/thread_scope.rs-- contract tests forthread::scope(single/multi-spawn, drop affinity, nested-scope re-entry).tests/scope_property.rs-- bolero x shuttle conservation property over generated spawn/op plans.tests/stress_dispatch.rs--stressinvokes body once on default, more than once on model-check backends; plus#[should_panic]/#[ignore]attribute pass-through.tests/arc_weak.rs--Arc/Weakcontract (passes under every backend, plus loom-only quirk tests). File-level#![cfg(not(feature = "shuttle_pct"))]-- PCT can't run single-threaded protocol checks.justfilenextest filter: empty under loom (the archived binaries are already feature-filtered),shuttlesubstring under shuttle (matches bothquiescent_shuttleand the newconcurrency_model::shuttleleaves).Documentation
lib.rsgains a "Compiles under loom != exhaustively checked under loom" section listing each documented shim limitation (Weak strong-clone semantics,weak_countpanic, RwLock upgradable_read, OnceLock ordering invisibility, Mutex::new const divergence) with the workspace consequence -- most notably calling out NAT'sWeak::upgrade().is_none()patterns and why NAT isn't in the loom matrix today.sync/mod.rsgains a "Portability footguns the facade does not paper over" section.PR 3 of 3 in the concurrency facade stack. Depends on #1542 -- sync facade + backends + workspace sweep. After this lands, the stack reaches feature parity with the original PR.