Add corruption-detection test for probabilistic mitigations by mjp41 · Pull Request #848 · microsoft/snmalloc

mjp41 · 2026-05-10T08:18:47Z

Adds a new functional test, func/corruption_detection, that validates snmalloc's probabilistic memory-safety mitigations actually fire on the corruption patterns they are designed to catch. Without it, regressions that silently weakened a mitigation would be invisible to the existing suite, since every other test exercises only the non-failing arm of the integrity checks.

Each scenario runs in a forked child so the expected abort does not kill the harness. Detection is reported as the child being killed by SIGABRT/SIGSEGV/SIGBUS/SIGILL; a clean exit means the corruption went undetected and the test fails.
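
The fork-and-inspect pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual harness: `detected_in_child` is a hypothetical name, and the accepted signal set is just the four signals listed above.

```cpp
#include <csignal>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not taken from the PR): run `scenario` in a forked
// child and report whether the child was killed by one of the signals that
// the harness treats as "corruption detected".
inline bool detected_in_child(void (*scenario)())
{
  pid_t pid = fork();
  if (pid < 0)
    return false; // fork failed; treated here as "not detected"

  if (pid == 0)
  {
    scenario();
    _exit(0); // clean exit: the corruption went unnoticed
  }

  int status = 0;
  if (waitpid(pid, &status, 0) != pid)
    return false;

  if (!WIFSIGNALED(status))
    return false; // child exited normally

  switch (WTERMSIG(status))
  {
    case SIGABRT:
    case SIGSEGV:
    case SIGBUS:
    case SIGILL:
      return true;
    default:
      return false;
  }
}
```

A caller would pass each corruption scenario as a capture-less lambda or free function and fail the test whenever the helper returns false.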

Six scenarios are covered, spanning the local-thread, remote-thread and large-allocation paths:

* double_free - small alloc, two local frees of the same slot. Detected by freelist_backward_edge when the resulting cycle is later traversed.
* uaf_freelist - small alloc, free, then write garbage into the freed slot's first two words (the obfuscated next/prev). Detected by check_prev on the next freelist consumption.
* oob_into_neighbor - tiny allocs, free even slots, overrun from an odd live slot into freed neighbours. Detected by check_prev when the neighbour is later allocated.
* remote_double_free - small alloc, free locally, then free again from a different thread (the second free travels via the remote message queue). Detected as !meta->is_unused() in the dealloc path.
* remote_uaf - small alloc, free via a different thread, then write garbage through the dangling pointer while the slot sits on the owning allocator's pending-remote queue. Detected by check_prev during handle_message_queue_slow's drain - a code path no other test exercises.
* large_double_free - allocation larger than any small sizeclass (handled by the chunk allocator and per-chunk metadata rather than the slab freelist), freed twice. Detected as !meta->is_unused() in the large-dealloc path.
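
To make the backward-edge idea behind double_free and uaf_freelist concrete, here is a toy model. It is deliberately not snmalloc's data structure or signing scheme: `Node`, `ToyFreeList`, `Take`, and the signature formula are all invented for illustration. It only shows why a per-edge signature catches both a self-cycle from a double free and a scribbled node header.

```cpp
#include <cstdint>

// Toy model only: NOT snmalloc's freelist or its actual signing function.
struct Node
{
  Node* next;
  std::uintptr_t prev_sig; // signature tying this node to its predecessor
};

enum class Take { Ok, Empty, Corrupt };

class ToyFreeList
{
  Node* curr_ = nullptr; // next node to hand out
  Node* tail_ = nullptr; // last node appended
  std::uintptr_t key_;   // per-list secret

  // With an odd key and aligned nodes, (c ^ key_) is odd, so two different
  // predecessors of the same node always produce different signatures.
  std::uintptr_t sign(const Node* prev, const Node* curr) const
  {
    auto p = reinterpret_cast<std::uintptr_t>(prev);
    auto c = reinterpret_cast<std::uintptr_t>(curr);
    return (p + key_) * (c ^ key_); // unsigned arithmetic wraps by design
  }

public:
  explicit ToyFreeList(std::uintptr_t key) : key_(key | 1) {}

  // "free": append the slot at the tail and record the backward edge.
  void push(Node* n)
  {
    n->next = nullptr;
    n->prev_sig = sign(tail_, n);
    if (tail_ != nullptr)
      tail_->next = n; // pushing the tail again creates a self-cycle here
    else
      curr_ = n;
    tail_ = n;
  }

  // "alloc": hand out the head, but first check that its successor still
  // carries the signature push() recorded for exactly this edge.
  Take take(Node*& out)
  {
    out = nullptr;
    Node* c = curr_;
    if (c == nullptr)
      return Take::Empty;
    Node* next = c->next;
    if (next != nullptr && next->prev_sig != sign(c, next))
      return Take::Corrupt; // a real allocator would fast-fail instead
    curr_ = next;
    if (next == nullptr)
      tail_ = nullptr;
    out = c;
    return Take::Ok;
  }
};
```

In this toy, pushing the same node twice overwrites its signature with one computed against itself, so the edge from its real predecessor no longer verifies. Overwriting a freed node's two words has the same effect.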

The test is Linux-only (uses fork()/waitpid()) and is a no-op when SNMALLOC_CHECK_CLIENT is not defined, since the mitigations it relies on are then compiled out.

The test is also instrumented to cooperate with clang source-based coverage: the forked child re-resolves LLVM_PROFILE_FILE with its own pid (the parent's %p expansion is otherwise inherited and all children would write to the same file) and a signal handler flushes .profraw before re-raising the fatal signal. The runtime entry points are declared as weak symbols so the test still links in non-coverage builds.
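
A rough sketch of how the child-side coverage cooperation might look. The two `__llvm_profile_*` entry points are the ones provided by clang's profiling runtime. Everything else (`expand_pid`, `flush_and_reraise`, `child_coverage_setup`, and the choice to expand `%p` by hand before passing the name on) is an assumption made for illustration and may differ from what the test actually does.

```cpp
#include <csignal>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Entry points from clang's profiling runtime. Declared weak so that they
// resolve to null, and the program still links, in non-coverage builds.
extern "C"
{
  __attribute__((weak)) int __llvm_profile_write_file(void);
  __attribute__((weak)) void __llvm_profile_set_filename(const char* name);
}

// Hypothetical helper: replace every "%p" in a profile-file pattern with pid.
inline std::string expand_pid(std::string pattern, long pid)
{
  const std::string pid_str = std::to_string(pid);
  std::size_t pos = 0;
  while ((pos = pattern.find("%p", pos)) != std::string::npos)
  {
    pattern.replace(pos, 2, pid_str);
    pos += pid_str.size();
  }
  return pattern;
}

// Flush coverage data if the runtime is linked in, then die from the
// original signal so the parent still observes it through waitpid().
inline void flush_and_reraise(int sig)
{
  if (__llvm_profile_write_file != nullptr)
    __llvm_profile_write_file();
  std::signal(sig, SIG_DFL);
  std::raise(sig);
}

// Hypothetical child-side setup, called right after fork() returns 0.
inline void child_coverage_setup()
{
  // Kept alive for the life of the process in case the runtime retains
  // the pointer rather than copying the string.
  static std::string path;
  const char* pattern = std::getenv("LLVM_PROFILE_FILE");
  if (pattern != nullptr && __llvm_profile_set_filename != nullptr)
  {
    path = expand_pid(pattern, static_cast<long>(getpid()));
    __llvm_profile_set_filename(path.c_str());
  }
  const int fatal[] = {SIGABRT, SIGSEGV, SIGBUS, SIGILL};
  for (int sig : fatal)
    std::signal(sig, flush_and_reraise);
}
```

In a non-coverage build both weak symbols are null, so the setup only installs the handlers and the handlers simply re-raise.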

Picked up automatically by make_tests so it runs as both func-corruption_detection-fast and func-corruption_detection-check; the fast variant immediately exits with the "skip" message because the mitigations are off.


Three issues surfaced in CI for the new corruption-detection test:

1. Linux: `large_double_free` did not detect any corruption. The subtest used `LARGE_SIZE = MIN_CHUNK_SIZE * 4 = 64 KiB`, which on the default Linux config is `MAX_SMALL_SIZECLASS_SIZE` (i.e. the largest *small* sizeclass), so the allocations went through the slab free-list path and never reached the chunk-allocator double-free check at all. Use `MAX_SMALL_SIZECLASS_SIZE * 2` so the size unambiguously falls into the large range. Once the test actually exercises the right path, the existing `is_backend_owned()` check in `dealloc_remote` (gated on the `sanity_checks` mitigation, which is part of `full_checks` in a default `SNMALLOC_CHECK_CLIENT` build) flags the double-free.
2. Mac: `-Wunused-function` errors for every `try_*` helper. The helpers are referenced only from `run_in_child`, which is already gated on `__linux__`. Move the helpers and the LLVM profile externs inside the same `#if defined(__linux__)` block so non-Linux builds compile cleanly. The non-Linux `main` already prints a "skipping" message and returns 0.
3. Windows: `__attribute__((weak))` is not portable to MSVC and there is no `SNMALLOC_WEAK` macro in `defines.h`. The weak symbols are only used by the Linux-only fork harness for coverage-flush, so gating them on `__linux__` is the natural fix. Also use `static_cast<uintptr_t>(0xDEADBEEFu)`-style literals for the UAF freelist-corruption writes so MSVC does not warn about narrowing on 32-bit Windows (C4305/C4309). The exact bit pattern does not matter: any non-zero garbage in the freelist node header will fail domestication or the doubly-linked invariant check.

Verified locally: all 6 subtests now detect corruption (including large_double_free, which detects via signal 4 / SIGILL from the sanity_checks mitigation).
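
For illustration, the UAF write with explicitly widened literals could look something like the sketch below. `scribble_freelist_header` is a hypothetical name, and the second constant is an arbitrary choice, since the commit message only says the exact pattern does not matter.

```cpp
#include <cstdint>

// Hypothetical helper: overwrite the first two words of a freed slot, which
// is where the freelist node header lives according to the description.
inline void scribble_freelist_header(void* freed_slot)
{
  auto* words = static_cast<std::uintptr_t*>(freed_slot);
  // 32-bit literals widened explicitly, so the value fits uintptr_t on both
  // 32-bit and 64-bit targets without a truncation warning.
  words[0] = static_cast<std::uintptr_t>(0xDEADBEEFu);
  words[1] = static_cast<std::uintptr_t>(0xCAFEBABEu); // arbitrary garbage
}
```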

Two more failure modes from CI:

1. aarch64 (qemu cross-build, native arm64): every subtest reported "child died with unexpected signal 5". `__builtin_trap()` (which `SNMALLOC_FAST_FAIL` expands to on non-MSVC) emits `ud2` on x86 and is delivered as SIGILL (4), but on aarch64 it emits `brk #1000` and is delivered as SIGTRAP (5). The mitigation is firing correctly; the parent just wasn't recognising the signal. Add SIGTRAP to the accepted-signal set and to the child's coverage-flush handler list.
2. UBSan / TSan / GWP-ASan builds: corruption "not detected". Under those builds either the sanitizer intercepts allocation (replacing snmalloc's mitigated path entirely), or the fast-fail trap is intercepted by the sanitizer runtime before the parent sees a fatal signal, or both. Either way the test's premise ("snmalloc raises a fatal signal on corrupt frees") doesn't hold. Skip the test in those builds, detected via:
   * `__has_feature(address_sanitizer)` etc. (clang),
   * `__SANITIZE_ADDRESS__` / `__SANITIZE_THREAD__` (gcc),
   * `SNMALLOC_ENABLE_GWP_ASAN_INTEGRATION` (snmalloc-defined).

The non-skipped behaviour is unchanged; verified locally that all 6 subtests still detect (mix of SIGILL and SIGABRT on x86).
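
Both fixes can be sketched as below. This is an approximation rather than the committed code: `is_detection_signal` is a hypothetical name, and the exact list of `__has_feature` probes behind the "etc." is a guess (only the address and thread sanitizers are probed here). `CORRUPTION_TEST_SKIP_SANITIZER` is the macro name the PR uses for the skip condition.

```cpp
#include <csignal>

// clang reports sanitizers through __has_feature; gcc defines __SANITIZE_*
// macros instead. The GWP-ASan macro is defined by snmalloc's build.
#if defined(__has_feature)
#  if __has_feature(address_sanitizer) || __has_feature(thread_sanitizer)
#    define CORRUPTION_TEST_SKIP_SANITIZER
#  endif
#endif
#if !defined(CORRUPTION_TEST_SKIP_SANITIZER)
#  if defined(__SANITIZE_ADDRESS__) || defined(__SANITIZE_THREAD__) || \
    defined(SNMALLOC_ENABLE_GWP_ASAN_INTEGRATION)
#    define CORRUPTION_TEST_SKIP_SANITIZER
#  endif
#endif

// Hypothetical helper: signals the parent accepts as "mitigation fired".
// SIGTRAP covers __builtin_trap() on aarch64; SIGILL covers it on x86.
inline bool is_detection_signal(int sig)
{
  switch (sig)
  {
    case SIGABRT:
    case SIGSEGV:
    case SIGBUS:
    case SIGILL:
    case SIGTRAP:
      return true;
    default:
      return false;
  }
}
```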

When CORRUPTION_TEST_SKIP_SANITIZER was defined, `main()` returned early via the preprocessor, leaving every `try_*` helper and `run_in_child` lexically present but unreferenced: the same `-Wunused-function` failure that originally hit the Mac build.

Introduce CORRUPTION_TEST_ACTIVE = `__linux__ && !sanitizer` and gate the entire harness (LLVM-profile externs, namespace with all helpers, `main`'s test-driving block) on it. The two skip-paths (non-Linux, sanitizer/GWP-ASan) are now distinct top-level branches in `main` and neither references any helper, so nothing in the preprocessed translation unit is unused.

Verified: with -DCORRUPTION_TEST_SKIP_SANITIZER the helpers are preprocessed out entirely (0 occurrences vs 8 without it). Local non-sanitizer run still detects all 6 subtests.
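
The resulting shape of the translation unit might look roughly like this. It is a structural sketch only: `run_all_subtests` and `corruption_test_main` are hypothetical stand-ins for the real helpers and `main`, and whether CORRUPTION_TEST_ACTIVE is a 0/1 value or a defined/undefined flag is an assumption.

```cpp
#include <cstdio>

// Single switch for the whole harness: Linux and no sanitizer/GWP-ASan.
#if defined(__linux__) && !defined(CORRUPTION_TEST_SKIP_SANITIZER)
#  define CORRUPTION_TEST_ACTIVE 1
#else
#  define CORRUPTION_TEST_ACTIVE 0
#endif

#if CORRUPTION_TEST_ACTIVE
namespace
{
  // Stand-in for the try_* helpers and run_in_child. Because the whole
  // namespace is gated, nothing here exists in a skipped build.
  int run_all_subtests()
  {
    return 0;
  }
}
#endif

// Stand-in for main(): three distinct top-level branches, and only the
// active one references the helpers.
int corruption_test_main()
{
#if CORRUPTION_TEST_ACTIVE
  return run_all_subtests();
#elif !defined(__linux__)
  std::puts("skipping: requires Linux fork()/waitpid()");
  return 0;
#else
  std::puts("skipping: sanitizer or GWP-ASan build");
  return 0;
#endif
}
```

Because the helpers and their only caller sit behind the same condition, no configuration can leave a helper defined but unreferenced.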

mjp41 added 4 commits May 9, 2026 21:25

mjp41 wants to merge 4 commits into microsoft:main from mjp41:coverage_failures
