rtld: share exactly the initial thread's static TLS pages for IA2 #5

Draft
oinoom wants to merge 1 commit into main from slice/20260417-rtld-initial-tls-sharing

Conversation


@oinoom oinoom commented Apr 20, 2026

IA2's current dav1d single-thread branch no longer wants to retag the initial thread's TLS neighborhood from ia2_start() after startup. That runtime-side retag solved the decode path, but it also caused immediate tracer-mode failures in standard builds because the tracer saw a second pkey_mprotect of loader-owned TLS state during process init.

The previous loader-side fix moved that policy into init_tls(), where rtld allocates the initial thread's static TLS block and DTV in the first place. That change kept dav1d working and restored the tracer sweep, but it still used a conservative fixed window: retag the TCB page plus up to eight pages below it, and retag the DTV page separately if it fell outside that range.

That fixed-size window is broader than necessary. A narrower 1-page probe regressed strict single-thread decode, and GDB showed exactly why: during ivf_read() -> dav1d_data_wrap() -> malloc(), PartitionAlloc hit its TLS bookkeeping at fs_base-0x4018, which is five pages below the TCB page on this x86_64 layout. The loader already knows the exact size of the initial thread's static TLS block at this point, so it does not need a magic page count at all.

Replace the fixed "eight pages below the TCB" rule with the precise static-TLS lower bound derived from dl_tls_static_size and TLS_TCB_SIZE. Round that lower bound down to a page boundary and retag exactly that page range plus the TCB page itself. Keep the page-by-page walk: the initial TLS block can span multiple minimal-malloc VMAs, and a single fixed-size pkey_mprotect over the whole interval can still run into a hole and fail with ENOMEM. Retag the DTV page separately only when it falls outside the computed static-TLS range.

This keeps the policy in the loader that allocated the memory, removes the remaining magic page-count heuristic, and narrows the shared loader-heap surface without reintroducing the tracer startup regression.

Validated with the paired IA2 and dav1d branches:

  • fresh ./rewrite.py --llvm-config /usr/bin/llvm-config-18 build
  • dav1d --version returns 0
  • strict single-thread decode of test.ivf returns 0
  • Debug/Tracer/No-libc IA2 sweep: ctest --test-dir build/tracer_debug_standard_computed_20260417 --output-on-failure -j1 -E terminating_threads => 35/35 passed
  • Debug/No-tracer/No-libc IA2 sweep: ctest --test-dir build/standard_debug_notracer_computed_20260417 --output-on-failure -j1 -E terminating_threads => 35/35 passed
  • Release/Libc-compartment IA2 sweep: ctest --test-dir build/libc_release_computed_20260417 --output-on-failure -j1 => 13/13 passed
