rtld: share exactly the initial thread's static TLS pages for IA2#5
Draft
IA2's current dav1d single-thread branch no longer wants to retag the
initial thread's TLS neighborhood from ia2_start() after startup. That
runtime-side retag solved the decode path, but it also caused immediate
tracer-mode failures in standard builds because the tracer saw a second
pkey_mprotect of loader-owned TLS state during process init.
The previous loader-side fix moved that policy into init_tls(), where
rtld allocates the initial thread's static TLS block and DTV in the
first place. That change kept dav1d working and restored the tracer
sweep, but it still used a conservative fixed window: retag the TCB
page plus up to eight pages below it, and retag the DTV page separately
if it fell outside that range.
That fixed-size window is broader than necessary. A narrower 1-page
probe regressed strict single-thread decode, and GDB showed exactly why:
during ivf_read() -> dav1d_data_wrap() -> malloc(), PartitionAlloc hit
its TLS bookkeeping at fs_base-0x4018, which is five pages below the
TCB page on this x86_64 layout. The loader already knows the exact size
of the initial thread's static TLS block at this point, so it does not
need a magic page count at all.
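The page arithmetic behind that GDB observation can be checked with a small sketch. The helper name `pages_below_tcb` and the addresses used with it are hypothetical; the description does not state the actual fs_base value, so a page-aligned fs_base and 4 KiB pages are assumed here purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of rtld: number of pages between the page
   that holds the TCB and the page that holds the byte at `tcb - offset`. */
static uint64_t pages_below_tcb(uint64_t tcb, uint64_t offset,
                                uint64_t page_size)
{
    /* The mask trick below only works for power-of-two page sizes. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    uint64_t tcb_page = tcb & ~(page_size - 1);
    uint64_t access_page = (tcb - offset) & ~(page_size - 1);
    return (tcb_page - access_page) / page_size;
}
```

With an assumed fs_base of 0x7ffff7d8a000, `pages_below_tcb(0x7ffff7d8a000, 0x4018, 4096)` yields 5, which a 1-page probe cannot cover. The result also depends on where fs_base sits inside its page, which is one more reason to derive the range from the known static TLS size instead of a fixed count.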
Replace the fixed "eight pages below the TCB" rule with the precise
static-TLS lower bound derived from dl_tls_static_size and TLS_TCB_SIZE.
Round that lower bound down to a page boundary and retag exactly that
page range plus the TCB page itself. Keep the page-by-page walk: the
initial TLS block can span multiple minimal-malloc VMAs, and a single
pkey_mprotect call over the whole interval can still run into an
unmapped hole and fail with ENOMEM. Retag the DTV page separately only
when it falls outside the computed static-TLS range.
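The range computation and page walk described above can be sketched as follows. This is not the patch itself: the function names, the callback type, and the `page_log` recorder are hypothetical, and the sketch assumes the x86_64 layout in which the static TLS block ends at the TCB, so that the lower bound is `tcb - (static_size - tcb_size)`. In the loader, the callback would presumably wrap a per-page pkey_mprotect call; here it is left abstract so the walk can be exercised without protection-key support. Skipping a page whose retag fails is also an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-page retag callback; returns 0 on success. */
typedef int (*retag_page_fn)(uintptr_t page, size_t page_size, void *ctx);

/* Round an address down to the start of its page. */
static uintptr_t page_floor(uintptr_t addr, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~((uintptr_t)page_size - 1);
}

/* Retag every page from the static-TLS lower bound through the TCB page,
   one page at a time, then the DTV page if it lies outside that range.
   Returns the number of pages for which the callback succeeded. */
static size_t retag_initial_tls(uintptr_t tcb, size_t static_size,
                                size_t tcb_size, uintptr_t dtv,
                                size_t page_size,
                                retag_page_fn retag, void *ctx)
{
    assert(static_size >= tcb_size);

    /* Assumed layout: static TLS sits directly below the TCB. */
    uintptr_t lower = tcb - (static_size - tcb_size);
    uintptr_t first = page_floor(lower, page_size);
    uintptr_t last = page_floor(tcb, page_size);
    size_t done = 0;

    /* Page-by-page walk: a hole only affects the page it covers. */
    for (uintptr_t p = first; p <= last; p += page_size)
        if (retag(p, page_size, ctx) == 0)
            done++;

    /* The DTV needs its own retag only when it is outside the range. */
    uintptr_t dtv_page = page_floor(dtv, page_size);
    if (dtv_page < first || dtv_page > last)
        if (retag(dtv_page, page_size, ctx) == 0)
            done++;

    return done;
}

/* Dry-run callback for experimentation: records pages instead of
   changing their protection key. */
struct page_log {
    uintptr_t pages[16];
    size_t n;
};

static int record_page(uintptr_t page, size_t page_size, void *ctx)
{
    struct page_log *log = ctx;
    (void)page_size;
    if (log->n >= sizeof log->pages / sizeof log->pages[0])
        return -1;
    log->pages[log->n++] = page;
    return 0;
}
```

Passing `record_page` with a zero-initialized `struct page_log` shows which pages a given combination of TCB address, static TLS size, and DTV address would touch, which makes it easy to compare the computed range against the old eight-page window.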
This keeps the policy in the loader that allocated the memory, removes
the remaining magic page-count heuristic, and narrows the shared
loader-heap surface without reintroducing the tracer startup regression.
Validated with the paired IA2 and dav1d branches:
- fresh ./rewrite.py --llvm-config /usr/bin/llvm-config-18 build
- dav1d --version returns 0
- strict single-thread decode of test.ivf returns 0
- Debug/Tracer/No-libc IA2 sweep:
    ctest --test-dir build/tracer_debug_standard_computed_20260417 \
      --output-on-failure -j1 -E terminating_threads
    => 35/35 passed
- Debug/No-tracer/No-libc IA2 sweep:
    ctest --test-dir build/standard_debug_notracer_computed_20260417 \
      --output-on-failure -j1 -E terminating_threads
    => 35/35 passed
- Release/Libc-compartment IA2 sweep:
    ctest --test-dir build/libc_release_computed_20260417 \
      --output-on-failure -j1
    => 13/13 passed