Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions druntime/src/core/internal/gc/impl/conservative/gc.d
Original file line number Diff line number Diff line change
Expand Up @@ -3621,6 +3621,8 @@ Lmark:
else
pullLoop!(false)();

evStackFilled.reset(); // symmetric with setIfInitialized() above; avoids livelock when no pop ever happened

debug(PARALLEL_PRINTF) printf("waitForScanDone done\n");
}

Expand Down
3 changes: 2 additions & 1 deletion druntime/test/gc/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ TESTS:=attributes sentinel printf memstomp invariant logging \

ifneq ($(OS),windows)
# some .d files are for Posix only
TESTS+=sentinel1 sentinel2 forkgc sigmaskgc startbackgc
TESTS+=sentinel1 sentinel2 forkgc sigmaskgc startbackgc issue23081
# and some tests require the `fork` GC, only supported on Posix
TESTS+=concurrent precise_concurrent hospital
endif
Expand Down Expand Up @@ -93,3 +93,4 @@ $(ROOT)/hospital$(DOTEXE): extra_dflags += -d
$(ROOT)/hospital.done: run_args+=--DRT-gcopt=fork:1
$(ROOT)/issue22843$(DOTEXE): extra_dflags += $(core_ut)
$(ROOT)/issue22843.done: run_args+="--DRT-gcopt=fork:1 initReserve:0 minPoolSize:1"
$(ROOT)/issue23081.done: run_args+="--DRT-gcopt=parallel:128 minPoolSize:1"
55 changes: 55 additions & 0 deletions druntime/test/gc/issue23081.d
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
// https://github.com/dlang/dmd/issues/23081
// GC scan threads livelock when no global stack pop occurs during markParallel.
//
// evStackFilled is a manual-reset event. markParallel sets it unconditionally
// before the pull loop. If toscanRoots is small enough that pointersPerThread==0
// (nothing pre-pushed to the global scan stack) AND the live structure fits in
// mark()'s 32-entry local stack (nothing spills), the event is never reset.
// Background scan threads then spin forever: evStackFilled.wait() returns
// immediately, pullFromScanStack is a no-op, evDone is broadcast, repeat.
//
// The fix resets evStackFilled after pullLoop returns, symmetrically with the
// setIfInitialized() call above it.
//
// We use parallel:128 (capped to logical_cpus - 1 by startScanThreads) to push
// numScanThreads as high as the host allows. The bug only triggers when
// numScanThreads + 1 exceeds toscanRoots.length, so on few-core CI runners this
// test passes without exercising the regression path; on many-core machines
// (>= ~20 cores) it reliably triggers without the fix.
Comment on lines +14 to +18
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Are we OK with adding a test that only catches the bug on certain hardware configurations?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be fine with a test that would fail eventually without the fix, but it should not falsely fail with the fix.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. This measures CPU utilization so I guess there's also a risk of a spurious false positive.

//
// NOTE: GC.collect() calls fullcollect(isFinal=true) which disables parallel
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to have crept in with https://github.com/dlang/dmd/pull/16401/changes#diff-cb1050c2f89057445a525aeaabe2c1864a24ddcd5f8caa9de24074dbab81aca3L3019 where the call from GC.collect was not adapted. Should be fixed by another PR.

// scan threads. To trigger parallel marking we rely on automatic collections
// driven by allocations filling the initial pool.

import core.thread;
import core.stdc.stdio;
import core.sys.posix.sys.resource;
import core.time;

void main()
{
// Allocate enough to fill the first GC pool (minPoolSize:1 = 1MB) so the
// allocator triggers an automatic collection (isFinal=false path) rather
// than growing the heap. Keep each object tiny so the live set is small.
foreach (i; 0 .. 8192)
new int[32]; // 8192 * 128 bytes = 1MB of small allocations
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check whether a collection happened at all via GC.stats?


// Measure CPU consumption during a quiescent sleep.
// Without the fix, the scan threads from the last automatic collection
// are still spinning because evStackFilled was never reset.
rusage before, after;
getrusage(RUSAGE_SELF, &before);
Thread.sleep(500.msecs);
getrusage(RUSAGE_SELF, &after);

long userUs = (after.ru_utime.tv_sec - before.ru_utime.tv_sec) * 1_000_000L
+ (after.ru_utime.tv_usec - before.ru_utime.tv_usec);
long sysUs = (after.ru_stime.tv_sec - before.ru_stime.tv_sec) * 1_000_000L
+ (after.ru_stime.tv_usec - before.ru_stime.tv_usec);
long totalCpuUs = userUs + sysUs;
long wallUs = 500_000L;

double ratio = cast(double)totalCpuUs / wallUs;
printf("CPU/wall ratio during quiescent sleep: %.2f\n", ratio);
assert(ratio < 2.0, "GC scan threads are spinning -- livelock detected (#23081)");
}
Loading