fix(hub): stop launching system updaters after manager shutdown#1951
Open
BootstrapperSBL wants to merge 1 commit intohenrygd:mainfrom
Open
fix(hub): stop launching system updaters after manager shutdown#1951BootstrapperSBL wants to merge 1 commit intohenrygd:mainfrom
BootstrapperSBL wants to merge 1 commit intohenrygd:mainfrom
Conversation
…ygd#1950) The staggered Initialize goroutine kept calling AddSystem after the hub had been torn down, which spawned StartUpdater goroutines that later dereferenced a nil DB inside PocketBase. Track a manager-level context that RemoveAllSystems cancels, abort the staggered loop on shutdown, and refuse AddSystem once the manager is stopped.
Contributor
Author
13 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Fixes #1950.
The hub test suite intermittently panics on Apple Silicon during
go test ./internal/alerts/...with a nil pointer dereference inside PocketBase'sRecordQuery. The trace ends inStartUpdater -> setDown -> getRecord -> FindRecordById, with the goroutine created bySystemManager.AddSystem.Root cause
SystemManager.Initializespawns a background goroutine that walks the systems loaded from the database and callsAddSystemfor each one with atime.Sleepbetween calls (up to 2s per system). When a test'sdefer hub.Cleanup()runs while that loop is still sleeping, the loop is not aware of the shutdown:RemoveAllSystemsiteratessm.systems.GetAll()and cancels every system already added.TestApp.CleanupcallsResetBootstrapState, nilling the underlying DB.AddSystemfor the next system, and starts aStartUpdatergoroutine whosesys.manager.hubalready has a nil DB. The firstupdate()call inevitably fails (no real agent),setDown -> getRecord -> FindRecordByIdthen panics insideapp.ConcurrentDB().The window is much wider on Apple Silicon because of weaker memory ordering and slower TCP teardown, which is why x86_64 builds usually pass.
Fix
Give
SystemManagerits owncontext.Contextthat is cancelled on shutdown:time.Sleep(sleepTime)with aselectonsm.ctx.Done()and returns early once shutdown is signalled.AddSystemrefuses to add a system (and therefore does not spawnStartUpdater) oncesm.ctxis cancelled.RemoveAllSystemshelper cancelssm.ctxbefore tearing systems down, so by the timeTestApp.Cleanupclears the DB the staggered loop is guaranteed not to start any new updaters.No
recover()is added; the production paths are unchanged on the happy path.Before / After
Before (from the issue, also reproducible locally on darwin/arm64):
After:
go test ./internal/alerts/... ./internal/hub/systems/...is green across 5 consecutive runs on darwin/arm64.Changelog
Fixed
StartUpdatergoroutines against a hub that is being torn down, fixing a flaky panic ininternal/alertstests on Apple Silicon ([Bug]: Panic / Nil pointer dereference ininternal/hub/systems/system.goduring tests #1950).