Make LS checkers time out after being idle, segment based on use (take 2)#3435
jakebailey wants to merge 1 commit into main from
Conversation
Pull request overview
Reintroduces an idle-expiring checker pool for the language server, with separate checker “lifetimes” (diagnostics vs temporary queries vs API) to reduce contention, improve cache locality, and preserve stable identity for API handles. This fits into the project system’s program lifecycle management and the LSP/API entrypoints that acquire type checkers.
Changes:
- Add `core.CheckerLifetime` on `context.Context` and route checker acquisition by lifetime (Temporary/Diagnostics/API).
- Replace the project checker pool implementation with a semaphore + idle-timeout cleanup design, plus `Discard()` to stop background cleanup when programs are superseded or unreferenced.
- Add extensive `checkerpool` tests using `testing/synctest`, and plumb checker pool options through `SessionOptions`.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| internal/project/snapshot.go | Discards checker pools when a program becomes unreferenced (to stop background cleanup). |
| internal/project/session.go | Adds CheckerPoolOptions to session initialization options. |
| internal/project/projectcollectionbuilder.go | Discards old pools on program updates; rebinds project.checkerPool from Program. |
| internal/project/project.go | Uses SessionOptions.CheckerPoolOptions when creating programs/checker pools; simplifies CreateProgramResult. |
| internal/project/checkerpool.go | New checker pool implementation: lifetimes, semaphores, idle cleanup timer, Discard(). |
| internal/project/checkerpool_test.go | New comprehensive tests for routing, affinity, contention, cleanup, discard behavior. |
| internal/lsp/server.go | Marks document diagnostic requests as CheckerLifetimeDiagnostics. |
| internal/core/context.go | Adds checker lifetime context plumbing. |
| internal/compiler/program.go | Exposes Program.GetCheckerPool() for project-side introspection. |
| internal/api/session.go | Routes API type-checker usage to CheckerLifetimeAPI; tags API diagnostics as diagnostics lifetime. |
```go
// cleanupIdleCheckers disposes checkers that have been idle for longer than
// the idle timeout. The API checker is separate and never idle-cleaned.
func (p *checkerPool) cleanupIdleCheckers() {
	p.mu.Lock()
	defer p.mu.Unlock()
	now := time.Now()
	for i := range p.checkers {
		c := p.checkers[i]
		if c == nil || p.heldBy[i] != "" {
			continue
		}
		if p.lastReleased[i].IsZero() {
			continue
		}
		idle := now.Sub(p.lastReleased[i])
		if idle >= p.opts.IdleTimeout {
			p.log(fmt.Sprintf("checkerpool: Disposing idle checker %d (idle %v)", i, idle))
			p.disposeCheckerLocked(i, c)
		}
	}
	// Reschedule for any remaining idle-but-not-yet-expired checkers.
	// scheduleCleanupLocked will Reset the existing timer rather than
	// creating a new one, avoiding goroutine leaks.
	p.scheduleCleanupLocked()
}
```
cleanupIdleCheckers runs even after Discard() and unconditionally calls scheduleCleanupLocked(). Because time.Timer.Stop() doesn’t guarantee the callback won’t run concurrently, the cleanup callback can still fire after Discard() and re-arm the timer / dispose checkers on a discarded pool. Add a p.discarded guard at the start of cleanupIdleCheckers (and/or inside scheduleCleanupLocked) to no-op and avoid rescheduling when discarded.
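A runnable toy model of the suggested guard. The `pool` type here is a stand-in, not the PR's `checkerPool`, and `rescheduled` stands in for `cleanupTimer.Reset`; the point is only that a late-firing callback must no-op and must not re-arm:

```go
package main

import (
	"fmt"
	"sync"
)

// pool models just enough of the checker pool to show the guard: a
// cleanup callback that must do nothing once the pool is discarded.
type pool struct {
	mu          sync.Mutex
	discarded   bool
	rescheduled int
}

func (p *pool) cleanupIdleCheckers() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.discarded {
		return // late-firing timer callback: do not dispose or re-arm
	}
	// ... dispose expired checkers here ...
	p.scheduleCleanupLocked()
}

func (p *pool) scheduleCleanupLocked() {
	if p.discarded {
		return // second line of defense inside the scheduler
	}
	p.rescheduled++ // stands in for cleanupTimer.Reset(...)
}

func (p *pool) Discard() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.discarded = true
}

func main() {
	p := &pool{}
	p.cleanupIdleCheckers() // normal run: re-arms once
	p.Discard()
	p.cleanupIdleCheckers() // simulates a callback that fired before Stop took effect
	fmt.Println(p.rescheduled) // 1: no re-arm after Discard
}
```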
```go
// Discard signals that this pool's program has been replaced. The pool
// remains functional but stops its idle-cleanup timer so that checkers
// are not disposed until the pool is GC'd. The API checker is unaffected
// since it is never idle-cleaned.
func (p *checkerPool) Discard() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.discarded {
		return // already discarded
	}
	p.log("checkerpool: Discarding pool, stopping idle cleanup")
	p.discarded = true
	if p.cleanupTimer != nil {
		p.cleanupTimer.Stop()
		p.cleanupTimer = nil
	}
}

// createCheckerLocked fills the first free slot with a new checker.
func (p *checkerPool) createCheckerLocked() (*checker.Checker, int) {
	for i, existing := range p.checkers {
		if existing == nil {
			c, _ := checker.NewChecker(p.program)
			p.checkers[i] = c
			return c, i
		}
	}
	// ...
}
```
Discard() stops and nils cleanupTimer, but the timer’s callback may already be running or may run immediately after Stop() returns false. Without an additional discarded check in the callback, the pool can end up keeping itself alive (via a newly scheduled timer) even though it was discarded. Consider setting discarded=true before stopping the timer and ensuring the callback clears cleanupTimer and does not reschedule when discarded.
```go
func (p *checkerPool) getPersistentChecker() (*checker.Checker, func()) {
	p.persistentSem <- struct{}{}
	p.mu.Lock()

	if p.persistentChecker == nil {
		p.log("checkerpool: Creating persistent checker")
		c, _ := checker.NewChecker(p.program)
		p.persistentChecker = c
	}

	c := p.persistentChecker
	p.persistentHeld = true
	p.mu.Unlock()

	return c, sync.OnceFunc(func() {
		p.mu.Lock()
		p.persistentHeld = false
		p.mu.Unlock()
		<-p.persistentSem
	})
}
```
The persistent/API checker release path doesn’t dispose a checker that has been canceled. checker.Checker panics on reuse once WasCanceled() is true (see checkNotCanceled()), so the next API acquisition can crash the server. Mirror the normal release behavior here: if the persistent checker WasCanceled(), clear it (so the next request recreates it) rather than keeping the canceled instance.
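A sketch of that mirrored release path, using a toy checker type. Only `WasCanceled` mirrors the real `checker.Checker` API; the pool fields and `releasePersistent` name are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// fakeChecker stands in for checker.Checker; only WasCanceled matters here.
type fakeChecker struct{ canceled bool }

func (c *fakeChecker) WasCanceled() bool { return c.canceled }

type pool struct {
	mu                sync.Mutex
	persistentChecker *fakeChecker
	persistentHeld    bool
}

// releasePersistent mirrors the normal release path: a canceled checker
// is dropped so the next acquisition recreates a fresh one instead of
// panicking in checkNotCanceled on reuse.
func (p *pool) releasePersistent() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.persistentHeld = false
	if p.persistentChecker != nil && p.persistentChecker.WasCanceled() {
		p.persistentChecker = nil // next getPersistentChecker recreates it
	}
}

func main() {
	p := &pool{persistentChecker: &fakeChecker{canceled: true}, persistentHeld: true}
	p.releasePersistent()
	fmt.Println(p.persistentChecker == nil) // true: canceled checker was cleared
}
```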
```go
// registerRequestCleanup uses context.AfterFunc to delete the request
// association when the request context is done. This prevents the map
// from growing unboundedly with completed request IDs.
// Must be called with p.mu held; the cleanup runs asynchronously.
func (p *checkerPool) registerRequestCleanup(ctx context.Context, requestID string) {
	context.AfterFunc(ctx, func() {
		p.mu.Lock()
		defer p.mu.Unlock()
		delete(p.requestAssociations, requestID)
	})
}
```
registerRequestCleanup only deletes requestAssociations when ctx is done. If a caller uses WithRequestID on a context that never cancels (e.g. context.Background() in some tests/utilities), requestAssociations can grow without bound for the life of the pool. Consider skipping request affinity when ctx.Done() == nil, or adding a fallback cleanup strategy (e.g. delete on release when Done is nil, or enforce a max size/TTL).
```go
// Mark its checker pool as discarded so idle checkers are disposed
// immediately rather than waiting for its idle timer.
```
The comment says Discard() will make idle checkers be disposed immediately, but checkerPool.Discard() actually stops the idle-cleanup timer and keeps checkers alive until the pool is GC'd. Please update this comment to reflect the actual behavior (stopping the timer to avoid keeping the pool alive / to allow GC).

Suggested change:
```go
// Mark its checker pool as discarded so its idle-cleanup timer stops
// keeping the pool alive, allowing the pool and any idle checkers it
// still references to be reclaimed when the pool is garbage-collected.
```
This brings back #3363, which I think should be safe after #3429?