fix: deadlock in DefaultServerDispatcher#412

Open
smulloni wants to merge 2 commits into lorenzodonini:master from GreenJoulez:master
Conversation

@smulloni

Proposed changes

This fixes a deadlock in DefaultServerDispatcher that occurs when many requests time out concurrently, matching the symptoms reported in #358.

DefaultServerDispatcher shares its RWMutex with ServerState (via NewServerState(&d.mutex)). When more than 10 client request timeouts fire simultaneously, waitForTimeout goroutines acquire RLock and block sending to timerC (buffer=10). Meanwhile messagePump, the sole timerC reader, calls HasPendingRequest which needs Lock on the same mutex — forming a circular wait that freezes the entire server permanently.

Three fixes applied:

Give ServerState its own RWMutex instead of sharing the dispatcher's. This breaks the cross-concern locking dependency.

In waitForTimeout, release RLock before sending to timerC. Previously the lock was held via defer across the blocking send.

In messagePump's timeout handler, inline the request completion instead of calling CompleteRequest, which sends to readyForDispatch (buffer=1). Since messagePump is the sole reader of that channel, this was a self-deadlock path when processing consecutive timeouts.

Additionally, read-only ServerState/ClientState methods (HasPendingRequest, GetPendingRequest, DeletePendingRequest, HasPendingRequests) now use RLock instead of Lock, and GetClientState uses a read-then-upgrade pattern, reducing contention under load.

Includes a regression test that reliably triggers the deadlock on unfixed code (15 clients, 50ms timeout, expects all callbacks to fire within 2s — previously only 1/15 would fire before freezing).

Types of changes

What types of changes does your code introduce?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of
them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before
merging your code.

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (glad to, where do I do this?)
  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in downstream modules

Further comments

There are several PRs partially addressing this issue, but as far as we can tell this one is comprehensive (and it explains why the problem can be mitigated through horizontal scaling). It is currently running in production for us at GreenJoulez.

@smulloni smulloni requested a review from lorenzodonini as a code owner March 23, 2026 14:44
