Skip to content

fix: prevent pipeline errors from killing Reactor subscription#205

Merged
usmansaleem merged 4 commits intoConsensys:masterfrom
usmansaleem:fix/handshake-error-recovery
Apr 1, 2026
Merged

fix: prevent pipeline errors from killing Reactor subscription#205
usmansaleem merged 4 commits intoConsensys:masterfrom
usmansaleem:fix/handshake-error-recovery

Conversation

@usmansaleem
Copy link
Copy Markdown
Contributor

@usmansaleem usmansaleem commented Mar 26, 2026

Summary

  • Wrap each pipeline handler invocation in a try-catch so errors are contained at the handler level and never propagate to the Reactor stream
  • Add catch(Throwable) to HandshakeMessagePacketHandler for defense-in-depth, matching the existing pattern in MessagePacketHandler

Root Cause

PipelineImpl chains handlers using doOnNext, but Reactor's onErrorContinue does not reliably work with doOnNext (a known Reactor limitation — it only supports operators like map/flatMap). When a StackOverflowError escaped a handler, it could intermittently terminate the Reactor subscription, causing the node to stop processing all subsequent incoming packets.

This caused the flaky timeout in DiscoveryIntegrationTest.shouldRecoverAfterErrorWhileDecodingInboundMessage().

Test plan

  • shouldRecoverAfterErrorWhileDecodingInboundMessage passes
  • Full test suite passes

Note

Medium Risk
Changes core packet processing flow by swallowing all Throwable from pipeline handlers; this reduces outages but could mask bugs and alter failure semantics/logging. Also adjusts handshake error handling to drop sessions on unexpected JVM errors.

Overview
Prevents individual EnvelopeHandler failures from terminating the Reactor pipeline by wrapping each doOnNext handler invocation in a try/catch (Throwable) and logging a warning instead of letting errors propagate.

Adds defense-in-depth in HandshakeMessagePacketHandler by catching unexpected Throwable during handshake processing, logging at warn, and marking the handshake/session as failed to ensure cleanup continues even on non-Exception errors.

Written by Cursor Bugbot for commit bc34525. This will update automatically on new commits. Configure here.

The pipeline used doOnNext to chain handlers, but onErrorContinue does
not work reliably with doOnNext (a known Reactor limitation). When a
StackOverflowError escaped a handler, it could intermittently terminate
the Reactor subscription, causing the node to stop processing all
subsequent incoming packets.

Wrap each handler invocation in a try-catch within the pipeline so errors
are contained at the handler level and never propagate to Reactor.

Also add catch(Throwable) to HandshakeMessagePacketHandler for
defense-in-depth, matching the existing pattern in MessagePacketHandler.
Comment thread src/main/java/org/ethereum/beacon/discovery/pipeline/PipelineImpl.java Outdated
Copy link
Copy Markdown
Contributor

@gfukushima gfukushima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

@usmansaleem usmansaleem merged commit 72315ae into Consensys:master Apr 1, 2026
5 checks passed
@usmansaleem usmansaleem deleted the fix/handshake-error-recovery branch April 1, 2026 00:44
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 1, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants