Skip to content

Fix crash on connection timeout#1729

Open
rjmunro wants to merge 1 commit into
Sofie-Automation:mainfrom
bbc:rjmunro/fix-crash-on-network-timeout
Open

Fix crash on connection timeout#1729
rjmunro wants to merge 1 commit into
Sofie-Automation:mainfrom
bbc:rjmunro/fix-crash-on-network-timeout

Conversation

@rjmunro
Copy link
Copy Markdown
Contributor

@rjmunro rjmunro commented Apr 27, 2026

About the Contributor

This pull request is posted on behalf of the BBC.

Type of Contribution

This is a Bug fix

Current Behavior

The update to TSR 4eed5ee seems to have changed the behaviour when a timeout occurs so that some sort of exception occurs. Winston in playout gateway sees this exception and exits.

New Behavior

Change Winston config to not exit on errors.

Testing

  • I have added one or more unit tests for this PR
  • I have updated the relevant unit tests
  • No unit test changes are needed for this PR

Affected areas

This affects how playout gateway handles errors.

Time Frame

  • This Bug Fix is critical for us, please review and merge it as soon as possible.

Other Information

This PR is not immediately needed if we prevent TSR passing the errors to Sofie core, as in this PR: Sofie-Automation/sofie-timeline-state-resolver#457 but it might be a good idea to do anyway.

Status

  • PR is ready to be reviewed.
  • The functionality has been tested by the author.
  • Relevant unit tests has been added / updated.
  • Relevant documentation (code comments, system documentation) has been added / updated.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 74c34c29-04bb-4b40-b7ca-1b0915034cac

📥 Commits

Reviewing files that changed from the base of the PR and between 053f85a and ce0c3bb.

📒 Files selected for processing (1)
  • packages/playout-gateway/src/index.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/playout-gateway/src/index.ts

Walkthrough

Winston logger initialization in the module was updated to set exitOnError: false; no other runtime logic or global event listeners were changed.

Changes

Cohort / File(s) Summary
Logger Configuration
packages/playout-gateway/src/index.ts
Added exitOnError: false to Winston logger initialization. No other functional changes introduced.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Fix crash on connection timeout' directly addresses the main issue described in the changeset: preventing Winston from crashing the playout gateway on timeout exceptions.
Description check ✅ Passed The description is relevant to the changeset, explaining the root cause (TSR update causing exceptions on timeout), the solution (Winston configuration change), and testing status.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@rjmunro rjmunro marked this pull request as ready for review April 27, 2026 18:53
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/playout-gateway/src/index.ts (1)

89-96: Optional: consider scoping which errors are swallowed, and add observability.

Catching every uncaughtException indefinitely is a known Node.js footgun — the process may be left in an inconsistent state after, e.g., a TypeError deep in a callback, and subsequent failures can be hard to diagnose. The PR description already calls this out as a workaround pending sofie-timeline-state-resolver#457. Two low-cost mitigations worth considering:

  • Emit at a higher severity / include the origin argument from uncaughtException so operators can distinguish "device timeout" noise from genuine bugs:
    process.on('uncaughtException', (e: Error, origin: string) => {
        logger.error(`Unhandled exception (${origin}): ${stringifyError(e)}`)
    })
  • A metric/counter on these handlers would make it easy to alert if the gateway starts silently absorbing a flood of exceptions instead of crash-looping (which is currently the only signal).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/playout-gateway/src/index.ts` around lines 89 - 96, Update the
process error handlers (the process.on('uncaughtException', ...) and
process.on('unhandledRejection', ...) blocks) to record the exception origin and
increase observability: change the uncaughtException handler signature to accept
(err: Error, origin: string) and include origin in the logger.error message
along with stringifyError(err), and add a metric/counter increment (e.g. call a
metrics.increment or similar telemetry function) inside both handlers to track
counts of swallowed uncaught exceptions and unhandled rejections; keep the
current non-crashing behavior but ensure you reference stringifyError and logger
in the new messages so operators can distinguish device/noise errors from real
bugs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/playout-gateway/src/index.ts`:
- Around line 89-96: Remove the explicit process.on listeners for
'uncaughtException' and 'unhandledRejection' (the two handlers that call
logger.error with stringifyError) and rely on Winston's configured transports +
exitOnError: false to handle and log exceptions/rejections; this avoids
duplicate logging from both Winston's internal handlers and the manual
process.on handlers (alternatively, if you prefer keeping manual handlers,
disable handleExceptions/handleRejections on the Winston transports instead).

---

Nitpick comments:
In `@packages/playout-gateway/src/index.ts`:
- Around line 89-96: Update the process error handlers (the
process.on('uncaughtException', ...) and process.on('unhandledRejection', ...)
blocks) to record the exception origin and increase observability: change the
uncaughtException handler signature to accept (err: Error, origin: string) and
include origin in the logger.error message along with stringifyError(err), and
add a metric/counter increment (e.g. call a metrics.increment or similar
telemetry function) inside both handlers to track counts of swallowed uncaught
exceptions and unhandled rejections; keep the current non-crashing behavior but
ensure you reference stringifyError and logger in the new messages so operators
can distinguish device/noise errors from real bugs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 78d36f7d-409f-4189-99b1-0f59477675b3

📥 Commits

Reviewing files that changed from the base of the PR and between bb2a27d and 053f85a.

📒 Files selected for processing (1)
  • packages/playout-gateway/src/index.ts

Comment thread packages/playout-gateway/src/index.ts Outdated
Comment on lines +89 to +96
// Log uncaught exceptions and unhandled rejections without crashing the process.
// Device connection errors (e.g. ETIMEDOUT) must not bring down the gateway.
process.on('uncaughtException', (e: Error) => {
logger.error(`Unhandled exception: ${stringifyError(e)}`)
})
process.on('unhandledRejection', (reason: unknown) => {
logger.error(`Unhandled promise rejection: ${stringifyError(reason)}`)
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Likely duplicate logging — Winston already handles these.

Both transports above are configured with handleExceptions: true and handleRejections: true (lines 35–41 and 65–66). Winston's exception/rejection handlers register their own process.on('uncaughtException') / process.on('unhandledRejection') listeners when those transports are attached, so each uncaught exception/rejection will now be logged twice — once by Winston's internal handler and once by the explicit handler added here.

Pick one approach. The smaller change (and consistent with packages/mos-gateway/src/index.ts, which only relies on Winston) is to drop the explicit listeners and let exitOnError: false keep the process alive:

♻️ Option A — rely on Winston only (matches mos-gateway)
-// Log uncaught exceptions and unhandled rejections without crashing the process.
-// Device connection errors (e.g. ETIMEDOUT) must not bring down the gateway.
-process.on('uncaughtException', (e: Error) => {
-	logger.error(`Unhandled exception: ${stringifyError(e)}`)
-})
-process.on('unhandledRejection', (reason: unknown) => {
-	logger.error(`Unhandled promise rejection: ${stringifyError(reason)}`)
-})
♻️ Option B — keep these handlers, disable Winston's built-in handling
 	const transportConsole = new Winston.transports.Console({
 		level: logLevel || 'silly',
-		handleExceptions: true,
-		handleRejections: true,
 	})
 	const transportFile = new Winston.transports.File({
 		level: logLevel || 'silly',
-		handleExceptions: true,
-		handleRejections: true,
 		filename: logPath,
 		format: myFormat,
 	})

…and the same for the console-only transport in the else branch.

Please confirm Winston 3.19.0's behavior here (does attaching a transport with handleExceptions: true install its own process.on('uncaughtException') listener?):

winston 3.19 handleExceptions handleRejections process.on uncaughtException listener
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/playout-gateway/src/index.ts` around lines 89 - 96, Remove the
explicit process.on listeners for 'uncaughtException' and 'unhandledRejection'
(the two handlers that call logger.error with stringifyError) and rely on
Winston's configured transports + exitOnError: false to handle and log
exceptions/rejections; this avoids duplicate logging from both Winston's
internal handlers and the manual process.on handlers (alternatively, if you
prefer keeping manual handlers, disable handleExceptions/handleRejections on the
Winston transports instead).

The update to TSR 4eed5ee seems to have
changed the behaviour when a timeout occurs so that some sort of
exception occurs. Winston in playout gateway sees this exception and
exits.

This configures Winston to not exit in those circumstances.
@rjmunro rjmunro force-pushed the rjmunro/fix-crash-on-network-timeout branch from 053f85a to ce0c3bb Compare April 28, 2026 11:06
@jstarpl jstarpl added the Contribution from BBC Contributions sponsored by BBC (bbc.co.uk) label Apr 28, 2026
@rjmunro rjmunro requested review from Julusian and nytamin May 6, 2026 22:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Contribution from BBC Contributions sponsored by BBC (bbc.co.uk)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants