Skip to content

fix: /healthz and /readyz wait for initial device connection (SOFIE-4325)#1730

Open
jstarpl wants to merge 2 commits into
Sofie-Automation:mainfrom
nrkno:contrib/mos-gw-init
Open

fix: /healthz and /readyz wait for initial device connection (SOFIE-4325)#1730
jstarpl wants to merge 2 commits into
Sofie-Automation:mainfrom
nrkno:contrib/mos-gw-init

Conversation

@jstarpl
Copy link
Copy Markdown
Contributor

@jstarpl jstarpl commented Apr 27, 2026

About the Contributor

This pull request is posted on behalf of the NRK.

Type of Contribution

This is a:

Bug fix

Current Behavior

The Kubernetes /healthz and /readyz don't return an OK signal until devices configured at boot-up have connected.

New Behavior

The device connection is detached from boot-up, since it's not really necessary for the boot-up process. This allows the instance to stay up instead of being restarted.

Testing

  • I have added one or more unit tests for this PR
  • I have updated the relevant unit tests
  • No unit test changes are needed for this PR

Affected areas

This PR affects the MOS Gateway bootup process.

Time Frame

Not urgent, but we would like to get this merged into the in-development release.

Other Information

Status

  • PR is ready to be reviewed.
  • The functionality has been tested by the author.
  • Relevant unit tests has been added / updated.
  • Relevant documentation (code comments, system documentation) has been added / updated.

@jstarpl jstarpl added 🐛🔨 bugfix This is a fix for an issue Contribution from NRK Contributions sponsored by NRK (nrk.no) labels Apr 27, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

Walkthrough

MosHandler.init no longer awaits _updateDevices() during startup; it registers coreHandler.onConnected and schedules a debounced device refresh via a new triggerUpdateDevices() helper. _coreHandler is changed to a definite field and timer variable casing and management are consolidated.

Changes

Cohort / File(s) Summary
Initialization & debounce refactor
packages/mos-gateway/src/mosHandler.ts
Removed awaiting _updateDevices() in init; register coreHandler.onConnected instead. Added triggerUpdateDevices() to centralize debounce/timeout handling and renamed _triggerupdateDevicesTimeout_triggerUpdateDevicesTimeout. Changed _coreHandler from optional to definite (CoreHandler!) and removed related runtime guard. Moved timer clear/reset logic into the new helper.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title specifically references the fix to /healthz and /readyz endpoints and their waiting behavior, which directly aligns with the PR's main objective of detaching device connection from boot-up.
Description check ✅ Passed The description comprehensively explains the current behavior, new behavior, affected areas, and testing status, all of which relate directly to the changeset in mosHandler.ts.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 60 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 27, 2026

Codecov Report

❌ Patch coverage is 0% with 11 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
packages/mos-gateway/src/mosHandler.ts 0.00% 11 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
packages/mos-gateway/src/mosHandler.ts (2)

83-83: Non-null assertion makes existing _coreHandler undefined-checks dead code.

With _coreHandler!: CoreHandler the field can no longer be typed undefined, so the if (!this._coreHandler) throw … branches at lines 156-158, 294, 453-455, and 599 are unreachable per the type system. They're harmless, but worth removing in a follow-up to keep the runtime invariant aligned with the type. Not blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/mos-gateway/src/mosHandler.ts` at line 83, The field _coreHandler is
declared with a non-null assertion but code contains runtime undefined checks,
so make the type reflect that by changing the declaration from "_coreHandler!:
CoreHandler" to "private _coreHandler?: CoreHandler" so existing guards (the if
(!this._coreHandler) throw ...) remain meaningful; update any call sites that
currently rely on non-null typing to either keep the guards or use explicit
non-null assertions where you truly know it is initialized (e.g., inside methods
that run after initialization) so the type and runtime behavior stay aligned.

209-218: Clear _triggerUpdateDevicesTimeout on dispose().

The deferred update is now scheduled via setTimeout and stored on the instance, but dispose() (lines 141-148) never clears it. The _disposed check inside _updateDevices() does prevent work from happening, but the pending timer still keeps the event loop alive and holds a closure reference to this. Cheap to fix:

♻️ Proposed cleanup
 	async dispose(): Promise<void> {
 		this._disposed = true
+		if (this._triggerUpdateDevicesTimeout) {
+			clearTimeout(this._triggerUpdateDevicesTimeout)
+			this._triggerUpdateDevicesTimeout = null
+		}
 		if (this.mos) {
 			return this.mos.dispose()
 		} else {
 			return Promise.resolve()
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/mos-gateway/src/mosHandler.ts` around lines 209 - 218, The instance
schedules a deferred timer in triggerUpdateDevices (stores it in
_triggerUpdateDevicesTimeout) but dispose() does not clear it, leaving the timer
alive and holding a reference to this; update dispose() to
clearTimeout(this._triggerUpdateDevicesTimeout) and then null/undefined the
property (or otherwise reset it) so any pending timer is cancelled, and ensure
triggerUpdateDevices still overwrites the field when scheduling a new timer;
reference functions/fields: triggerUpdateDevices, _triggerUpdateDevicesTimeout,
dispose, and _updateDevices.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/mos-gateway/src/mosHandler.ts`:
- Around line 132-139: The ordering regression causes setupObservers() to run
before _initMosConnection() and can NPE when _deviceOptionsChanged() calls
this.mos.setDebug; fix by guarding _deviceOptionsChanged() to tolerate a missing
mos (e.g. early-return or defer calling this.mos.setDebug if this.mos is
undefined) or by moving setupObservers() to run only after
_initMosConnection()/_updateDevices() has initialized this.mos; update the code
paths in setupObservers, triggerUpdateDevices, _deviceOptionsChanged,
_updateDevices and _initMosConnection so observers never call this.mos methods
when this.mos is undefined (prefer adding a simple if (!this.mos) return or
queue the change until after _initMosConnection completes).

---

Nitpick comments:
In `@packages/mos-gateway/src/mosHandler.ts`:
- Line 83: The field _coreHandler is declared with a non-null assertion but code
contains runtime undefined checks, so make the type reflect that by changing the
declaration from "_coreHandler!: CoreHandler" to "private _coreHandler?:
CoreHandler" so existing guards (the if (!this._coreHandler) throw ...) remain
meaningful; update any call sites that currently rely on non-null typing to
either keep the guards or use explicit non-null assertions where you truly know
it is initialized (e.g., inside methods that run after initialization) so the
type and runtime behavior stay aligned.
- Around line 209-218: The instance schedules a deferred timer in
triggerUpdateDevices (stores it in _triggerUpdateDevicesTimeout) but dispose()
does not clear it, leaving the timer alive and holding a reference to this;
update dispose() to clearTimeout(this._triggerUpdateDevicesTimeout) and then
null/undefined the property (or otherwise reset it) so any pending timer is
cancelled, and ensure triggerUpdateDevices still overwrites the field when
scheduling a new timer; reference functions/fields: triggerUpdateDevices,
_triggerUpdateDevicesTimeout, dispose, and _updateDevices.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7ad57a02-ca62-4d6c-bc26-bb7fb90c58f3

📥 Commits

Reviewing files that changed from the base of the PR and between bb2a27d and d126baa.

📒 Files selected for processing (1)
  • packages/mos-gateway/src/mosHandler.ts

Comment on lines +132 to +139
coreHandler.onConnected(() => {
// This is called whenever a connection to Core has been (re-)established
this.setupObservers()
this.sendStatusOfAllMosDevices()
})
this.setupObservers()

this.triggerUpdateDevices()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Ordering regression: setupObservers() now runs before _initMosConnection() and can NPE on this.mos.

Previously init() awaited _updateDevices() (which awaits _initMosConnection()), so this.mos was always defined before any observer/_deviceOptionsChanged could run. After this change, the call order in init() is:

  1. setupObservers() (line 137) — synchronously invokes _deviceOptionsChanged() at line 172.
  2. triggerUpdateDevices() (line 139) — schedules _updateDevices() (and therefore _initMosConnection()) ~20 ms later.

If the peripheral device's deviceSettings.debugLogging differs from the default this.debugLogging = false, _deviceOptionsChanged() enters the branch at line 186 and reaches line 191:

if (!this.mos) {
    throw Error('mos is undefined!')
}
this.mos.setDebug(this.debugLogging)

this.mos is still undefined here, so init() will throw and MosHandler.create() will reject — which is exactly the boot-failure mode this PR is trying to remove for /healthz and /readyz. The same race exists for any peripheralDevice observer event that fires inside the 20 ms window before _updateDevices() initializes this.mos.

Either initialize the MOS connection eagerly (without awaiting device connections) before observers run, or guard _deviceOptionsChanged() so it tolerates this.mos being undefined at this stage. Sketch:

🛡️ Suggested guard in _deviceOptionsChanged
 			if (this.debugLogging !== settings.debugLogging) {
 				this._logger.info('Changing debugLogging to ' + settings.debugLogging)

 				this.debugLogging = !!settings.debugLogging

-				if (!this.mos) {
-					throw Error('mos is undefined!')
-				}
-				this.mos.setDebug(this.debugLogging)
+				// `this.mos` may not yet exist if observers fire before _initMosConnection completes;
+				// the value will be applied on next change once mos is initialized.
+				this.mos?.setDebug(this.debugLogging)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/mos-gateway/src/mosHandler.ts` around lines 132 - 139, The ordering
regression causes setupObservers() to run before _initMosConnection() and can
NPE when _deviceOptionsChanged() calls this.mos.setDebug; fix by guarding
_deviceOptionsChanged() to tolerate a missing mos (e.g. early-return or defer
calling this.mos.setDebug if this.mos is undefined) or by moving
setupObservers() to run only after _initMosConnection()/_updateDevices() has
initialized this.mos; update the code paths in setupObservers,
triggerUpdateDevices, _deviceOptionsChanged, _updateDevices and
_initMosConnection so observers never call this.mos methods when this.mos is
undefined (prefer adding a simple if (!this.mos) return or queue the change
until after _initMosConnection completes).

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/mos-gateway/src/mosHandler.ts`:
- Around line 140-141: init() currently returns before the first
_updateDevices() runs and swallows _initMosConnection() failures, so
MosHandler.create() can appear successful while MOS is broken; update init() to
await the initial device update by returning/awaiting
this.triggerUpdateDevices() (or directly await this._updateDevices()) so init()
only resolves after the first successful update, and change the
_initMosConnection() error handling to rethrow or propagate errors instead of
only logging (so callers of init()/MosHandler.create() receive the failure).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1af185af-b884-4d8c-9a2b-ae15055f4973

📥 Commits

Reviewing files that changed from the base of the PR and between d126baa and de1fece.

📒 Files selected for processing (1)
  • packages/mos-gateway/src/mosHandler.ts

Comment on lines +140 to 141
this.triggerUpdateDevices()
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Startup now hides fatal MOS init/config errors

init() returns before the first _updateDevices() runs, and Line 215–217 only logs failures. That means _initMosConnection() errors (eg. invalid mosId/ports) no longer fail MosHandler.create(), so the process can report initialized while MOS is broken.

Suggested fix
 private async init(config: MosConfig, coreHandler: CoreHandler): Promise<void> {
   this.mosOptions = config
   this._coreHandler = coreHandler
   ...
   this.mosTypes = getMosTypes(this.strict)

+  // Fail fast on invalid MOS configuration, but still avoid waiting for device connections
+  await this._initMosConnection()
+
   coreHandler.onConnected(() => {
     this.setupObservers()
     this.sendStatusOfAllMosDevices()
   })
   this.setupObservers()

   this.triggerUpdateDevices()
 }

Also applies to: 214-217

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/mos-gateway/src/mosHandler.ts` around lines 140 - 141, init()
currently returns before the first _updateDevices() runs and swallows
_initMosConnection() failures, so MosHandler.create() can appear successful
while MOS is broken; update init() to await the initial device update by
returning/awaiting this.triggerUpdateDevices() (or directly await
this._updateDevices()) so init() only resolves after the first successful
update, and change the _initMosConnection() error handling to rethrow or
propagate errors instead of only logging (so callers of
init()/MosHandler.create() receive the failure).

@jstarpl jstarpl requested a review from nytamin May 14, 2026 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🐛🔨 bugfix This is a fix for an issue Contribution from NRK Contributions sponsored by NRK (nrk.no)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants