feat: streaming reliability for shared-HomeBase battery cameras #24
josha wants to merge 28 commits into
JMuxer defaults videoCodec to "H264", and the muxed-client handler never overrode it, so HEVC cameras (T86P2/4G LTE Cam S330, S340 in H.265 mode) produced fMP4 with an avcC sample description wrapped around HEVC NAL units. The Rebroadcast plugin's downstream ffmpeg then hit "timeout waiting for data", and HomeKit/WebRTC live view never started. The snapshot path was unaffected (it writes the raw bitstream to ffmpeg with -f hevc) but appeared to fail because Rebroadcast had wedged the upstream session.
The codec isn't known when the muxer client first connects, so handleMuxedClient is now async: it registers the socket as pending (which counts as a consumer, so the upstream livestream still starts), awaits waitForVideoMetadata, and then builds the JMuxer with videoCodec="H264" or "H265".
This also tightens the H.265 snapshot resolver to require an actual IRAP slice (NAL unit types 16-23) rather than accepting events that carry only parameter sets. The T86P2 happens to bundle VPS+SPS+PPS+IDR, so this didn't bite there, but cameras that deliver parameter sets out-of-band would resolve snapshots with bytes ffmpeg couldn't decode.
It also logs the actual captured-stream codec in "Captured ... keyframe" instead of always saying "H.264". The H.264 path is unchanged. Tested live on T86P2 (H.265-only) and S340 (both H.264 and H.265 modes).
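A minimal sketch of the tightened IRAP check, assuming the resolver sees an Annex B byte stream (start-code delimited NAL units). `containsHevcIrap` is a hypothetical name, not the plugin's function. It reads the 6-bit `nal_unit_type` from the first byte of each H.265 NAL header and accepts only IRAP types 16-23, so an event carrying just VPS/SPS/PPS (types 32-34) is rejected.

```typescript
// Hypothetical helper: does this Annex B H.265 access unit contain an IRAP slice?
export function containsHevcIrap(au: Uint8Array): boolean {
  for (let i = 0; i + 3 < au.length; i++) {
    // A 3-byte start code 00 00 01 (a 4-byte 00 00 00 01 ends with one too).
    if (au[i] === 0 && au[i + 1] === 0 && au[i + 2] === 1) {
      // H.265 NAL header: forbidden_zero_bit(1) | nal_unit_type(6) | ...
      const nalType = (au[i + 3] >> 1) & 0x3f;
      if (nalType >= 16 && nalType <= 23) return true; // BLA/IDR/CRA/reserved IRAP
      i += 2; // skip past the start code
    }
  }
  return false;
}
```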
When the upstream HomeBase / eufy-security-ws is wedged (it accepts CMD_START_REALTIME_MEDIA but returns no P2P data, observed as 'Stopping the station stream ... no data for 5 seconds' / 'Result data for command not received' in the WS server logs), the previous ensureLivestreamState recursion issued another startLivestream every 30 seconds, indefinitely, for as long as any consumer kept livestreamIntendedState=true. Each extra command piles more backpressure onto an already-stuck HomeBase and prolongs the wedge.
This adds a consecutive-no-data counter. After MAX_NO_DATA_STARTS (=3) back-to-back startLivestream attempts that never produce a video event, the plugin gives up: it clears intent, emits streamError, and stops the recursive 30s timer. The counter resets to zero the instant video data actually arrives (livestreamActualState=true), so this only kicks in for genuinely stuck sessions, not slow-warmup ones.
Recovery is consumer-initiated: a fresh muxer or snapshot request sets intent back to true and the cycle restarts. Combined with a physical HomeBase reset (the actual fix for a wedged upstream), this keeps the plugin from making the problem worse while the user recovers the hardware.
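The counter logic is small enough to sketch as a standalone class. `NoDataGuard` and its method names are hypothetical; resetting the counter on give-up (so a later consumer-initiated request gets a fresh budget) is an assumption, not something the commit states.

```typescript
// Hypothetical sketch of the consecutive-no-data guard; not the plugin's code.
export const MAX_NO_DATA_STARTS = 3;

export class NoDataGuard {
  private consecutiveNoData = 0;

  /**
   * Record a startLivestream attempt that ended without any video event.
   * Returns true when the caller should give up: clear intent, emit
   * streamError, and stop the recursive 30s ensureLivestreamState timer.
   */
  recordStartWithoutData(): boolean {
    this.consecutiveNoData++;
    if (this.consecutiveNoData < MAX_NO_DATA_STARTS) return false;
    // Assumption: reset on give-up so the next consumer-initiated cycle
    // starts with the full attempt budget.
    this.consecutiveNoData = 0;
    return true;
  }

  /** Record that video data arrived (livestreamActualState = true). */
  recordVideoData(): void {
    this.consecutiveNoData = 0;
  }
}
```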
Battery / cellular Eufy cameras (T8170 S340 deep-sleep wake, T86P2 4G LTE Cam S330 cold P2P) routinely take 30–45s to deliver their first IDR after startLivestream. Observed empirically: Patio (T86P2) consistently delivered video on the third 30s retry of ensureLivestreamState (≈45s after the request), with the camera responding correctly to the underlying eufy-security-ws commands the whole time. The previous 15s timeouts caused two failure modes during that warm-up window:
1. `handleMuxedClient` timed out and built JMuxer with the wrong default codec (H.264) just before real H.265 frames arrived. The resulting fMP4 was undecodable; Rebroadcast/WebRTC saw no usable output and gave up.
2. `captureSnapshot` timed out and reported "no keyframe received" even though a keyframe was about to arrive seconds later.
Bumping both defaults to 60s lets the camera's natural startLivestream retry cycle complete before either consumer abandons the session. User-supplied timeouts (via RequestPictureOptions.timeout) are respected unchanged.
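A minimal sketch of "60s default, caller override respected unchanged", assuming the wait is expressed as a promise raced against a timer. `resolveTimeout`, `withTimeout`, and the constant name are hypothetical. Using `??` rather than `||` matters: an explicit caller value, even `0`, is kept as-is.

```typescript
// Hypothetical names; only the 60s default comes from the commit message.
const DEFAULT_FIRST_FRAME_TIMEOUT_MS = 60_000;

/** A caller-supplied timeout (e.g. RequestPictureOptions.timeout) wins unchanged. */
export function resolveTimeout(userTimeout?: number): number {
  return userTimeout ?? DEFAULT_FIRST_FRAME_TIMEOUT_MS;
}

/** Reject with a descriptive error if `p` hasn't settled within `ms`. */
export function withTimeout<T>(p: Promise<T>, ms: number, what: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${what}: timed out after ${ms} ms`)), ms);
  });
  // Always clear the timer so a settled wait doesn't keep the process alive.
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}
```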
Adding a third commit based on live-camera evidence from this evening's testing:
Battery and 4G LTE Eufy cameras (T8170 SoloCam S340 deep-sleep wake, T86P2 4G LTE Cam S330 cold P2P) routinely take 30–45s to deliver their first IDR after the upstream-issued startLivestream. With the previous 15s timeouts, both the muxed-client handler and snapshot capture gave up during that warm-up window.
Bumping both defaults to 60s fixes this. Verified end-to-end live in HomeKit on Front Door (T86P2 H.265): live view, snapshot, and audio all work after this change.
When bropat's P2P session to a station goes zombie (startLivestream acked but no LIVESTREAM_VIDEO_DATA events flow), the plugin now detects the wedge and recycles the per-station P2P session via station.disconnect/connect, recovering without a HomeBase reboot or plugin reload.
Two detection paths, both routed through a shared markUpstreamWedged():
- Cold-start: the existing 3-consecutive-no-data-starts counter.
- Mid-session: a new data-flow watchdog tracking lastVideoDataAt. It fires when intent=true, prior data existed, more than 15s have passed since the last frame, and consumers are still attached. This catches the common "stream was working, then stopped" pattern the counter misses.
The recycle waits for the actual STATION_EVENTS.CONNECTED event (or CONNECTION_ERROR / a 30s timeout) before declaring completion, since station.connect() returns when bropat accepts the command, not when P2P is re-established (10–25s for cellular cameras / a cold HomeBase).
The stream server defers startLivestream while a recycle is in flight and auto-arms once it clears if consumers are still waiting, so a HomeKit session arriving mid-recovery gets video without the user retrying.
Battery-safe gating: the watchdog requires totalConsumers > 0; the livestream is not auto-restarted after a recycle when no consumers are attached; and recycles are rate-limited to once per 5 minutes per device.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
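The mid-session watchdog condition and the per-device rate limit reduce to two pure predicates. This is a sketch under assumed names (`isMidSessionWedge`, `canRecycle`, the `WatchdogState` shape); only the 15s stall window, the consumer gating, and the 5-minute limit come from the commit message.

```typescript
// Hypothetical sketch; field and function names are illustrative.
const STALL_MS = 15_000;
const RECYCLE_MIN_INTERVAL_MS = 5 * 60_000;

interface WatchdogState {
  intendedLive: boolean;    // livestreamIntendedState
  lastVideoDataAt?: number; // epoch ms of the last video frame; undefined if none yet
  totalConsumers: number;
}

export function isMidSessionWedge(s: WatchdogState, now: number): boolean {
  return (
    s.intendedLive &&
    s.lastVideoDataAt !== undefined &&   // prior data existed (cold start is the counter's job)
    now - s.lastVideoDataAt > STALL_MS &&
    s.totalConsumers > 0                 // battery-safe: never fire with nobody watching
  );
}

export function canRecycle(lastRecycleAt: number | undefined, now: number): boolean {
  return lastRecycleAt === undefined || now - lastRecycleAt >= RECYCLE_MIN_INTERVAL_MS;
}
```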
Three streaming-reliability improvements for the Eufy bridge, verified on T86P2 (Front Door) and S340 (Porch) hardware:
1. Last-keyframe cache (eufy-stream-server): retain every decodable keyframe that flows through and serve snapshots/thumbnails from it for up to 1h without waking the camera. This fixes the Home-app grid, which previously fired one cold P2P wake per camera and mostly timed out. The cache refreshes for free whenever the camera is already awake (live view, HKSV, motion).
2. Streamlined cold-start (eufy-stream-server): MAX_NO_DATA_STARTS 3->1 and the start timeout 30s->45s. A deeply idle camera now gets one startLivestream, a full 45s wake window, then a station P2P recycle, instead of hammering startLivestream 3x over ~90s, which compounded the stall.
3. Station-recycle guard (station-stream-registry + eufy-device): track which devices are actively delivering video per HomeBase and defer the (shared) P2P recycle when a sibling on the same station is streaming, so one camera's recovery doesn't tear down another's live session.
Also includes the pre-existing codec-hint persistence in eufy-device (it stores the last-detected codec so the first stream after a reload advertises the right codec), shipped per project decision.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
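A minimal sketch of the last-keyframe cache from item 1, assuming one cached frame per camera and a caller-supplied clock. `KeyframeCache` and `CachedKeyframe` are hypothetical names; the 1h freshness window is the one stated above.

```typescript
// Hypothetical sketch of a last-keyframe cache; names are illustrative.
interface CachedKeyframe {
  codec: "h264" | "h265";
  data: Uint8Array;   // decodable access unit (parameter sets + IRAP/IDR)
  capturedAt: number; // epoch ms
}

const MAX_CACHE_AGE_MS = 60 * 60_000; // serve for up to 1 h without waking

export class KeyframeCache {
  private last?: CachedKeyframe;

  /** Called for every decodable keyframe that flows through the stream server. */
  put(frame: CachedKeyframe): void {
    this.last = frame;
  }

  /** The cached frame if fresh enough, else undefined (the caller decides whether to wake). */
  getFresh(now: number, maxAgeMs = MAX_CACHE_AGE_MS): CachedKeyframe | undefined {
    if (!this.last) return undefined;
    return now - this.last.capturedAt <= maxAgeMs ? this.last : undefined;
  }
}
```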
Observed on Porch (weak-signal S340): an audio-only P2P session (audio metadata captured, no video) correctly wedged after 45s, but the recycle, its only recovery path, was deferred because the registry showed Front Door streaming, so Porch failed silently. Two fixes:
- Close a registry leak: the "already stopped as desired" branch of ensureLivestreamState never flipped actual→false, so a stream that bropat stopped on its own could leave a stale "active" entry.
- Verify before deferring: the recycle guard now confirms the sibling is actually livestreaming (isLivestreaming) before sacrificing this device's recovery; if the registry entry is stale, it's cleared and the recycle proceeds.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ck/timeout)
Root cause of "live view never starts / black, ffmpeg timeout waiting for data" on mic-off cameras (e.g. Upper Garage, microphone:false): JMuxer was hardcoded to mode "both". JMuxer won't emit a single fMP4 byte until EVERY declared track has a sample (remux isReady()), so a video-only camera muxed as "both" waits forever for an audio track that never arrives; downstream ffmpeg times out and live view fails. Cameras WITH a mic (Front Door) worked only incidentally, because audio was present.
Fix: detect audio empirically (a deliversAudio flag, set when any audio frame arrives) and pick the muxer mode accordingly. On the first stream, when this isn't known yet, briefly wait (<=2.5s, with video already flowing) for an audio frame; if none arrives, the camera is video-only -> mode "video". The socket stays in pendingMuxerSockets during that wait so it still counts as a consumer.
Verified end-to-end by feeding a real captured H.265 stream from a mic-off camera through a real StreamServer+JMuxer: mode "video" now emits valid, ffmpeg-decodable fMP4 (1.6MB out); the old "both" path produced zero bytes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
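The muxer-option choice and the short audio probe can be sketched as below. The option keys `mode` and `videoCodec` are the JMuxer options the commits above say are set; `muxerOptionsFor`, `probeAudio`, and the 2.5s constant name are hypothetical. The resulting object would be combined with the other JMuxer construction options, which are not shown here.

```typescript
// Hypothetical helpers; only the option keys and values come from the PR text.
type Codec = "h264" | "h265";
type MuxMode = "both" | "video";

interface MuxerOptions {
  mode: MuxMode;
  videoCodec: "H264" | "H265"; // JMuxer defaults to "H264", so set it explicitly
}

const AUDIO_PROBE_MS = 2_500;

export function muxerOptionsFor(codec: Codec, deliversAudio: boolean): MuxerOptions {
  return {
    mode: deliversAudio ? "both" : "video", // "both" would wait forever for a missing audio track
    videoCodec: codec === "h265" ? "H265" : "H264",
  };
}

/** First stream only: if audio capability is unknown, wait briefly for an audio frame. */
export async function probeAudio(
  known: boolean | undefined,
  firstAudioFrame: Promise<void>,
  probeMs = AUDIO_PROBE_MS,
): Promise<boolean> {
  if (known !== undefined) return known;
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), probeMs); // no audio in time -> video-only
  });
  try {
    return await Promise.race([firstAudioFrame.then(() => true), timedOut]);
  } finally {
    clearTimeout(timer);
  }
}
```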
The Home-app "Cameras" grid fires a snapshot request per camera at once. A HomeBase serves only one P2P stream at a time, so when a cache-miss woke the camera, all the tiles stampeded the single HomeBase slot and failed together (the wedge cascade). takePicture now serves the last cached keyframe at ANY age and never falls through to a livestream wake. If nothing has been cached yet (camera hasn't streamed this session) it throws instead of waking — the tile keeps its last image. The cache still refreshes whenever the camera is genuinely awake (live view, motion/HKSV recording, and — coming next — a serial background refresh). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Throwing from takePicture made Scrypted's Snapshot plugin fall back to pulling a frame from the video stream — which started a livestream and re-created the exact HomeBase stampede we were trying to prevent (observed: "Failed to capture snapshot ... not waking" immediately followed by "Getting video stream ... Starting livestream" for every grid tile). Now a cache-miss returns a small valid placeholder JPEG. A successful takePicture means the Snapshot plugin does NOT fall back to the video stream, so the grid never wakes a camera. The real frame replaces the placeholder once the camera next streams (live tap, motion, or the upcoming serial background refresh). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
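The resulting takePicture policy (cached frame at any age, otherwise a placeholder, never a wake) fits in one function. This sketch assumes the cached keyframe has already been converted to JPEG bytes; `takePictureFromCache` and the `Snapshot` shape are hypothetical.

```typescript
// Hypothetical sketch of the snapshot policy; not the plugin's actual types.
interface Snapshot {
  mime: "image/jpeg";
  data: Uint8Array;
  placeholder: boolean;
}

export function takePictureFromCache(
  cachedJpeg: Uint8Array | undefined, // assumed: last keyframe already converted to JPEG
  placeholderJpeg: Uint8Array,
): Snapshot {
  if (cachedJpeg && cachedJpeg.length > 0) {
    return { mime: "image/jpeg", data: cachedJpeg, placeholder: false }; // any age is fine
  }
  // Returning successfully (instead of throwing) keeps the Snapshot plugin from
  // falling back to the video stream, which would wake the camera.
  return { mime: "image/jpeg", data: placeholderJpeg, placeholder: true };
}
```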
…-wins)
A Eufy HomeBase serves only one camera P2P stream at a time; concurrent starts mutually starve. This adds a process-wide, per-station slot that serializes streaming across cameras on the same HomeBase:
- StreamServer gates startLivestream on an injected acquireStreamSlot(): "live" (a viewer/recorder is attached) always wins, preempting the current holder (whose onRevoke stops it); "background" (thumbnail refresh) is denied while the slot is busy. When no coordinator is injected (CLI/tests), behavior is unchanged.
- Clean handoff: a preemptor waits for the previous holder to release (lease.whenReady) before starting, so the two never overlap on the slot. This fixes mutual P2P starvation AND audio bleeding into the wrong muxer.
- A preempted camera yields without recycling (slotRevoked): it stays down until its own consumers drain, and the mid-session wedge watchdog is suppressed, which stops the false-wedge recycle cascade.
- eufy-device injects the coordinator (keyed by station serial; 4G self-station cameras never contend).
Verified end-to-end on real hardware: camera A streams, camera B preempts it, A yields cleanly (no thrash), and B streams correctly (mode=video, no audio bleed): newest-wins with no mutual starvation. Coordinator, gating, revoke, and handoff are all unit-tested.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
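A simplified, synchronous sketch of the per-station slot with "live always wins, background is denied while busy". `StationSlotCoordinator` and its methods are hypothetical names, and the lease.whenReady handoff (waiting for the previous holder to actually release) is deliberately omitted to keep the core rule visible.

```typescript
// Hypothetical, simplified per-station slot coordinator (no whenReady handoff).
type Priority = "live" | "background";

interface Holder {
  deviceId: string;
  priority: Priority;
  onRevoke: () => void; // asks the current holder to stop its P2P stream
}

export class StationSlotCoordinator {
  private holders = new Map<string, Holder>(); // station serial -> current holder

  /** Returns true if the slot was granted. */
  acquire(station: string, req: Holder): boolean {
    const cur = this.holders.get(station);
    if (cur && cur.deviceId !== req.deviceId) {
      if (req.priority === "background") return false; // background never preempts
      cur.onRevoke(); // live always wins
    }
    this.holders.set(station, req);
    return true;
  }

  /** Ignored if `deviceId` no longer holds the slot (e.g. it was preempted). */
  release(station: string, deviceId: string): void {
    if (this.holders.get(station)?.deviceId === deviceId) this.holders.delete(station);
  }

  holderOf(station: string): string | undefined {
    return this.holders.get(station)?.deviceId;
  }
}
```

Keying the map by station serial is what lets 4G self-station cameras never contend: each one is effectively its own station.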
Thumbnails are served from cache and never wake on the grid; this keeps that
cache from going stale without re-introducing a stampede. Each device runs a
staggered timer that, every 5 min, wakes the camera ONLY if:
- its cached frame is empty or older than 45 min,
- the HomeBase slot is free (background priority — the coordinator denies it
while a viewer/recorder holds the slot), and
- it's not in failure backoff.
Wakes go through the per-station coordinator at "background" priority, so they
serialize across the HomeBase (one at a time) and always yield to live view /
recording. A camera that fails to deliver (dead/asleep, e.g. Porch) backs off
exponentially (10 min → cap 6 h) so it isn't hammered every cycle. Refresh
policy is a pure, unit-tested helper.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
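The refresh policy described above can be sketched as a pure function. Names and the `RefreshInput` shape are hypothetical; the exponential step (doubling from 10 min) is an assumption, while the 45-minute staleness threshold, the 10 min start, and the 6 h cap come from the text.

```typescript
// Hypothetical sketch of the pure refresh-policy helper.
const STALE_AFTER_MS = 45 * 60_000;
const BACKOFF_BASE_MS = 10 * 60_000;
const BACKOFF_CAP_MS = 6 * 60 * 60_000;

interface RefreshInput {
  now: number;
  cachedAt?: number;          // undefined when the cache is empty
  slotFree: boolean;          // the coordinator would grant "background" priority
  consecutiveFailures: number;
  lastFailureAt?: number;
}

/** Assumed doubling: 10 min, 20 min, 40 min, ... capped at 6 h. */
export function backoffMs(failures: number): number {
  if (failures <= 0) return 0;
  return Math.min(BACKOFF_BASE_MS * 2 ** (failures - 1), BACKOFF_CAP_MS);
}

export function shouldRefresh(i: RefreshInput): boolean {
  const stale = i.cachedAt === undefined || i.now - i.cachedAt > STALE_AFTER_MS;
  const inBackoff =
    i.lastFailureAt !== undefined && i.now - i.lastFailureAt < backoffMs(i.consecutiveFailures);
  return stale && i.slotFree && !inBackoff;
}
```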
A battery camera kept streaming for ~30s after you closed the Home app (the post-detach inactivity grace). lastClientActivity advances on every frame while a muxer is attached, so this never trips during active viewing — it only governs the post-close drain. Cut 30s → 12s: still long enough to reuse a warm stream across Rebroadcast's quick reconnect churn (avoiding a cold "find sync frame" on a fast reopen), but ~18s less battery burn per view. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Background thumbnail refresh now defaults to 2h (was 45m) — gentler on battery/LTE cameras. Adds a per-camera "Background Thumbnail Refresh" setting (Off / 30 min / 1 h / 2 h / 4 h) in the Scrypted device UI so solar cams can run aggressive and battery/LTE cams lazy or off. "Off" disables the refresh wake entirely for that camera. Choice→duration mapping is a pure, tested helper. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
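The choice-to-duration mapping could look like the following sketch. The choice strings mirror the setting labels listed above, but their exact spelling in the plugin, and the helper name `refreshIntervalMs`, are assumptions. `null` means "Off" (never wake for a thumbnail); a `Map` avoids matching inherited object keys such as `"toString"`.

```typescript
// Hypothetical mapping for the "Background Thumbnail Refresh" setting.
const DEFAULT_REFRESH_CHOICE = "2 h";

const REFRESH_CHOICES = new Map<string, number | null>([
  ["Off", null],
  ["30 min", 30 * 60_000],
  ["1 h", 60 * 60_000],
  ["2 h", 2 * 60 * 60_000],
  ["4 h", 4 * 60 * 60_000],
]);

export function refreshIntervalMs(choice: string | undefined): number | null {
  const value = choice === undefined ? undefined : REFRESH_CHOICES.get(choice);
  // Unset or unrecognized values fall back to the 2 h default.
  return value !== undefined ? value : (REFRESH_CHOICES.get(DEFAULT_REFRESH_CHOICE) ?? null);
}
```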
…Base
A wedged camera recovers by recycling its HomeBase P2P session, which briefly disrupts EVERY camera on that HomeBase. That's worth it when the recycle recovers the camera, but a camera that can't stream at all (no WiFi signal, e.g. Patio at -100 dBm / level 0, Porch) just keeps wedging and recycling, punishing its healthy siblings for nothing.
Guards added to recycleStationP2P:
- No signal: a HomeBase camera reporting wifiSignalLevel 0 can't stream, so don't recycle (it won't help); suppress for 30 min.
- Chronic failure: after one recycle that doesn't recover the camera (no video since), suppress further recycles for 30 min.
- Reset on livestreamActive (the camera recovered, so re-arm).
Net: a dead/weak camera causes at most one HomeBase recycle, then fails fast instead of dragging down the healthy cameras. The decision logic is a pure, unit-tested helper. The real fix for those cameras is still better signal (repeater/relocation).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
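A minimal sketch of the recycle-guard decision as a pure helper. `decideRecycle` and the input fields are hypothetical; the caller is assumed to clear `suppressedUntil` and `lastRecycleAt` when livestreamActive fires (the "reset" guard above).

```typescript
// Hypothetical sketch of the recycle-guard decision; names are illustrative.
const SUPPRESS_MS = 30 * 60_000;

interface RecycleGuardInput {
  now: number;
  onHomeBase: boolean;           // camera shares a HomeBase (not a 4G self-station)
  wifiSignalLevel?: number;      // 0 = no signal
  lastRecycleAt?: number;        // cleared by the caller on livestreamActive
  videoSinceLastRecycle: boolean;
  suppressedUntil?: number;      // cleared by the caller on livestreamActive
}

interface RecycleDecision {
  recycle: boolean;
  reason?: "suppressed" | "no-signal" | "chronic-failure";
  suppressUntil?: number;
}

export function decideRecycle(i: RecycleGuardInput): RecycleDecision {
  if (i.suppressedUntil !== undefined && i.now < i.suppressedUntil) {
    return { recycle: false, reason: "suppressed" };
  }
  if (i.onHomeBase && i.wifiSignalLevel === 0) {
    return { recycle: false, reason: "no-signal", suppressUntil: i.now + SUPPRESS_MS };
  }
  if (i.lastRecycleAt !== undefined && !i.videoSinceLastRecycle) {
    return { recycle: false, reason: "chronic-failure", suppressUntil: i.now + SUPPRESS_MS };
  }
  return { recycle: true };
}
```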
…request override it)
createOptimizedMediaObject spread `...options?.video` AFTER setting `codec`,
so a consumer requesting `codec: 'h264'` (e.g. HomeKit) overwrote our detected
H.265 with 'h264'. The stream was then advertised as h264 while carrying h265
bytes, so downstream `-vcodec copy` produced a broken stream ("codec must be
h264 but is h265"). Now our source codec wins; the request can't relabel it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
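The fix is purely about object-spread order: later properties win. This simplified sketch uses a hypothetical `buildVideoOptions` and a reduced options shape rather than the real createOptimizedMediaObject signature.

```typescript
// Hypothetical, simplified options shape; not Scrypted's actual type.
interface VideoOptions {
  codec?: string;
  width?: number;
  height?: number;
}

export function buildVideoOptions(detectedCodec: string, requested?: VideoOptions): VideoOptions {
  // Buggy order: { codec: detectedCodec, ...requested } lets requested.codec relabel the stream.
  // Fixed order: keep the consumer's other fields, then pin the detected source codec.
  return { ...requested, codec: detectedCodec };
}
```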
…ning error
Two issues surfaced after a deploy: the whole fleet woke at once, and a tap had to fight through it.
- Background thumbnail refresh no longer wakes a camera whose cache is EMPTY. Every camera's in-memory cache is empty right after a plugin reload, so waking on empty stampeded all of them, including disabled/unused (301) and dead (Patio) cameras, and contended with the user's live view. Tiles are now populated on demand (live view / motion); the background refresh only keeps an already-seen tile from going stale.
- Stopping a livestream that isn't actually running (a preempted camera that was told to start but never delivered) now treats device_livestream_not_running as success instead of erroring and retrying 3×.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
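An idempotent stop can be sketched as below. How the client surfaces the `device_livestream_not_running` code (here assumed to be a `code` property on the thrown error) is an assumption, and `stopLivestreamIdempotent` is a hypothetical name.

```typescript
// Hypothetical sketch; the error shape ({ code: string }) is an assumption.
const NOT_RUNNING = "device_livestream_not_running";

function errorCode(e: unknown): unknown {
  return typeof e === "object" && e !== null ? (e as { code?: unknown }).code : undefined;
}

export async function stopLivestreamIdempotent(stop: () => Promise<void>): Promise<void> {
  try {
    await stop();
  } catch (e) {
    // A stream that was never running is already in the desired state: no error, no retry.
    if (errorCode(e) === NOT_RUNNING) return;
    throw e;
  }
}
```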
…storm)
Tiles went to the placeholder after every plugin reload (the in-memory cache is cleared), which meant tapping each camera to repopulate it. Now the last cached keyframe is saved to Scrypted device storage when a stream stops and restored into the cache when the stream server is created, so a camera's tile shows its last-seen image immediately after a reload, with no camera wake and no refresh stampede. Keyframes are small (~10–110 KB compressed) and capped at 220 KB. Restore only seeds an empty cache (a live frame always wins) and stamps "now" so the frame isn't treated as stale right after a restart. Cameras self-populate from normal activity (live view or motion recording), so no manual tapping is needed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
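A sketch of persist/restore, assuming device storage behaves like a string key/value store with `getItem`/`setItem` (Web Storage style) and that binary data is stored as base64. The storage key, helper names, and reading "220 KB" as 220 KiB are all assumptions.

```typescript
// Hypothetical sketch; key name, encoding, and KiB interpretation are assumptions.
const MAX_PERSIST_BYTES = 220 * 1024;
const KEY = "lastKeyframe";

interface KV {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

/** Called when a stream stops. Oversized or empty frames are not persisted. */
export function persistKeyframe(store: KV, frame: Uint8Array): boolean {
  if (frame.length === 0 || frame.length > MAX_PERSIST_BYTES) return false;
  store.setItem(KEY, Buffer.from(frame).toString("base64"));
  return true;
}

/** Called on stream-server creation; only seeds an empty cache. */
export function restoreKeyframe(
  store: KV,
  cacheIsEmpty: boolean,
  now: number,
): { data: Uint8Array; capturedAt: number } | undefined {
  if (!cacheIsEmpty) return undefined; // a live frame always wins
  const b64 = store.getItem(KEY);
  if (!b64) return undefined;
  // Stamp "now" so the restored frame isn't treated as stale right after a restart.
  return { data: new Uint8Array(Buffer.from(b64, "base64")), capturedAt: now };
}
```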
…tap, no storm)"
This reverts commit 9faf460.
The Home-app home screen requests a live preview for every visible camera at once. With newest-wins preemption, those competing live requests on a single HomeBase kept kicking each other (and a deliberate tap) off the one slot — thrashing hard enough to restart the plugin. Now a "live" request only takes over a STUCK live holder (past an 8s warm-up grace AND not yet delivering video — e.g. a dead camera). A holder that is delivering, or still warming up, is protected: the competing request is denied and fails fast (the consumer retries; the slot frees when the holder stops). "live" still always beats a background (thumbnail-refresh) holder. The lease gains markDelivering(), called when video starts flowing (setLivestreamActual(true)). Coordinator takes an injectable clock for tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
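A sketch of the lease with markDelivering() and an injectable clock, plus this commit's "take over only a stuck live holder" rule. `Lease`, `livePreemptsHolder`, and the constant name are hypothetical; the 8s grace is the value stated above. Injecting the clock is what lets a test step time forward deterministically.

```typescript
// Hypothetical sketch of this commit's preemption rule.
type Priority = "live" | "background";
type Clock = () => number;

const WARMUP_GRACE_MS = 8_000;

export class Lease {
  readonly priority: Priority;
  private readonly clock: Clock;
  private readonly acquiredAt: number;
  private delivering = false;

  constructor(priority: Priority, clock: Clock = Date.now) {
    this.priority = priority;
    this.clock = clock;
    this.acquiredAt = clock();
  }

  /** Called when video starts flowing (setLivestreamActual(true)). */
  markDelivering(): void {
    this.delivering = true;
  }

  get isDelivering(): boolean {
    return this.delivering;
  }

  heldForMs(): number {
    return this.clock() - this.acquiredAt;
  }
}

/** A live request may take over a background holder, or a live holder that is stuck. */
export function livePreemptsHolder(holder: Lease): boolean {
  if (holder.priority === "background") return true;
  return holder.heldForMs() > WARMUP_GRACE_MS && !holder.isDelivering;
}
```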
… tap, no storm)"
This reverts commit b4ef31e.
The coordinator already tracks which camera holds the slot and whether it's delivering video (markDelivering, from the preemption fix), updated from the same trigger as the old registry, so the separate station-stream-registry was redundant. Replaced its one consumer (the recycle guard's "is a sibling streaming?" check) with coordinator.otherDeviceDeliveringOnStation, and deleted the module, its events/wiring, and its tests. Also drops the now-unnecessary isLivestreaming verify (the coordinator is authoritative: a holder's delivering flag clears when its lease releases on stop/wedge). Net: one fewer module and roughly one fewer WS round-trip on the recycle path; no behavior change to streaming.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
61b51a7 to 0f8459e
…ence window
A cold tap on an idle camera often gets no video on the first startLivestream (the HomeBase P2P session is wedged). The reliable recovery is a station P2P recycle, after which the camera delivers its first frame in ~1-2s. But the recycle was gated behind a 45s cold-start wait, so it landed AFTER HomeKit's hard ~30s give-up: the viewer (and the prebuffer ffmpeg output it feeds) had already been torn down, so the recovered video reached nobody (Connection refused -> ffmpeg exited -> stop).
Lower COLD_START_STALE_THRESHOLD_MS from 45s to 18s so that detection (~18s) plus recycle/restart/first frame (~10s) ≈ 28s fits inside the ~30s viewer window. A genuinely deep-sleep camera that needs >30s to wake misses this live view regardless; recycling early still warms the keyframe cache and the P2P session for the next tap. Signal-dead cameras can't loop on this (recycle suppression).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…view
Three reliability fixes surfaced while testing camera switching on a shared HomeBase (one P2P stream at a time):
1. Staggered handoff. handleSlotRevoked() released the slot lease synchronously before stopping our P2P. The preemptor's whenReady is gated on that release, so it called startLivestream while our stream was still up; overlapping start/stop starves the one-stream HomeBase, so the new camera got one frame (metadata), then stalled, and its muxer never emitted. Now we stop first and release only after the stop completes (with a finally as a backstop for the already-stopped case), so the holder frees the HomeBase before the preemptor starts.
2. ADTS-gated audio detection. deliversAudio was set true on ANY audio event, including the non-ADTS AudioSpecificConfig packet that mic-off cameras still emit. That forced JMuxer mode=both, which never emits until it has a real audio sample, so a video-only camera hung black. Only a real ADTS frame now marks the camera audio-capable; otherwise it's muxed video-only.
3. both -> video fallback. Some cameras send a stray ADTS frame (so mode=both) but no continuous audio track, hanging the muxer anyway. If a both muxer emits no fMP4 within BOTH_TO_VIDEO_FALLBACK_MS (4s), it is rebuilt video-only on the same socket, guaranteeing video. Real-audio cameras emit in <1s and keep their audio.
Tests: staggered-handoff ordering, ADTS-only audio detection, and the both->video rebuild (84 stream-server tests pass).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
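The three fixes can each be sketched in a few lines. Function names are hypothetical except BOTH_TO_VIDEO_FALLBACK_MS, whose 4s value is stated above. The ADTS test relies on the standard ADTS header layout: a 12-bit syncword 0xFFF followed by an ID bit and two layer bits that are always 00, in a 7-byte header without CRC. An AudioSpecificConfig blob (for example 0x14 0x08) does not match.

```typescript
// Hypothetical sketches of the three fixes; names are illustrative.

/** 1. Staggered handoff: stop our P2P first, release the slot lease only afterwards. */
export async function handleSlotRevoked(stopP2P: () => Promise<void>, releaseLease: () => void): Promise<void> {
  try {
    await stopP2P();
  } finally {
    releaseLease(); // also runs if stop throws (e.g. the stream was already stopped)
  }
}

/** 2. True only for a real ADTS frame: syncword 0xFFF and layer bits 00. */
export function isAdtsFrame(buf: Uint8Array): boolean {
  return buf.length >= 7 && buf[0] === 0xff && (buf[1] & 0xf6) === 0xf0;
}

/** 3. If a mode "both" muxer has emitted nothing after the grace period, rebuild it video-only. */
export const BOTH_TO_VIDEO_FALLBACK_MS = 4_000;

export function armBothToVideoFallback(
  hasEmitted: () => boolean,
  rebuildVideoOnly: () => void,
  ms = BOTH_TO_VIDEO_FALLBACK_MS,
): () => void {
  const timer = setTimeout(() => {
    if (!hasEmitted()) rebuildVideoOnly();
  }, ms);
  return () => clearTimeout(timer); // cancel once the muxer emits or the socket closes
}
```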
…e holder
The preemption-thrash guard protected a live holder from preemption while it was DELIVERING or warming up. That killed the Home-app grid stampede, but it also permanently blocked deliberate switching: tapping camera B while camera A held the HomeBase slot was denied forever (slot busy, B never streamed).
Relax it: protect a holder only during its brief warm-up window (PREEMPT_MIN_HOLD_MS). That window alone absorbs the grid's near-simultaneous burst (all requests land within a second or two). A live request that arrives AFTER the window is necessarily a deliberate switch to another camera, so it preempts and takes over (newest tap wins), even from a delivering holder, via the existing clean handoff. Switches are therefore spaced at least one warm-up apart, which keeps takeover from churning the HomeBase (the original thrash was unbounded sub-second churn). isDelivering is now used only by the recycle guard.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
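The relaxed rule drops the delivering check entirely, leaving only the warm-up window. This sketch uses a hypothetical `livePreempts` helper and `SlotHolder` shape, and the 8s value for PREEMPT_MIN_HOLD_MS is an assumption carried over from the earlier warm-up grace.

```typescript
// Hypothetical sketch; the 8s value for PREEMPT_MIN_HOLD_MS is assumed.
const PREEMPT_MIN_HOLD_MS = 8_000;

interface SlotHolder {
  priority: "live" | "background";
  heldForMs: number;
}

/** May an incoming "live" request take the slot from `holder`? Delivering no longer matters. */
export function livePreempts(holder: SlotHolder): boolean {
  if (holder.priority === "background") return true; // live always beats thumbnail refresh
  return holder.heldForMs >= PREEMPT_MIN_HOLD_MS;     // only the warm-up window is protected
}
```

Because a takeover is only possible once the current holder has held the slot for at least PREEMPT_MIN_HOLD_MS, consecutive switches on one HomeBase are spaced at least that far apart.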
…WebRTC
Scrypted's FFmpegInput can't request H.264 output: consumers (HomeKit, the Scrypted-UI WebRTC preview) do `-c:v copy` unless the user manually enables a per-camera "Transcoding Debug Mode". To make H.264 work out of the box (and fix the black H.265-over-WebRTC live view), the plugin now emits real H.264 itself.
- h264-transcode-server.ts: a per-device local TCP relay. Each client connection spawns one ffmpeg that reads the stream server's muxed fMP4 (H.265) port and pipes fragmented-MP4 H.264 (libx264 ultrafast/zerolatency, audio copied through) to that socket. One ffmpeg per client; stop() destroys live sockets so the listener closes promptly. Because the relay's ffmpeg becomes a muxed-port reader, it is what wakes/idle-stops the livestream, so the cold-start and coordinator lifecycle is preserved.
- stream-service.ts: when the toggle is on AND the source is H.265, return a MediaObject that reads the relay (advertised as h264); native H.264 passes through untouched, so a camera that sometimes sends H.264 is never needlessly re-encoded. getVideoStreamOptions advertises h264 to match.
- eufy-device.ts: a per-camera "Transcode to H.264" setting (Streaming group), default ON for cameras whose last-detected codec is H.265.
Trade-off: one shared software encode per active stream (no per-consumer bitrate adaptation) and CPU cost on the host; the coordinator caps concurrency to ~one live stream per HomeBase. Toggle it off per camera if the host runs hot.
Tests: relay lifecycle/spawn/teardown + stream-service transcode branch.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
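A sketch of the relay's per-client ffmpeg invocation. The flags are standard ffmpeg options (`-movflags frag_keyframe+empty_moov+default_base_moof` produces fragmented MP4 suitable for piping), but the exact argument list the plugin uses is not shown in this PR, and `h264TranscodeArgs` / `spawnTranscoderForClient` are hypothetical names.

```typescript
import { spawn, type ChildProcess } from "node:child_process";
import type { Socket } from "node:net";

// Hypothetical argument list; the plugin's real flags may differ.
export function h264TranscodeArgs(muxedPort: number): string[] {
  return [
    "-hide_banner", "-loglevel", "error",
    "-i", `tcp://127.0.0.1:${muxedPort}`,   // read the stream server's muxed fMP4 (H.265)
    "-map", "0:v:0", "-map", "0:a?",         // first video stream, audio only if present
    "-c:v", "libx264", "-preset", "ultrafast", "-tune", "zerolatency",
    "-c:a", "copy",                          // audio copied through
    "-f", "mp4", "-movflags", "frag_keyframe+empty_moov+default_base_moof",
    "pipe:1",                                // fragmented MP4 on stdout
  ];
}

/** One ffmpeg per client connection; its stdout is piped to that client's socket. */
export function spawnTranscoderForClient(muxedPort: number, client: Socket): ChildProcess {
  const ff = spawn("ffmpeg", h264TranscodeArgs(muxedPort), { stdio: ["ignore", "pipe", "inherit"] });
  ff.stdout?.pipe(client);
  ff.on("error", () => client.destroy());          // e.g. ffmpeg binary not found
  client.on("close", () => ff.kill("SIGTERM"));    // client gone: stop encoding
  return ff;
}
```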
…is hot
The in-plugin H.264 transcode is a software encode per active stream, which can heat a small host (Raspberry Pi) under concurrent streams. Add a process-wide thermal governor that samples the host CPU temperature (/sys/class/thermal/thermal_zone0/temp) every 10s and:
- warns (log + Scrypted alert via log.a) at >=70C,
- throttles at >=78C: new streams that would transcode fall back to H.265 passthrough (no encode) until the host cools,
- uses hysteresis (clear critical <72C, clear warn <66C) so it doesn't flap,
- is inert when the temperature source is unreadable (non-Pi / sandbox), so it can only ever make the transcode path more conservative, never break it.
Existing encodes are left to finish (short live-view sessions) rather than killed mid-frame. StreamService gains an isThrottling getter; the provider starts the governor and routes alerts to the Scrypted UI.
Tests: thresholds, hysteresis, unreadable-source inertness, alert transitions, and the singleton throttle flag.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
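The governor's core is a small state machine. This sketch uses the thresholds quoted above; the function names are hypothetical, and the 10s sampling loop and alert routing are left out. The Linux sysfs thermal file reports millidegrees Celsius.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical thermal governor core using the thresholds quoted above.
export type ThermalLevel = "normal" | "warn" | "critical";

const WARN_C = 70;
const CRITICAL_C = 78;
const CLEAR_CRITICAL_C = 72;
const CLEAR_WARN_C = 66;

/** The sysfs file holds millidegrees Celsius, e.g. "48312" for 48.312 C. */
export function parseZoneTemp(raw: string): number | undefined {
  const milli = Number.parseInt(raw.trim(), 10);
  return Number.isFinite(milli) ? milli / 1000 : undefined;
}

/** An unreadable source (non-Pi, sandbox) yields undefined, which keeps the governor inert. */
export function readCpuTempC(path = "/sys/class/thermal/thermal_zone0/temp"): number | undefined {
  try {
    return parseZoneTemp(readFileSync(path, "utf8"));
  } catch {
    return undefined;
  }
}

/** One sample step, with hysteresis so the level doesn't flap near a threshold. */
export function nextLevel(prev: ThermalLevel, tempC: number | undefined): ThermalLevel {
  if (tempC === undefined) return "normal";
  if (tempC >= CRITICAL_C) return "critical";
  if (prev === "critical" && tempC >= CLEAR_CRITICAL_C) return "critical";
  if (tempC >= WARN_C) return "warn";
  if (prev !== "normal" && tempC >= CLEAR_WARN_C) return "warn";
  return "normal";
}

/** New streams transcode only for H.265 sources, with the toggle on, and not while throttling. */
export function shouldTranscode(toggleOn: boolean, source: "h264" | "h265", level: ThermalLevel): boolean {
  return toggleOn && source === "h265" && level !== "critical";
}
```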
…r mocks)
These two suites had been failing (30 tests) since the device constructor gained a storage-backed codec hint and StreamServer event listeners: the inline ScryptedDeviceBase mock had no `storage`/`log`, and the StreamServer mock had no `on()` (nor the muxed-port/keyframe methods createStreamServer now uses). Add an in-memory storage + log stub to the mocked base class and flesh out the StreamServer mock. Full scrypted suite now green (246 tests).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an [Unreleased] CHANGELOG section and a README fork notice covering the single-stream coordinator, snapshot cache, cold-start recovery, audio-aware muxing, camera switching, in-plugin H.264 transcoding, and the thermal governor. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Superseded by focused PRs: #28 (reliability for shared-HomeBase battery cameras: the general bug fixes, closes #23) and #29 (opt-in in-plugin H.264 transcoding + thermal governor, stacked on #28). This branch had grown into a 28-commit mix of fixes and features; splitting it makes the bug fixes mergeable on their own without committing to the opinionated transcoding feature.
This branch grew well beyond the original `videoCodec` change into a full streaming-reliability layer for shared-HomeBase battery setups (multiple cameras on one HomeBase, on-demand/no-prebuffer). The minimal HEVC fix that closes #23 is broken out into #27 for easy review/merge. This PR is the broader work on top of it:
- `startLivestream` once, wait for the real wake window, then recycle the station P2P session instead of hammering; suppress false-wedge recycles.
All packages build; unit tests cover the coordinator, audio-aware muxing, cache, refresh policy, and slot gating. Verified end-to-end on real hardware (T86P2 / T8170 on an S380 HomeBase). Happy to split further or pare down scope to whatever you'd like to take.