fix(voice): wait_for_playout resolves on interrupt instead of deadlocking#5700
Open
MacFall7 wants to merge 1 commit into
Open
fix(voice): wait_for_playout resolves on interrupt instead of deadlocking#5700MacFall7 wants to merge 1 commit into
MacFall7 wants to merge 1 commit into
Conversation
`SpeechHandle.wait_for_playout` and `RunContext.wait_for_playout` awaited only on the playout-completion future, so callers blocked until the 5s `INTERRUPTION_TIMEOUT` fired when an interrupt arrived mid-playout (livekit#5359). Race the playout future against `_interrupt_fut` using the same `asyncio.wait(FIRST_COMPLETED)` primitive already used by `SpeechHandle.wait_if_not_interrupted`. The wait now returns promptly on interrupt; callers inspect `speech_handle.interrupted` to decide how to proceed. Existing internal `await speech_handle.interrupt()` flows that wait for cleanup keep working unchanged. Closes livekit#5359 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
longcw
reviewed
May 11, 2026
| ) | ||
|
|
||
| await asyncio.shield(self._done_fut) | ||
| await self._race_against_interrupt(self._done_fut) |
Contributor
There was a problem hiding this comment.
we cannot resolve the done_fut on interrupt because the audio output may not stopped when interrupt is set, especially for avatar use case. there is already a timeout in _cancel that will resolve the done after a timeout and cancel the tasks arbitrarily.
Author
|
The avatar case is exactly the invariant I missed. `_done_fut` represents
actual playout completion including the audio buffer drain and the avatar
video stream, so resolving it early on interrupt would let callers think
audio has stopped when it hasn't. The 5s timeout in `_cancel` is the right
shape for the "audio is actually done" signal, not a bug.
The original report in #5359 reads as a contract misunderstanding more than
a framework bug: the tool author was awaiting `wait_for_playout` from a
function tool that needed to react to interruption, when the right
primitive for that flow is already `wait_if_not_interrupted` (or just
letting the interrupt propagate through the tool exception path).
Two paths I see, happy with either:
1. **Close this PR.** I'll comment on #5359 explaining the intent of
`wait_for_playout`, point toward `wait_if_not_interrupted` as the existing
interrupt-aware primitive, and suggest the tool author rework their flow.
2. **Reshape as a new explicit method on `RunContext`**, something like
`wait_for_playout_or_interrupt()` that races the two and documents both
signals, leaving `wait_for_playout` semantics intact. Useful if you think
tool authors will hit this regularly enough to want a built-in.
Tell me which is helpful and I'll close or rework accordingly.
…On Sun, May 10, 2026 at 6:19 PM Long Chen ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In livekit-agents/livekit/agents/voice/speech_handle.py
<#5700 (comment)>:
> @@ -179,7 +185,7 @@ async def wait_for_playout(self) -> None:
"To wait for the assistant’s spoken response prior to running this tool, use `RunContext.wait_for_playout()` instead."
)
- await asyncio.shield(self._done_fut)
+ await self._race_against_interrupt(self._done_fut)
we cannot resolve the done_fut on interrupt because the audio output may
not stopped when interrupt is set, especially for avatar use case. there is
already a timeout in _cancel that will resolve the done after a timeout
and cancel the tasks arbitrarily.
—
Reply to this email directly, view it on GitHub
<#5700 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/A6EDBTNOKQZW45PZSMF7DQ342ETDBAVCNFSM6AAAAACYYIN4OWVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHM2DENRQGI2DMMRVGM>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR Description: fix(voice): wait_for_playout resolves on interrupt instead of deadlocking
Closes #5359.
Summary
SpeechHandle.wait_for_playout()andRunContext.wait_for_playout()awaited_done_fut(or_wait_for_generation) only. Wheninterrupt()fired,_interrupt_futresolved but the wait did not, so callers blocked until the 5sINTERRUPTION_TIMEOUThard-killed the surrounding tasks. In a tool preamble flow that meant scheduling paused, the worker drained, and subsequent user input was dropped.This PR races the existing wait against
_interrupt_futin both paths, using the same primitiveSpeechHandle.wait_if_not_interrupted()already uses. The wait now returns as soon as either future resolves. Callers that need to distinguish completion from interruption readspeech_handle.interruptedon return.Deviation from #5359's proposed solution (read this first)
The issue author proposed raising
InterruptedErrorwhen the interrupt wins the race. I started there and the test suite caught a real problem: the SDK already has internal callers (notablyagent_activity.py:2079) thatawait speech_handle.interrupt()and rely onwait_for_playoutreturning so cleanup can proceed. Raising broke seven unrelated tests, all of them about correct interrupt/handoff/drain handling.I switched to a return-early semantic for three reasons:
SpeechHandle.wait_if_not_interrupted()(which this fix lifts the race pattern from) returns. It does not raise. The twowait_for_playoutpaths now match it.wait_for_playoutcontinues to compile and behave the same way on the non-interrupt path. The interrupt path was previously a 5s deadlock; it is now an immediate return. No public signature change.If the maintainers prefer raising as the default, I am happy to follow that direction as a separate PR with the necessary callsite adjustments inside the SDK. I think return-early is the right call here, but the choice is reasonable either way and I do not want to assume.
Changes
livekit-agents/livekit/agents/voice/speech_handle.py—wait_for_playout()now races_done_futagainst_interrupt_futwithasyncio.wait(FIRST_COMPLETED). Cancels the losing future. Returns either way;self.interruptedis true if the interrupt won.livekit-agents/livekit/agents/voice/events.py—RunContext.wait_for_playout()applies the same race against_interrupt_fut.tests/test_speech_handle_interruption.py— three deterministic unit tests, ~0.35s total runtime. Two cover the regression:SpeechHandle.wait_for_playoutandRunContext.wait_for_playoutboth resolve under their 1.0s timeout when_cancel()fires. The third covers the no-interrupt case to confirm the happy path is unchanged.Diff is intentionally minimal. No public signatures changed.
INTERRUPTION_TIMEOUTis unchanged: it remains the right backstop for cases where the race itself misbehaves.Test plan
mainwithTimeoutError(confirmed via stash/pop cycle). The two regression tests are the bug-shape proof.ruff checkandruff formatclean.pytestsuite green on this branch except for two pre-existing failures (test_events_and_metrics,test_aclose_handles_precancelled_tasks_gracefully) that also fail on the stashed baseline and are unrelated to this change. Happy to triage either as a separate issue if it would be useful.Notes for reviewers
wait_if_not_interrupted) is in the same file at line 201. I cited it in the implementation rather than refactoring into a shared helper, to keep the diff scoped. Happy to refactor if you prefer a shared private helper.__await__path onSpeechHandle(soawait speech_handleworks) routes throughwait_for_playout, so this fix incidentally makesawait speech_handleinterruptable as well. That matches expectation, and it is whatagent_activity.py:2079was already implicitly relying on.