FEAT(client): Add WebRTC based voice activity detection#7127
FEAT(client): Add WebRTC based voice activity detection#7127goodspeed34 wants to merge 9 commits intomumble-voip:masterfrom
Conversation
It adds optional dependency for webrtc-audio-processing extracted from the webrtc library by pulseaudio: https://gitlab.freedesktop.org/pulseaudio/webrtc-audio-processing The library has been already packed by Debian, Ubuntu, Arch, FreeBSD, etc. So it's easy to obtain in Linux/BSD operating systems. However, the commit didn't introduce a bundled version for compilation which might be beneficial for Windows systems. Unfortunately, the library was in meson so many works need to be done for bundling it.
Integrate WebRTC VAD as an alternative to the existing amplitude/SNR-based detection. - Add support for WebRTC GMM VAD with selectable aggressiveness levels - Extend audio input pipeline to use WebRTC VAD - Update audio configuration UI and wizard to expose new option This improves detection reliability, especially for low-volume or distant speech, while reducing false positives from background noise.
WalkthroughThis pull request adds WebRTC-based Voice Activity Detection (VAD) support to Mumble. Changes include a new CMake option ( 🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can use TruffleHog to scan for secrets in your code with verification capabilities.Add a TruffleHog config file (e.g. trufflehog-config.yml, trufflehog.yml) to your project to customize detectors and scanning behavior. The tool runs only when a config file is present. |
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (3)
src/mumble/AudioInput.ui (1)
523-545: Add accessibility metadata for the new aggressiveness slider.The new slider is missing
accessibleName/accessibleDescription, while neighboring sliders define them. This is an avoidable accessibility gap.🦮 Proposed accessibility additions
<widget class="SemanticSlider" name="qsWebRTCAggressiveness"> + <property name="accessibleName"> + <string>WebRTC VAD aggressiveness</string> + </property> + <property name="accessibleDescription"> + <string>Controls how aggressively WebRTC voice activity detection filters non-speech audio</string> + </property>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/mumble/AudioInput.ui` around lines 523 - 545, The new SemanticSlider widget qsWebRTCAggressiveness is missing accessibility metadata; add properties accessibleName and accessibleDescription to match neighboring sliders (use a concise name like "WebRTC aggressiveness" and a description such as "Sets aggressiveness of the WebRTC voice activity detector; higher values filter more noise"), ensuring you add these property entries to the qsWebRTCAggressiveness widget definition so screen readers receive the same information as other sliders.src/mumble/AudioInput.h (1)
326-331: Public memberm_vadWebrtcAggressivenessbreaks encapsulation and is modified without synchronization.
m_vadWebrtcAggressivenessis a public member that is directly modified from UI code (AudioWizard.cpp:669,AudioConfigDialog.cpp:680). This violates encapsulation and creates potential race conditions since the audio thread reads this value inupdateVad().Consider providing a setter method that also triggers
updateVad()to ensure the VAD instance is recreated with the new aggressiveness level.♻️ Suggested refactor
In
AudioInput.h:void updateVad(Settings::VADSource src); `#ifdef` USE_WEBRTC_AUDIO_PROCESSING - webrtc::Vad::Aggressiveness m_vadWebrtcAggressiveness; + void setWebRTCAggressiveness(webrtc::Vad::Aggressiveness aggressiveness); +private: + webrtc::Vad::Aggressiveness m_vadWebrtcAggressiveness; `#endif`In
AudioInput.cpp:`#ifdef` USE_WEBRTC_AUDIO_PROCESSING void AudioInput::setWebRTCAggressiveness(webrtc::Vad::Aggressiveness aggressiveness) { QMutexLocker l(&qmSpeex); m_vadWebrtcAggressiveness = aggressiveness; if (m_vad == Settings::WebRTC) { m_vadWebrtc = webrtc::CreateVad(m_vadWebrtcAggressiveness); } } `#endif`🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/mumble/AudioInput.h` around lines 326 - 331, m_vadWebrtcAggressiveness is a public field modified directly from UI code (AudioWizard.cpp/AudioConfigDialog.cpp), breaking encapsulation and risking races because updateVad() reads it on the audio thread; add a synchronized setter (e.g. AudioInput::setWebRTCAggressiveness(webrtc::Vad::Aggressiveness)) that acquires qmSpeex, updates m_vadWebrtcAggressiveness, and if m_vad == Settings::WebRTC recreates m_vadWebrtc (via webrtc::CreateVad) or calls updateVad() so the change is applied safely from the audio thread.src/mumble/WebRTC_Priv.h (1)
11-16: Include guard uses WebRTC convention, not Mumble convention.The include guard
COMMON_AUDIO_VAD_INCLUDE_VAD_H_follows WebRTC's naming convention rather than Mumble's typicalMUMBLE_MUMBLE_*pattern. While the comment on lines 11-13 explains the rationale (this mirrors an internal WebRTC header), consider adding a note to clarify this is intentional.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/mumble/WebRTC_Priv.h` around lines 11 - 16, The include guard COMMON_AUDIO_VAD_INCLUDE_VAD_H_ in WebRTC_Priv.h intentionally follows the upstream WebRTC naming convention; update the header comment immediately above the guard to state explicitly that this non-Mumble-style guard is deliberate because the header mirrors the original WebRTC private header and must remain stable for compatibility (mention COMMON_AUDIO_VAD_INCLUDE_VAD_H_ by name and WebRTC_Priv.h so future readers/maintainers won't change it).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/mumble/AudioConfigDialog.cpp`:
- Around line 678-682: The UI handler
AudioInputDialog::on_qsWebRTCAggressiveness_valueChanged is directly writing
Global::get().ai->m_vadWebrtcAggressiveness from the UI thread, causing a race
with the audio thread; fix by making the change thread-safe: add/use a
thread-safe setter on AudioInput (e.g. AudioInput::setVadWebrtcAggressiveness or
similar) that acquires the AudioInput internal lock or posts the update to the
audio thread, and ensure updateVad() is executed on the audio thread (either
from inside that setter or via a posted/task-invoked call) rather than modifying
m_vadWebrtcAggressiveness directly from the UI.
In `@src/mumble/AudioInput.cpp`:
- Around line 951-959: The WebRTC branch in the m_vad switch dereferences
m_vadWebrtc without checking for null, which can crash if CreateVad() failed;
update the Settings::WebRTC case to guard against a null m_vadWebrtc (or fall
back to m_preprocessor.getSpeechProb()) before calling
m_vadWebrtc->VoiceActivity(psSource, iFrameSize, iSampleRate) and ensure
fSpeechProb is set safely when m_vadWebrtc is null; reference m_vadWebrtc,
VoiceActivity, Settings::WebRTC, and fSpeechProb to locate and fix the code.
- Around line 1262-1269: updateVad() currently mutates shared members m_vad and
m_vadWebrtc without locking, causing a data race with encodeAudioFrame() which
reads them under qmSpeex; fix by acquiring the same mutex (qmSpeex) at the start
of AudioInput::updateVad before changing m_vad or replacing m_vadWebrtc so
updates are synchronized with encodeAudioFrame(). Also when calling
webrtc::CreateVad(m_vadWebrtcAggressiveness) check the returned pointer for null
before replacing m_vadWebrtc (and handle the null case appropriately, e.g. log
and keep the existing instance) to avoid using a null VAD instance.
- Line 21: The unconditional include of WebRTC_Priv.h in AudioInput.cpp can
break builds when USE_WEBRTC_AUDIO_PROCESSING is not defined; guard the include
with `#ifdef` USE_WEBRTC_AUDIO_PROCESSING (and corresponding `#endif`) so
WebRTC_Priv.h is only included when that macro is enabled, or alternatively
provide a fallback/no-op declarations when the macro is not set; modify the top
of AudioInput.cpp to conditionally include WebRTC_Priv.h using the
USE_WEBRTC_AUDIO_PROCESSING macro to prevent missing-header/compilation errors.
In `@src/mumble/AudioInput.ui`:
- Around line 414-429: The new WebRTC widgets (QRadioButton qrbWebRTC and the
slider/combobox qsWebRTCAggressiveness) are focusable but not included in the
explicit <tabstops> sequence, so add their object names to the UI's tab order
list (the <tabstops> block) alongside the other controls to ensure predictable
keyboard navigation; locate the <tabstops> element in AudioInput.ui and insert
"qrbWebRTC" and "qsWebRTCAggressiveness" (or the exact object names used in the
file) in the correct logical order relative to neighboring widgets so they
receive focus when tabbing.
- Around line 420-421: The UI string for the WebRTC VAD description contains bad
grammar ("was used to detecting speech"); update the <string> content inside the
AudioInput UI where the WebRTC VAD description is defined so it reads correctly
(e.g., "This sets speech detection to use the WebRTC VAD algorithm. In this
mode, a config-free Gaussian Mixture Model is used to detect speech."),
replacing the incorrect phrase with "is used to detect speech" or the provided
revised sentence in the string literal.
In `@src/mumble/AudioWizard.cpp`:
- Line 18: The unconditional inclusion of WebRTC_Priv.h in AudioWizard.cpp
should be guarded by the same compile flag used elsewhere
(USE_WEBRTC_AUDIO_PROCESSING); wrap the include of "WebRTC_Priv.h" inside an
`#ifdef` USE_WEBRTC_AUDIO_PROCESSING / `#endif` block (matching the pattern used in
AudioInput.cpp) so the header is only included when webRTC audio processing is
enabled, keeping other references in AudioWizard.cpp unchanged.
- Around line 667-671: The handler
AudioWizard::on_qsWebRTCAggressiveness_valueChanged currently writes
Global::get().ai->m_vadWebrtcAggressiveness directly from the UI thread causing
a data race and also may not apply the new value to the VAD instance; replace
the direct write with a thread-safe setter on the AudioInput (e.g. add/use
AudioInput::setVadWebrtcAggressiveness(int) that acquires the same mutex/lock
used by the audio thread), or at minimum assign the new aggressiveness into the
AudioInput via its public API before calling updateVad() so updateVad()
recreates m_vadWebrtc with the new value and no unsynchronized access occurs.
---
Nitpick comments:
In `@src/mumble/AudioInput.h`:
- Around line 326-331: m_vadWebrtcAggressiveness is a public field modified
directly from UI code (AudioWizard.cpp/AudioConfigDialog.cpp), breaking
encapsulation and risking races because updateVad() reads it on the audio
thread; add a synchronized setter (e.g.
AudioInput::setWebRTCAggressiveness(webrtc::Vad::Aggressiveness)) that acquires
qmSpeex, updates m_vadWebrtcAggressiveness, and if m_vad == Settings::WebRTC
recreates m_vadWebrtc (via webrtc::CreateVad) or calls updateVad() so the change
is applied safely from the audio thread.
In `@src/mumble/AudioInput.ui`:
- Around line 523-545: The new SemanticSlider widget qsWebRTCAggressiveness is
missing accessibility metadata; add properties accessibleName and
accessibleDescription to match neighboring sliders (use a concise name like
"WebRTC aggressiveness" and a description such as "Sets aggressiveness of the
WebRTC voice activity detector; higher values filter more noise"), ensuring you
add these property entries to the qsWebRTCAggressiveness widget definition so
screen readers receive the same information as other sliders.
In `@src/mumble/WebRTC_Priv.h`:
- Around line 11-16: The include guard COMMON_AUDIO_VAD_INCLUDE_VAD_H_ in
WebRTC_Priv.h intentionally follows the upstream WebRTC naming convention;
update the header comment immediately above the guard to state explicitly that
this non-Mumble-style guard is deliberate because the header mirrors the
original WebRTC private header and must remain stable for compatibility (mention
COMMON_AUDIO_VAD_INCLUDE_VAD_H_ by name and WebRTC_Priv.h so future
readers/maintainers won't change it).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bb271197-711c-4baf-8e84-bf5afa888596
📒 Files selected for processing (15)
docs/dev/build-instructions/cmake_options.mdsrc/mumble/AudioConfigDialog.cppsrc/mumble/AudioConfigDialog.hsrc/mumble/AudioInput.cppsrc/mumble/AudioInput.hsrc/mumble/AudioInput.uisrc/mumble/AudioWizard.cppsrc/mumble/AudioWizard.hsrc/mumble/AudioWizard.uisrc/mumble/CMakeLists.txtsrc/mumble/EnumStringConversions.cppsrc/mumble/Settings.hsrc/mumble/SettingsKeys.hsrc/mumble/SettingsMacros.hsrc/mumble/WebRTC_Priv.h
The previous implimentation used a public variable for WebRTC aggressiveness aggressiveness control which is too simple and bad for readability. Replace it with a public updateWebrtcAggressiveness function.
It uses std::atomic<Settings::VADSource> to prevent potential data race when switching VAD.
This PR introduces a Voice Activity Detection (VAD) implementation based on the WebRTC Audio Processing library into the client.
WebRTC has been used and tested by various applications and behaves generally great in VoIP. I'm trying to get advanced webrtc audio processing features into mumble and see how far and how great could it get.
The current VAD (amplitude & SNR) struggles with low-volume or distant speech and requires manual tuning depending on the environment, which can be frustrating for users. In contrast, the VAD from WebRTC is designed to be more robust across a wide range of real-world conditions. It leverages more advanced signal analysis to better distinguish speech from noise, resulting in improved detection accuracy without the need for constant user adjustment.
It adds optional dependency for webrtc-audio-processing extracted from the webrtc library by pulseaudio:
https://gitlab.freedesktop.org/pulseaudio/webrtc-audio-processing
The library has been already packed by Debian, Ubuntu, Arch, FreeBSD, etc. So it's easy to obtain in Linux/BSD operating systems. However, the commit didn't introduce a bundled version for compilation which might be beneficial for Windows systems. Unfortunately, the library was in meson so many works need to be done for bundling it.
Checks