Skip to content

FEAT(client): Add WebRTC based voice activity detection#7127

Open
goodspeed34 wants to merge 9 commits intomumble-voip:masterfrom
goodspeed34:webrtc-audio-processing
Open

FEAT(client): Add WebRTC based voice activity detection#7127
goodspeed34 wants to merge 9 commits intomumble-voip:masterfrom
goodspeed34:webrtc-audio-processing

Conversation

@goodspeed34
Copy link
Copy Markdown

This PR introduces a Voice Activity Detection (VAD) implementation based on the WebRTC Audio Processing library into the client.

WebRTC has been used and tested by various applications and behaves generally great in VoIP. I'm trying to get advanced webrtc audio processing features into mumble and see how far and how great could it get.

The current VAD (amplitude & SNR) struggles with low-volume or distant speech and requires manual tuning depending on the environment, which can be frustrating for users. In contrast, the VAD from WebRTC is designed to be more robust across a wide range of real-world conditions. It leverages more advanced signal analysis to better distinguish speech from noise, resulting in improved detection accuracy without the need for constant user adjustment.

It adds optional dependency for webrtc-audio-processing extracted from the webrtc library by pulseaudio:

https://gitlab.freedesktop.org/pulseaudio/webrtc-audio-processing

The library has been already packed by Debian, Ubuntu, Arch, FreeBSD, etc. So it's easy to obtain in Linux/BSD operating systems. However, the commit didn't introduce a bundled version for compilation which might be beneficial for Windows systems. Unfortunately, the library was in meson so many works need to be done for bundling it.

Checks

It adds optional dependency for webrtc-audio-processing extracted
from the webrtc library by pulseaudio:

https://gitlab.freedesktop.org/pulseaudio/webrtc-audio-processing

The library has been already packed by Debian, Ubuntu, Arch, FreeBSD,
etc. So it's easy to obtain in Linux/BSD operating systems. However,
the commit didn't introduce a bundled version for compilation which
might be beneficial for Windows systems. Unfortunately, the library
was in meson so many works need to be done for bundling it.
Integrate WebRTC VAD as an alternative to the existing amplitude/SNR-based detection.

- Add support for WebRTC GMM VAD with selectable aggressiveness levels
- Extend audio input pipeline to use WebRTC VAD
- Update audio configuration UI and wizard to expose new option

This improves detection reliability, especially for low-volume or distant speech, while reducing false positives from background noise.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 18, 2026

Walkthrough

This pull request adds WebRTC-based Voice Activity Detection (VAD) support to Mumble. Changes include a new CMake option (webrtc-audio-processing) that configures compilation, header definitions for the WebRTC VAD interface (WebRTC_Priv.h), and integration points in the audio pipeline. The Settings structure gains a new VADSource::WebRTC enum value and an fVADWebRTCAggressiveness configuration field. Audio configuration dialogs and the audio wizard are updated with UI controls for selecting WebRTC VAD and adjusting aggressiveness levels. Runtime VAD switching is implemented via updateVad() methods in AudioInput and related components. All WebRTC-specific code is gated behind the USE_WEBRTC_AUDIO_PROCESSING compile flag.

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'FEAT(client): Add WebRTC based voice activity detection' clearly and concisely summarizes the main change: adding WebRTC-based VAD as an alternative to existing voice activity detection methods.
Description check ✅ Passed The PR description provides comprehensive context about the WebRTC VAD feature, rationale for the change, dependency information, and notes limitations. It includes the required commit guidelines checklist marked as completed.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can use TruffleHog to scan for secrets in your code with verification capabilities.

Add a TruffleHog config file (e.g. trufflehog-config.yml, trufflehog.yml) to your project to customize detectors and scanning behavior. The tool runs only when a config file is present.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (3)
src/mumble/AudioInput.ui (1)

523-545: Add accessibility metadata for the new aggressiveness slider.

The new slider is missing accessibleName/accessibleDescription, while neighboring sliders define them. This is an avoidable accessibility gap.

🦮 Proposed accessibility additions
 <widget class="SemanticSlider" name="qsWebRTCAggressiveness">
+ <property name="accessibleName">
+  <string>WebRTC VAD aggressiveness</string>
+ </property>
+ <property name="accessibleDescription">
+  <string>Controls how aggressively WebRTC voice activity detection filters non-speech audio</string>
+ </property>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/mumble/AudioInput.ui` around lines 523 - 545, The new SemanticSlider
widget qsWebRTCAggressiveness is missing accessibility metadata; add properties
accessibleName and accessibleDescription to match neighboring sliders (use a
concise name like "WebRTC aggressiveness" and a description such as "Sets
aggressiveness of the WebRTC voice activity detector; higher values filter more
noise"), ensuring you add these property entries to the qsWebRTCAggressiveness
widget definition so screen readers receive the same information as other
sliders.
src/mumble/AudioInput.h (1)

326-331: Public member m_vadWebrtcAggressiveness breaks encapsulation and is modified without synchronization.

m_vadWebrtcAggressiveness is a public member that is directly modified from UI code (AudioWizard.cpp:669, AudioConfigDialog.cpp:680). This violates encapsulation and creates potential race conditions since the audio thread reads this value in updateVad().

Consider providing a setter method that also triggers updateVad() to ensure the VAD instance is recreated with the new aggressiveness level.

♻️ Suggested refactor

In AudioInput.h:

 	void updateVad(Settings::VADSource src);

 `#ifdef` USE_WEBRTC_AUDIO_PROCESSING
-	webrtc::Vad::Aggressiveness m_vadWebrtcAggressiveness;
+	void setWebRTCAggressiveness(webrtc::Vad::Aggressiveness aggressiveness);
+private:
+	webrtc::Vad::Aggressiveness m_vadWebrtcAggressiveness;
 `#endif`

In AudioInput.cpp:

`#ifdef` USE_WEBRTC_AUDIO_PROCESSING
void AudioInput::setWebRTCAggressiveness(webrtc::Vad::Aggressiveness aggressiveness) {
    QMutexLocker l(&qmSpeex);
    m_vadWebrtcAggressiveness = aggressiveness;
    if (m_vad == Settings::WebRTC) {
        m_vadWebrtc = webrtc::CreateVad(m_vadWebrtcAggressiveness);
    }
}
`#endif`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/mumble/AudioInput.h` around lines 326 - 331, m_vadWebrtcAggressiveness is
a public field modified directly from UI code
(AudioWizard.cpp/AudioConfigDialog.cpp), breaking encapsulation and risking
races because updateVad() reads it on the audio thread; add a synchronized
setter (e.g. AudioInput::setWebRTCAggressiveness(webrtc::Vad::Aggressiveness))
that acquires qmSpeex, updates m_vadWebrtcAggressiveness, and if m_vad ==
Settings::WebRTC recreates m_vadWebrtc (via webrtc::CreateVad) or calls
updateVad() so the change is applied safely from the audio thread.
src/mumble/WebRTC_Priv.h (1)

11-16: Include guard uses WebRTC convention, not Mumble convention.

The include guard COMMON_AUDIO_VAD_INCLUDE_VAD_H_ follows WebRTC's naming convention rather than Mumble's typical MUMBLE_MUMBLE_* pattern. While the comment on lines 11-13 explains the rationale (this mirrors an internal WebRTC header), consider adding a note to clarify this is intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/mumble/WebRTC_Priv.h` around lines 11 - 16, The include guard
COMMON_AUDIO_VAD_INCLUDE_VAD_H_ in WebRTC_Priv.h intentionally follows the
upstream WebRTC naming convention; update the header comment immediately above
the guard to state explicitly that this non-Mumble-style guard is deliberate
because the header mirrors the original WebRTC private header and must remain
stable for compatibility (mention COMMON_AUDIO_VAD_INCLUDE_VAD_H_ by name and
WebRTC_Priv.h so future readers/maintainers won't change it).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/mumble/AudioConfigDialog.cpp`:
- Around line 678-682: The UI handler
AudioInputDialog::on_qsWebRTCAggressiveness_valueChanged is directly writing
Global::get().ai->m_vadWebrtcAggressiveness from the UI thread, causing a race
with the audio thread; fix by making the change thread-safe: add/use a
thread-safe setter on AudioInput (e.g. AudioInput::setVadWebrtcAggressiveness or
similar) that acquires the AudioInput internal lock or posts the update to the
audio thread, and ensure updateVad() is executed on the audio thread (either
from inside that setter or via a posted/task-invoked call) rather than modifying
m_vadWebrtcAggressiveness directly from the UI.

In `@src/mumble/AudioInput.cpp`:
- Around line 951-959: The WebRTC branch in the m_vad switch dereferences
m_vadWebrtc without checking for null, which can crash if CreateVad() failed;
update the Settings::WebRTC case to guard against a null m_vadWebrtc (or fall
back to m_preprocessor.getSpeechProb()) before calling
m_vadWebrtc->VoiceActivity(psSource, iFrameSize, iSampleRate) and ensure
fSpeechProb is set safely when m_vadWebrtc is null; reference m_vadWebrtc,
VoiceActivity, Settings::WebRTC, and fSpeechProb to locate and fix the code.
- Around line 1262-1269: updateVad() currently mutates shared members m_vad and
m_vadWebrtc without locking, causing a data race with encodeAudioFrame() which
reads them under qmSpeex; fix by acquiring the same mutex (qmSpeex) at the start
of AudioInput::updateVad before changing m_vad or replacing m_vadWebrtc so
updates are synchronized with encodeAudioFrame(). Also when calling
webrtc::CreateVad(m_vadWebrtcAggressiveness) check the returned pointer for null
before replacing m_vadWebrtc (and handle the null case appropriately, e.g. log
and keep the existing instance) to avoid using a null VAD instance.
- Line 21: The unconditional include of WebRTC_Priv.h in AudioInput.cpp can
break builds when USE_WEBRTC_AUDIO_PROCESSING is not defined; guard the include
with `#ifdef` USE_WEBRTC_AUDIO_PROCESSING (and corresponding `#endif`) so
WebRTC_Priv.h is only included when that macro is enabled, or alternatively
provide a fallback/no-op declarations when the macro is not set; modify the top
of AudioInput.cpp to conditionally include WebRTC_Priv.h using the
USE_WEBRTC_AUDIO_PROCESSING macro to prevent missing-header/compilation errors.

In `@src/mumble/AudioInput.ui`:
- Around line 414-429: The new WebRTC widgets (QRadioButton qrbWebRTC and the
slider/combobox qsWebRTCAggressiveness) are focusable but not included in the
explicit <tabstops> sequence, so add their object names to the UI's tab order
list (the <tabstops> block) alongside the other controls to ensure predictable
keyboard navigation; locate the <tabstops> element in AudioInput.ui and insert
"qrbWebRTC" and "qsWebRTCAggressiveness" (or the exact object names used in the
file) in the correct logical order relative to neighboring widgets so they
receive focus when tabbing.
- Around line 420-421: The UI string for the WebRTC VAD description contains bad
grammar ("was used to detecting speech"); update the <string> content inside the
AudioInput UI where the WebRTC VAD description is defined so it reads correctly
(e.g., "This sets speech detection to use the WebRTC VAD algorithm. In this
mode, a config-free Gaussian Mixture Model is used to detect speech."),
replacing the incorrect phrase with "is used to detect speech" or the provided
revised sentence in the string literal.

In `@src/mumble/AudioWizard.cpp`:
- Line 18: The unconditional inclusion of WebRTC_Priv.h in AudioWizard.cpp
should be guarded by the same compile flag used elsewhere
(USE_WEBRTC_AUDIO_PROCESSING); wrap the include of "WebRTC_Priv.h" inside an
`#ifdef` USE_WEBRTC_AUDIO_PROCESSING / `#endif` block (matching the pattern used in
AudioInput.cpp) so the header is only included when webRTC audio processing is
enabled, keeping other references in AudioWizard.cpp unchanged.
- Around line 667-671: The handler
AudioWizard::on_qsWebRTCAggressiveness_valueChanged currently writes
Global::get().ai->m_vadWebrtcAggressiveness directly from the UI thread causing
a data race and also may not apply the new value to the VAD instance; replace
the direct write with a thread-safe setter on the AudioInput (e.g. add/use
AudioInput::setVadWebrtcAggressiveness(int) that acquires the same mutex/lock
used by the audio thread), or at minimum assign the new aggressiveness into the
AudioInput via its public API before calling updateVad() so updateVad()
recreates m_vadWebrtc with the new value and no unsynchronized access occurs.

---

Nitpick comments:
In `@src/mumble/AudioInput.h`:
- Around line 326-331: m_vadWebrtcAggressiveness is a public field modified
directly from UI code (AudioWizard.cpp/AudioConfigDialog.cpp), breaking
encapsulation and risking races because updateVad() reads it on the audio
thread; add a synchronized setter (e.g.
AudioInput::setWebRTCAggressiveness(webrtc::Vad::Aggressiveness)) that acquires
qmSpeex, updates m_vadWebrtcAggressiveness, and if m_vad == Settings::WebRTC
recreates m_vadWebrtc (via webrtc::CreateVad) or calls updateVad() so the change
is applied safely from the audio thread.

In `@src/mumble/AudioInput.ui`:
- Around line 523-545: The new SemanticSlider widget qsWebRTCAggressiveness is
missing accessibility metadata; add properties accessibleName and
accessibleDescription to match neighboring sliders (use a concise name like
"WebRTC aggressiveness" and a description such as "Sets aggressiveness of the
WebRTC voice activity detector; higher values filter more noise"), ensuring you
add these property entries to the qsWebRTCAggressiveness widget definition so
screen readers receive the same information as other sliders.

In `@src/mumble/WebRTC_Priv.h`:
- Around line 11-16: The include guard COMMON_AUDIO_VAD_INCLUDE_VAD_H_ in
WebRTC_Priv.h intentionally follows the upstream WebRTC naming convention;
update the header comment immediately above the guard to state explicitly that
this non-Mumble-style guard is deliberate because the header mirrors the
original WebRTC private header and must remain stable for compatibility (mention
COMMON_AUDIO_VAD_INCLUDE_VAD_H_ by name and WebRTC_Priv.h so future
readers/maintainers won't change it).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bb271197-711c-4baf-8e84-bf5afa888596

📥 Commits

Reviewing files that changed from the base of the PR and between 2e1b1f3 and 47eab62.

📒 Files selected for processing (15)
  • docs/dev/build-instructions/cmake_options.md
  • src/mumble/AudioConfigDialog.cpp
  • src/mumble/AudioConfigDialog.h
  • src/mumble/AudioInput.cpp
  • src/mumble/AudioInput.h
  • src/mumble/AudioInput.ui
  • src/mumble/AudioWizard.cpp
  • src/mumble/AudioWizard.h
  • src/mumble/AudioWizard.ui
  • src/mumble/CMakeLists.txt
  • src/mumble/EnumStringConversions.cpp
  • src/mumble/Settings.h
  • src/mumble/SettingsKeys.h
  • src/mumble/SettingsMacros.h
  • src/mumble/WebRTC_Priv.h

Comment thread src/mumble/AudioConfigDialog.cpp
Comment thread src/mumble/AudioInput.cpp
Comment thread src/mumble/AudioInput.cpp Outdated
Comment thread src/mumble/AudioInput.cpp
Comment thread src/mumble/AudioInput.ui
Comment thread src/mumble/AudioInput.ui Outdated
Comment thread src/mumble/AudioWizard.cpp
Comment thread src/mumble/AudioWizard.cpp
The previous implimentation used a public variable for WebRTC aggressiveness
aggressiveness control which is too simple and bad for readability. Replace it
with a public updateWebrtcAggressiveness function.
It uses std::atomic<Settings::VADSource> to prevent potential data
race when switching VAD.
@Hartmnt Hartmnt added this to the 1.7.0 milestone Mar 22, 2026
@Hartmnt Hartmnt added client feature-request This issue or PR deals with a new feature labels Mar 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

client feature-request This issue or PR deals with a new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants