-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
[DEMO] feat: echo cancellation with WebRTC #7180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
ab36224
4b7f32e
3925ff3
141e160
e554962
1111c66
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,68 @@ | ||
| # WebRTC AEC3 Echo Cancellation | ||
|
|
||
| Mumble's built-in echo cancellation uses SpeexDSP — a linear MDF filter from ~2007. It works adequately with headphones but struggles with real-world speaker setups: nonlinear distortion, room reverb, and rapidly changing acoustic conditions produce audible echo artifacts. | ||
|
|
||
| This optional mode adds **WebRTC AEC3**, the same algorithm used in Chrome, Teams, and Discord. It is enabled at build time with `-Dwebrtc-apm=ON` and appears in Audio Input settings as **"Echo cancellation (WebRTC AEC3)"**. | ||
|
|
||
| ## How it works | ||
|
|
||
| WebRTC APM separates render (speaker) and capture (microphone) processing, which lets it bypass Mumble's existing Resynchronizer queue: | ||
|
|
||
| - **`addEcho()`** — speaker samples feed directly into `ProcessReverseStream()`. The `short[]` allocation and `resync.addSpeaker()` are skipped. | ||
| - **`encodeAudioFrame()`** — mic samples feed directly into `ProcessStream()` instead of the Speex path. The measured output latency (hardware path from WASAPI's `GetStreamLatency` + software buffer occupancy) is passed as the stream delay so AEC3 can align the two streams. | ||
|
|
||
| The existing SpeexDSP preprocessor (VAD, AGC, denoising) still runs on the cleaned signal afterward. The Resynchronizer and Speex AEC paths are untouched — `SPEEX_MIXED` and `SPEEX_MULTICHANNEL` behave exactly as before. | ||
|
|
||
| ## Platform support | ||
|
|
||
| | Backend | Supported | | ||
| |---|---| | ||
| | Windows (WASAPI) | Yes | | ||
| | Linux (PulseAudio) | Yes | | ||
| | macOS | No — use `APPLE_AEC` instead | | ||
| | ALSA | No — no loopback capture available | | ||
|
|
||
| ## Getting the dependency | ||
|
|
||
| `webrtc-audio-processing` (≥ 2.0, freedesktop fork) must be installed separately — it is not bundled with Mumble. | ||
|
|
||
| **Linux:** | ||
| ```sh | ||
| # Ubuntu/Debian | ||
| sudo apt install libwebrtc-audio-processing-dev | ||
| # Fedora | ||
| sudo dnf install webrtc-audio-processing-devel | ||
| ``` | ||
|
|
||
| **Windows (vcpkg):** | ||
| ```sh | ||
| vcpkg install webrtc-audio-processing:x64-windows-static-md | ||
| ``` | ||
|
|
||
| > **Note:** Mumble's vcpkg fork (`mumble-voip/vcpkg`) does not yet include this port. For now, install from upstream vcpkg and point `CMAKE_PREFIX_PATH` at your vcpkg installed tree, or build from source. | ||
|
|
||
| **From source:** | ||
| ```sh | ||
| git clone https://gitlab.freedesktop.org/pipewire/webrtc-audio-processing.git | ||
| cd webrtc-audio-processing | ||
| meson setup build --prefix=/your/install/prefix | ||
| ninja -C build install | ||
| ``` | ||
|
|
||
| ## Building Mumble with WebRTC AEC3 | ||
|
|
||
| ```sh | ||
| cmake -Dwebrtc-apm=ON -DCMAKE_PREFIX_PATH=/your/install/prefix .. | ||
| ninja mumble | ||
| ``` | ||
|
|
||
| On Windows, run this from an MSVC x64 developer environment. The DLL (`webrtc-audio-processing-2-1.dll`) is automatically copied to the build output directory by a `POST_BUILD` step. | ||
|
|
||
| Builds without the flag (`-Dwebrtc-apm=OFF`, the default) are unaffected — no behavior change, no new dependency. | ||
|
|
||
| ## Related | ||
|
|
||
| - `src/mumble/AudioInput.cpp` — `resetAudioProcessor()`, `addMic()`, `addEcho()`, `encodeAudioFrame()` | ||
| - `src/mumble/EchoCancelOption.h` / `.cpp` — enum and option list | ||
| - `docs/dev/build-instructions/cmake_options.md` — `webrtc-apm` option reference | ||
| - `docs/dev/AudioInputDebug.md` — how to tap the DSP chain for debugging |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -21,6 +21,17 @@ | |
| #include <speex/speex_echo.h> | ||
| #include <speex/speex_resampler.h> | ||
|
|
||
| #ifdef USE_WEBRTC_APM | ||
| # ifdef _MSC_VER | ||
|
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do we need to ignore warnings? Worthy of a comment.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Claude: Good catch — added a comment. The WebRTC APM headers emit warnings we can't fix (they're third-party code), so we suppress them for that include block only and restore the warning level immediately after. |
||
| // webrtc-audio-processing headers emit warnings we can't fix (third-party code). | ||
| # pragma warning(push, 0) | ||
| # endif | ||
| # include <modules/audio_processing/include/audio_processing.h> | ||
| # ifdef _MSC_VER | ||
| # pragma warning(pop) | ||
| # endif | ||
| #endif | ||
|
|
||
| #include "Audio.h" | ||
| #include "AudioOutputToken.h" | ||
| #include "AudioPreprocessor.h" | ||
|
|
@@ -225,6 +236,9 @@ class AudioInput : public QThread { | |
| QMutex qmSpeex; | ||
| AudioPreprocessor m_preprocessor; | ||
| SpeexEchoState *sesEcho; | ||
| #ifdef USE_WEBRTC_APM | ||
| rtc::scoped_refptr< webrtc::AudioProcessing > m_apm; | ||
| #endif | ||
|
|
||
| /// bResetEncoder is a flag that notifies | ||
| /// our encoder functions that the encoder | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -24,6 +24,8 @@ option(bundled-speex "Build the included version of Speex instead of looking for | |
| option(rnnoise "Use RNNoise for machine learning noise reduction." ON) | ||
| option(bundled-rnnoise "Build the included version of RNNoise instead of looking for one on the system." ${rnnoise}) | ||
|
|
||
| option(webrtc-apm "Use WebRTC AEC3 for echo cancellation via webrtc-audio-processing." OFF) | ||
|
|
||
| option(manual-plugin "Include the built-in \"manual\" positional audio plugin." ON) | ||
|
|
||
| option(qtspeech "Use Qt's text-to-speech system (part of the Qt Speech module) instead of Mumble's own OS-specific text-to-speech implementations." OFF) | ||
|
|
@@ -787,6 +789,65 @@ if(rnnoise) | |
| endif() | ||
| endif() | ||
|
|
||
| if(webrtc-apm) | ||
|
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think this is available in vcpkg, so we'll need to vendor-in the library.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Claude: Agreed — this is a blocker for official release builds. Mumble's vcpkg fork ( |
||
| target_compile_definitions(mumble_client_object_lib PRIVATE "USE_WEBRTC_APM") | ||
|
|
||
| # Try pkg-config first (Linux), then fall back to find_path/find_library (Windows/macOS). | ||
| # The installed include dir is <prefix>/include/webrtc-audio-processing-{1,2}/ | ||
| # Headers are then included as <modules/audio_processing/include/audio_processing.h>. | ||
| find_pkg("webrtc-audio-processing-2;webrtc-audio-processing-1;webrtc-audio-processing") | ||
|
|
||
| set(WEBRTC_APM_FOUND FALSE) | ||
| foreach(_wap_name webrtc-audio-processing-2 webrtc-audio-processing-1 webrtc-audio-processing) | ||
| if(${_wap_name}_FOUND) | ||
| target_include_directories(mumble_client_object_lib PRIVATE ${${_wap_name}_INCLUDE_DIRS}) | ||
| target_link_libraries(mumble_client_object_lib PRIVATE ${${_wap_name}_LIBRARIES}) | ||
| set(WEBRTC_APM_FOUND TRUE) | ||
| break() | ||
| endif() | ||
| endforeach() | ||
|
|
||
| if(NOT WEBRTC_APM_FOUND) | ||
| # pkg-config not available (e.g. Windows). Search manually via CMAKE_PREFIX_PATH. | ||
| # Supports both version 1.x and 2.x install layouts. | ||
| find_path(WEBRTC_APM_INCLUDE_DIR | ||
| NAMES modules/audio_processing/include/audio_processing.h | ||
| PATH_SUFFIXES webrtc-audio-processing-2 webrtc-audio-processing-1 webrtc-audio-processing | ||
| ) | ||
| find_library(WEBRTC_APM_LIBRARY | ||
| NAMES webrtc-audio-processing-2 webrtc-audio-processing-1 webrtc-audio-processing | ||
| ) | ||
|
|
||
| if(NOT WEBRTC_APM_INCLUDE_DIR OR NOT WEBRTC_APM_LIBRARY) | ||
| message(FATAL_ERROR "webrtc-audio-processing not found. " | ||
| "Set CMAKE_PREFIX_PATH to the install prefix (e.g. C:/Users/ben/Projects/webrtc-apm-prefix).") | ||
| endif() | ||
|
|
||
| target_include_directories(mumble_client_object_lib PRIVATE "${WEBRTC_APM_INCLUDE_DIR}") | ||
| target_link_libraries(mumble_client_object_lib PRIVATE "${WEBRTC_APM_LIBRARY}") | ||
| target_compile_definitions(mumble_client_object_lib PRIVATE WEBRTC_WIN _WIN32 NOMINMAX _USE_MATH_DEFINES) | ||
| endif() | ||
|
|
||
| if(WIN32) | ||
| find_file(WEBRTC_APM_DLL | ||
| NAMES webrtc-audio-processing-2-1.dll webrtc-audio-processing-1.dll webrtc-audio-processing.dll | ||
| PATH_SUFFIXES bin | ||
| ) | ||
| if(WEBRTC_APM_DLL) | ||
| add_custom_command(TARGET mumble POST_BUILD | ||
| COMMAND ${CMAKE_COMMAND} -E copy_if_different | ||
| "${WEBRTC_APM_DLL}" | ||
| "$<TARGET_FILE_DIR:mumble>" | ||
| COMMENT "Copying WebRTC APM DLL" | ||
| ) | ||
| else() | ||
| message(WARNING "webrtc-audio-processing DLL not found — copy it manually to the build directory.") | ||
| endif() | ||
| endif() | ||
|
|
||
| message(STATUS "WebRTC APM (AEC3) support enabled") | ||
| endif() | ||
|
|
||
| if(qtspeech) | ||
| find_pkg(Qt6 COMPONENTS TextToSpeech REQUIRED) | ||
| target_sources(mumble_client_object_lib PRIVATE "TextToSpeech.cpp") | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -25,9 +25,15 @@ const std::vector< EchoCancelOption > &EchoCancelOption::getOptions() { | |
| "Multichannel echo cancellation requires more CPU, so " | ||
| "you should try mixed first.") }, | ||
| // Available only on Apple devices | ||
| { EchoCancelOptionID::APPLE_AEC, QObject::tr("EXPERIMENTAL: Acoustic echo cancellation (Apple)."), | ||
| QObject::tr("The support for this option is experimental only! This option works best when using built-in " | ||
| "microphone and speaker.") } | ||
| { EchoCancelOptionID::APPLE_AEC, QObject::tr("Acoustic echo cancellation (Apple)"), | ||
|
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We shouldn't modify the "experimental" tag on the Apple noise cancellation unless we've improved it.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Claude: Fixed — reverted to |
||
| QObject::tr("Uses Apple's built-in voice processing for echo cancellation. Works best with built-in " | ||
| "microphone and speaker.") }, | ||
| #ifdef USE_WEBRTC_APM | ||
| // Available when built with webrtc-audio-processing (-Dwebrtc-apm=ON) | ||
|
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should gate this flag so it only appears when the feature is available.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Claude: Fixed. The |
||
| { EchoCancelOptionID::WEBRTC_AEC, QObject::tr("Echo cancellation (WebRTC AEC3)"), | ||
| QObject::tr("Uses the WebRTC AEC3 algorithm for high-quality echo cancellation. " | ||
| "Recommended for use with speakers instead of headphones.") }, | ||
| #endif | ||
| }; | ||
|
|
||
| return echoCancelOptions; | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,7 +17,10 @@ enum class EchoCancelOptionID { | |
| DISABLED = 0, | ||
| SPEEX_MIXED = 1, | ||
| SPEEX_MULTICHANNEL = 2, | ||
| APPLE_AEC = 3 // Apple's Acoustic Echo Cancellation support for macOS and iOS. | ||
| APPLE_AEC = 3, // Apple's Acoustic Echo Cancellation support for macOS and iOS. | ||
| // Always defined (even without USE_WEBRTC_APM) so saved config values round-trip cleanly. | ||
| // The corresponding UI entry in getOptions() is compiled out when USE_WEBRTC_APM is not set. | ||
| WEBRTC_AEC = 4 // WebRTC AEC3 via the webrtc-audio-processing library. | ||
|
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Worth an #ifdef guard
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Claude: The enum value is intentionally left ungarded. |
||
| }; | ||
|
|
||
| struct EchoCancelOption { | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -14,6 +14,7 @@ | |
| #include "Timer.h" | ||
| #include "Version.h" | ||
|
|
||
| #include <atomic> | ||
| #include <memory> | ||
|
|
||
| // Global helper class to spread variables around across threads. | ||
|
|
@@ -94,6 +95,12 @@ struct Global Q_DECL_FINAL { | |
| ChanACL::Permissions pPermissions; | ||
| int iMaxBandwidth; | ||
| int iAudioBandwidth; | ||
| /// End-to-end output latency in ms (hardware path + software buffer occupancy), used by WebRTC | ||
| /// AEC3 to align the render and capture streams for echo cancellation. | ||
| /// AudioInput and AudioOutput are decoupled — they cannot call methods on each other — so | ||
| /// Global is the appropriate channel for this runtime-measured value (same pattern as | ||
| /// iAudioPathTime). Set by the audio output backend after stream init; 50ms is a safe default. | ||
| std::atomic< int > iOutputLatencyMs{ 50 }; | ||
|
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Introducing a new global for latency seems suspicious. Is there no other mechanism?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Claude: |
||
| QDir qdBasePath; | ||
| bool bAttenuateOthers; | ||
| /// If set the AudioOutput::mix will forcefully adjust the volume of all | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this
delete[]sane?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Claude: Yes. When
iEchoChannels > 0,psMicis heap-allocated (see thenew short[iFrameSize]a few lines above) so that it can outlive the stack frame for the Resynchronizer queue. In the Speex path,resync.addMic(psMic)transfers ownership and the Resynchronizer eventually callsdelete[]itself. The WebRTC path bypasses the Resynchronizer entirely, so we must free it here. Added a comment to make this ownership handoff explicit.