68 changes: 68 additions & 0 deletions docs/dev/WebRTCAEC.md
@@ -0,0 +1,68 @@
# WebRTC AEC3 Echo Cancellation

Mumble's built-in echo cancellation uses SpeexDSP — a linear MDF filter from ~2007. It works adequately with headphones but struggles with real-world speaker setups: nonlinear distortion, room reverb, and rapidly changing acoustic conditions produce audible echo artifacts.

This optional mode adds **WebRTC AEC3**, the same algorithm used in Chrome, Teams, and Discord. It is enabled at build time with `-Dwebrtc-apm=ON` and appears in Audio Input settings as **"Echo cancellation (WebRTC AEC3)"**.

## How it works

WebRTC APM separates render (speaker) and capture (microphone) processing, which lets it bypass Mumble's existing Resynchronizer queue:

- **`addEcho()`** — speaker samples feed directly into `ProcessReverseStream()`. The `short[]` allocation and `resync.addSpeaker()` are skipped.
- **`encodeAudioFrame()`** — mic samples feed directly into `ProcessStream()` instead of the Speex path. The measured output latency (hardware path from WASAPI's `GetStreamLatency` + software buffer occupancy) is passed as the stream delay so AEC3 can align the two streams.

The existing SpeexDSP preprocessor (VAD, AGC, denoising) still runs on the cleaned signal afterward. The Resynchronizer and Speex AEC paths are untouched — `SPEEX_MIXED` and `SPEEX_MULTICHANNEL` behave exactly as before.
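
Internally, the capture path converts between Mumble's 16-bit PCM frames and the float samples in [-1.0, 1.0] that APM expects (the real conversion is inlined in `encodeAudioFrame()`). A minimal self-contained sketch of that scaling, with hypothetical helper names:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// WebRTC APM processes float samples in [-1.0, 1.0]; Mumble's pipeline
// carries 16-bit PCM. Full scale is 32768, so the int16 range maps to
// [-1.0, 32767/32768].
constexpr float kInt16Scale = 32768.f;

std::vector<float> pcmToFloat(const std::vector<int16_t> &pcm) {
    std::vector<float> out(pcm.size());
    for (std::size_t i = 0; i < pcm.size(); ++i)
        out[i] = pcm[i] / kInt16Scale;
    return out;
}

std::vector<int16_t> floatToPcm(const std::vector<float> &samples) {
    std::vector<int16_t> out(samples.size());
    for (std::size_t i = 0; i < samples.size(); ++i)
        // Clamp before narrowing: APM output can slightly exceed full scale.
        out[i] = static_cast<int16_t>(
            std::clamp(samples[i] * kInt16Scale, -kInt16Scale, kInt16Scale - 1.f));
    return out;
}
```

Samples that survive the round trip unchanged (like any exact int16 value) confirm the scaling is lossless in the common case; the clamp only matters when the canceller overshoots.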

## Platform support

| Backend | Supported |
|---|---|
| Windows (WASAPI) | Yes |
| Linux (PulseAudio) | Yes |
| macOS | No — use `APPLE_AEC` instead |
| ALSA | No — no loopback capture available |

## Getting the dependency

`webrtc-audio-processing` (≥ 2.0, freedesktop fork) must be installed separately — it is not bundled with Mumble.

**Linux:**
```sh
# Ubuntu/Debian
sudo apt install libwebrtc-audio-processing-dev
# Fedora
sudo dnf install webrtc-audio-processing-devel
```

**Windows (vcpkg):**
```sh
vcpkg install webrtc-audio-processing:x64-windows-static-md
```

> **Note:** Mumble's vcpkg fork (`mumble-voip/vcpkg`) does not yet include this port. For now, install from upstream vcpkg and point `CMAKE_PREFIX_PATH` at your vcpkg installed tree, or build from source.

**From source:**
```sh
git clone https://gitlab.freedesktop.org/pipewire/webrtc-audio-processing.git
cd webrtc-audio-processing
meson setup build --prefix=/your/install/prefix
ninja -C build install
```

## Building Mumble with WebRTC AEC3

```sh
cmake -Dwebrtc-apm=ON -DCMAKE_PREFIX_PATH=/your/install/prefix ..
ninja mumble
```

On Windows, run this from an MSVC x64 developer environment. The DLL (`webrtc-audio-processing-2-1.dll`) is automatically copied to the build output directory by a `POST_BUILD` step.

Builds without the flag (`-Dwebrtc-apm=OFF`, the default) are unaffected — no behavior change, no new dependency.

## Related

- `src/mumble/AudioInput.cpp` — `resetAudioProcessor()`, `addMic()`, `addEcho()`, `encodeAudioFrame()`
- `src/mumble/EchoCancelOption.h` / `.cpp` — enum and option list
- `docs/dev/build-instructions/cmake_options.md` — `webrtc-apm` option reference
- `docs/dev/AudioInputDebug.md` — how to tap the DSP chain for debugging
87 changes: 76 additions & 11 deletions src/mumble/AudioInput.cpp
@@ -310,6 +310,10 @@ AudioInput::~AudioInput() {
if (sesEcho)
speex_echo_state_destroy(sesEcho);

#ifdef USE_WEBRTC_APM
m_apm = nullptr;
#endif

if (srsMic)
speex_resampler_destroy(srsMic);
if (srsEcho)
@@ -598,7 +602,19 @@ void AudioInput::addMic(const void *data, unsigned int nsamp) {

// If we have echo cancellation enabled...
if (iEchoChannels > 0) {
resync.addMic(psMic);
#ifdef USE_WEBRTC_APM
// WebRTC path: render stream was already fed in addEcho(), process capture directly.
if (m_apm) {
encodeAudioFrame(AudioChunk(psMic));
// psMic is heap-allocated because iEchoChannels > 0 (see above).
// The Speex path hands it to the Resynchronizer which owns and frees it;
// the WebRTC path bypasses the Resynchronizer, so we free it here.
delete[] psMic;
**Comment:** Is this `delete[]` sane?

**Claude:** Yes. When `iEchoChannels > 0`, `psMic` is heap-allocated (see the `new short[iFrameSize]` a few lines above) so that it can outlive the stack frame for the Resynchronizer queue. In the Speex path, `resync.addMic(psMic)` transfers ownership and the Resynchronizer eventually calls `delete[]` itself. The WebRTC path bypasses the Resynchronizer entirely, so we must free it here. Added a comment to make this ownership handoff explicit.

} else
#endif
{
resync.addMic(psMic);
}
} else {
encodeAudioFrame(AudioChunk(psMic));
}
@@ -654,6 +670,21 @@ void AudioInput::addEcho(const void *data, unsigned int nsamp) {
speex_resampler_process_interleaved_float(srsEcho, pfEchoInput, &inlen, pfOutput, &outlen);
}

#ifdef USE_WEBRTC_APM
// WebRTC path: feed the render (speaker) stream directly to the Audio Processing Module (APM).
// The capture (mic) side calls encodeAudioFrame() from addMic() to complete the AEC loop.
// Hold qmSpeex briefly to guard against m_apm being reset concurrently in resetAudioProcessor().
{
QMutexLocker l(&qmSpeex);
if (m_apm) {
float *renderPtr = ptr;
webrtc::StreamConfig cfg(iSampleRate, 1);
m_apm->ProcessReverseStream(&renderPtr, cfg, cfg, &renderPtr);
continue;
}
}
#endif

short *outbuff = new short[iEchoFrameSize];

// float -> 16bit PCM
@@ -749,6 +780,11 @@ void AudioInput::resetAudioProcessor() {

if (sesEcho)
speex_echo_state_destroy(sesEcho);
sesEcho = nullptr; // Null immediately after destroy to prevent a dangling pointer on re-entry.

#ifdef USE_WEBRTC_APM
m_apm = nullptr;
#endif

m_preprocessor.init(iSampleRate, iFrameSize);
resync.reset();
@@ -769,16 +805,29 @@
}

if (iEchoChannels > 0) {
int filterSize = iFrameSize * (10 + resync.getNominalLag());
sesEcho =
speex_echo_state_init_mc(iFrameSize, filterSize, 1, bEchoMulti ? static_cast< int >(iEchoChannels) : 1);
int iArg = iSampleRate;
speex_echo_ctl(sesEcho, SPEEX_ECHO_SET_SAMPLING_RATE, &iArg);
m_preprocessor.setEchoState(sesEcho);

qWarning("AudioInput: ECHO CANCELLER ACTIVE");
} else {
sesEcho = nullptr;
#ifdef USE_WEBRTC_APM
if (Global::get().s.echoOption == EchoCancelOptionID::WEBRTC_AEC) {
webrtc::AudioProcessing::Config cfg;
cfg.echo_canceller.enabled = true;
cfg.echo_canceller.mobile_mode = false;
m_apm = webrtc::AudioProcessingBuilder().SetConfig(cfg).Create();
if (m_apm) {
qWarning("AudioInput: WebRTC AEC3 ACTIVE");
} else {
qWarning("AudioInput: Failed to create WebRTC APM, echo cancellation disabled");
}
} else
#endif
{
int filterSize = iFrameSize * (10 + resync.getNominalLag());
sesEcho = speex_echo_state_init_mc(iFrameSize, filterSize, 1,
bEchoMulti ? static_cast< int >(iEchoChannels) : 1);
int iArg = iSampleRate;
speex_echo_ctl(sesEcho, SPEEX_ECHO_SET_SAMPLING_RATE, &iArg);
m_preprocessor.setEchoState(sesEcho);

qWarning("AudioInput: ECHO CANCELLER ACTIVE");
}
}

bResetEncoder = true;
@@ -900,6 +949,22 @@ void AudioInput::encodeAudioFrame(AudioChunk chunk) {
}

short psClean[iFrameSize];
#ifdef USE_WEBRTC_APM
if (m_apm) {
// WebRTC APM works in float [-1.0, 1.0]; convert to/from int16 PCM [-32768, 32767].
static constexpr float kInt16Scale = 32768.f;
float floatBuf[iFrameSize];
for (int i = 0; i < iFrameSize; ++i)
floatBuf[i] = chunk.mic[i] / kInt16Scale;
float *floatPtr = floatBuf;
webrtc::StreamConfig cfg(iSampleRate, 1);
m_apm->set_stream_delay_ms(Global::get().iOutputLatencyMs.load());
m_apm->ProcessStream(&floatPtr, cfg, cfg, &floatPtr);
for (int i = 0; i < iFrameSize; ++i)
psClean[i] = static_cast< short >(qBound(-kInt16Scale, floatBuf[i] * kInt16Scale, kInt16Scale - 1.f));
psSource = psClean;
} else
#endif
if (sesEcho && chunk.speaker) {
speex_echo_cancellation(sesEcho, chunk.mic, chunk.speaker, psClean);
psSource = psClean;
14 changes: 14 additions & 0 deletions src/mumble/AudioInput.h
@@ -21,6 +21,17 @@
#include <speex/speex_echo.h>
#include <speex/speex_resampler.h>

#ifdef USE_WEBRTC_APM
# ifdef _MSC_VER
**Comment:** Why do we need to ignore warnings? Worthy of a comment.

**Claude:** Good catch; added a comment. The WebRTC APM headers emit warnings we can't fix (they're third-party code), so we suppress them for that include block only and restore the warning level immediately after.

// webrtc-audio-processing headers emit warnings we can't fix (third-party code).
# pragma warning(push, 0)
# endif
# include <modules/audio_processing/include/audio_processing.h>
# ifdef _MSC_VER
# pragma warning(pop)
# endif
#endif

#include "Audio.h"
#include "AudioOutputToken.h"
#include "AudioPreprocessor.h"
@@ -225,6 +236,9 @@ class AudioInput : public QThread {
QMutex qmSpeex;
AudioPreprocessor m_preprocessor;
SpeexEchoState *sesEcho;
#ifdef USE_WEBRTC_APM
rtc::scoped_refptr< webrtc::AudioProcessing > m_apm;
#endif

/// bResetEncoder is a flag that notifies
/// our encoder functions that the encoder
61 changes: 61 additions & 0 deletions src/mumble/CMakeLists.txt
@@ -24,6 +24,8 @@ option(bundled-speex "Build the included version of Speex instead of looking for
option(rnnoise "Use RNNoise for machine learning noise reduction." ON)
option(bundled-rnnoise "Build the included version of RNNoise instead of looking for one on the system." ${rnnoise})

option(webrtc-apm "Use WebRTC AEC3 for echo cancellation via webrtc-audio-processing." OFF)

option(manual-plugin "Include the built-in \"manual\" positional audio plugin." ON)

option(qtspeech "Use Qt's text-to-speech system (part of the Qt Speech module) instead of Mumble's own OS-specific text-to-speech implementations." OFF)
@@ -787,6 +789,65 @@ if(rnnoise)
endif()
endif()

if(webrtc-apm)
**Comment:** I don't think this is available in vcpkg, so we'll need to vendor-in the library.

**Claude:** Agreed; this is a blocker for official release builds. Mumble's vcpkg fork (mumble-voip/vcpkg) doesn't have a webrtc-audio-processing port yet. The options are: (a) vendor the library source into the repo, (b) add a port to the vcpkg fork, or (c) use CMake FetchContent. What's the project's preferred approach? In the meantime the CMakeLists.txt already falls back to a manual find_path/find_library search for local installs, so developer builds work.

target_compile_definitions(mumble_client_object_lib PRIVATE "USE_WEBRTC_APM")

# Try pkg-config first (Linux), then fall back to find_path/find_library (Windows/macOS).
# The installed include dir is <prefix>/include/webrtc-audio-processing-{1,2}/
# Headers are then included as <modules/audio_processing/include/audio_processing.h>.
find_pkg("webrtc-audio-processing-2;webrtc-audio-processing-1;webrtc-audio-processing")

set(WEBRTC_APM_FOUND FALSE)
foreach(_wap_name webrtc-audio-processing-2 webrtc-audio-processing-1 webrtc-audio-processing)
if(${_wap_name}_FOUND)
target_include_directories(mumble_client_object_lib PRIVATE ${${_wap_name}_INCLUDE_DIRS})
target_link_libraries(mumble_client_object_lib PRIVATE ${${_wap_name}_LIBRARIES})
set(WEBRTC_APM_FOUND TRUE)
break()
endif()
endforeach()

if(NOT WEBRTC_APM_FOUND)
# pkg-config not available (e.g. Windows). Search manually via CMAKE_PREFIX_PATH.
# Supports both version 1.x and 2.x install layouts.
find_path(WEBRTC_APM_INCLUDE_DIR
NAMES modules/audio_processing/include/audio_processing.h
PATH_SUFFIXES webrtc-audio-processing-2 webrtc-audio-processing-1 webrtc-audio-processing
)
find_library(WEBRTC_APM_LIBRARY
NAMES webrtc-audio-processing-2 webrtc-audio-processing-1 webrtc-audio-processing
)

if(NOT WEBRTC_APM_INCLUDE_DIR OR NOT WEBRTC_APM_LIBRARY)
message(FATAL_ERROR "webrtc-audio-processing not found. "
"Set CMAKE_PREFIX_PATH to the install prefix (e.g. C:/Users/ben/Projects/webrtc-apm-prefix).")
endif()

target_include_directories(mumble_client_object_lib PRIVATE "${WEBRTC_APM_INCLUDE_DIR}")
target_link_libraries(mumble_client_object_lib PRIVATE "${WEBRTC_APM_LIBRARY}")
target_compile_definitions(mumble_client_object_lib PRIVATE WEBRTC_WIN _WIN32 NOMINMAX _USE_MATH_DEFINES)
endif()

if(WIN32)
find_file(WEBRTC_APM_DLL
NAMES webrtc-audio-processing-2-1.dll webrtc-audio-processing-1.dll webrtc-audio-processing.dll
PATH_SUFFIXES bin
)
if(WEBRTC_APM_DLL)
add_custom_command(TARGET mumble POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_if_different
"${WEBRTC_APM_DLL}"
"$<TARGET_FILE_DIR:mumble>"
COMMENT "Copying WebRTC APM DLL"
)
else()
message(WARNING "webrtc-audio-processing DLL not found — copy it manually to the build directory.")
endif()
endif()

message(STATUS "WebRTC APM (AEC3) support enabled")
endif()

if(qtspeech)
find_pkg(Qt6 COMPONENTS TextToSpeech REQUIRED)
target_sources(mumble_client_object_lib PRIVATE "TextToSpeech.cpp")
12 changes: 9 additions & 3 deletions src/mumble/EchoCancelOption.cpp
@@ -25,9 +25,15 @@ const std::vector< EchoCancelOption > &EchoCancelOption::getOptions() {
"Multichannel echo cancellation requires more CPU, so "
"you should try mixed first.") },
// Available only on Apple devices
{ EchoCancelOptionID::APPLE_AEC, QObject::tr("EXPERIMENTAL: Acoustic echo cancellation (Apple)."),
QObject::tr("The support for this option is experimental only! This option works best when using built-in "
"microphone and speaker.") }
{ EchoCancelOptionID::APPLE_AEC, QObject::tr("Acoustic echo cancellation (Apple)"),
**Comment:** We shouldn't modify the "experimental" tag on the Apple noise cancellation unless we've improved it.

**Claude:** Fixed; reverted to DISABLED. That change was unintentional; issue #4912 was not verified fixed, so the original conservative default should stand.

QObject::tr("Uses Apple's built-in voice processing for echo cancellation. Works best with built-in "
"microphone and speaker.") },
#ifdef USE_WEBRTC_APM
// Available when built with webrtc-audio-processing (-Dwebrtc-apm=ON)
**Comment:** We should gate this flag so it only appears when the feature is available.

**Claude:** Fixed. The getOptions() vector entry is now wrapped in #ifdef USE_WEBRTC_APM, so the option only appears in the UI when the library was compiled in. The enum value itself is left unguarded; see the reply on EchoCancelOption.h for why.

{ EchoCancelOptionID::WEBRTC_AEC, QObject::tr("Echo cancellation (WebRTC AEC3)"),
QObject::tr("Uses the WebRTC AEC3 algorithm for high-quality echo cancellation. "
"Recommended for use with speakers instead of headphones.") },
#endif
};

return echoCancelOptions;
5 changes: 4 additions & 1 deletion src/mumble/EchoCancelOption.h
@@ -17,7 +17,10 @@ enum class EchoCancelOptionID {
DISABLED = 0,
SPEEX_MIXED = 1,
SPEEX_MULTICHANNEL = 2,
APPLE_AEC = 3 // Apple's Acoustic Echo Cancellation support for macOS and iOS.
APPLE_AEC = 3, // Apple's Acoustic Echo Cancellation support for macOS and iOS.
// Always defined (even without USE_WEBRTC_APM) so saved config values round-trip cleanly.
// The corresponding UI entry in getOptions() is compiled out when USE_WEBRTC_APM is not set.
WEBRTC_AEC = 4 // WebRTC AEC3 via the webrtc-audio-processing library.
**Comment:** Worth an #ifdef guard

**Claude:** The enum value is intentionally left unguarded. AudioConfigDialog uses the enum value as a direct index into the getOptions() vector, so the vector must be indexable by value 4 whenever WEBRTC_AEC appears in a backend's echoOptions. Since those push_back calls are already behind #ifdef USE_WEBRTC_APM, index 4 is never accessed in a build without the feature, so guarding the enum itself would give no extra safety while breaking serialization (a config saved by a USE_WEBRTC_APM build would fail to parse on one without it). Added a comment to the enum to explain this.

};

struct EchoCancelOption {
3 changes: 2 additions & 1 deletion src/mumble/EnumStringConversions.cpp
@@ -56,7 +56,8 @@
PROCESS(EchoCancelOptionID, DISABLED, "Disabled") \
PROCESS(EchoCancelOptionID, SPEEX_MIXED, "Speex_MixedChannel") \
PROCESS(EchoCancelOptionID, SPEEX_MULTICHANNEL, "Speex_Multichannel") \
PROCESS(EchoCancelOptionID, APPLE_AEC, "Apple_AEC")
PROCESS(EchoCancelOptionID, APPLE_AEC, "Apple_AEC") \
PROCESS(EchoCancelOptionID, WEBRTC_AEC, "WebRTC_AEC3")

#define PROXY_TYPE_VALUES \
PROCESS(Settings::ProxyType, NoProxy, "None") \
7 changes: 7 additions & 0 deletions src/mumble/Global.h
@@ -14,6 +14,7 @@
#include "Timer.h"
#include "Version.h"

#include <atomic>
#include <memory>

// Global helper class to spread variables around across threads.
Expand Down Expand Up @@ -94,6 +95,12 @@ struct Global Q_DECL_FINAL {
ChanACL::Permissions pPermissions;
int iMaxBandwidth;
int iAudioBandwidth;
/// End-to-end output latency in ms (hardware path + software buffer occupancy), used by WebRTC
/// AEC3 to align the render and capture streams for echo cancellation.
/// AudioInput and AudioOutput are decoupled — they cannot call methods on each other — so
/// Global is the appropriate channel for this runtime-measured value (same pattern as
/// iAudioPathTime). Set by the audio output backend after stream init; 50ms is a safe default.
std::atomic< int > iOutputLatencyMs{ 50 };
**Comment:** Introducing a new global for latency seems suspicious. Is there no other mechanism?

**Claude:** AudioInput and AudioOutput are fully decoupled; they can't call methods on each other directly. Global is the established pattern in this codebase for cross-thread runtime state that both sides need (same pattern as iAudioPathTime). The field is std::atomic<int> so reads and writes are safe across threads. Expanded the comment to make this rationale explicit.
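
For discussion, a minimal standalone sketch of that handoff pattern (hypothetical names; Mumble's real field is `Global::iOutputLatencyMs`):

```cpp
#include <atomic>

// One side (the output backend) publishes its measured latency after stream
// init; the other side (the input thread) reads it each captured frame.
// std::atomic<int> makes the handoff race-free without coupling the classes.
struct SharedState {
    std::atomic<int> outputLatencyMs{50}; // conservative default until measured
};

void publishMeasuredLatency(SharedState &s, int ms) {
    s.outputLatencyMs.store(ms, std::memory_order_relaxed);
}

int readDelayForAec(const SharedState &s) {
    return s.outputLatencyMs.load(std::memory_order_relaxed);
}
```

Relaxed ordering suffices here because the value is a standalone hint; no other data is synchronized through it.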

QDir qdBasePath;
bool bAttenuateOthers;
/// If set the AudioOutput::mix will forcefully adjust the volume of all
14 changes: 12 additions & 2 deletions src/mumble/PulseAudio.cpp
@@ -911,6 +911,9 @@ void PulseAudioSystem::contextCallback(pa_context *c) {
PulseAudioInputRegistrar::PulseAudioInputRegistrar() : AudioInputRegistrar(QLatin1String("PulseAudio"), 10) {
echoOptions.push_back(EchoCancelOptionID::SPEEX_MIXED);
echoOptions.push_back(EchoCancelOptionID::SPEEX_MULTICHANNEL);
#ifdef USE_WEBRTC_APM
echoOptions.push_back(EchoCancelOptionID::WEBRTC_AEC);
#endif
}

AudioInput *PulseAudioInputRegistrar::create() {
@@ -939,8 +942,15 @@ void PulseAudioInputRegistrar::setDeviceChoice(const QVariant &choice, Settings
}

bool PulseAudioInputRegistrar::canEcho(EchoCancelOptionID echoOption, const QString &osys) const {
return (echoOption == EchoCancelOptionID::SPEEX_MIXED || echoOption == EchoCancelOptionID::SPEEX_MULTICHANNEL)
&& (osys == name);
if (osys != name)
return false;
if (echoOption == EchoCancelOptionID::SPEEX_MIXED || echoOption == EchoCancelOptionID::SPEEX_MULTICHANNEL)
return true;
#ifdef USE_WEBRTC_APM
if (echoOption == EchoCancelOptionID::WEBRTC_AEC)
return true;
#endif
return false;
}

PulseAudioOutputRegistrar::PulseAudioOutputRegistrar() : AudioOutputRegistrar(QLatin1String("PulseAudio"), 10) {