HRTF proof of concept (warning: slop) #1
Open
hiinaspace wants to merge 9 commits into master
FFTConvolver (MIT) provides real-time partitioned convolution via overlap-add FFT. Used for HRTF impulse response convolution. Only FFTConvolver.cpp, AudioFFT.cpp, and Utilities.cpp are compiled; the rest of the submodule is unused.
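As a rough illustration of the overlap-add idea (not the FFTConvolver API), the sketch below convolves audio block by block and carries the part of each result that spills past the block into the next call. It uses direct time-domain multiplication instead of partitioned FFTs to stay short; the class name `OverlapAddSketch` and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration only: overlap-add block convolution done in the
// time domain. FFTConvolver performs the per-block products with FFTs.
class OverlapAddSketch {
public:
    explicit OverlapAddSketch(std::vector<float> ir)
        : m_ir(std::move(ir)), m_tail(m_ir.empty() ? 0 : m_ir.size() - 1, 0.0f) {
        assert(!m_ir.empty());
    }

    // Convolves one block of `len` samples. Output that extends past the
    // block is stored in m_tail and added to the start of later blocks.
    void process(const float *in, float *out, std::size_t len) {
        std::vector<float> acc(len + m_ir.size() - 1, 0.0f);

        // Start from the overlap left behind by earlier blocks.
        for (std::size_t i = 0; i < m_tail.size(); ++i) {
            acc[i] = m_tail[i];
        }
        // Add this block's contribution.
        for (std::size_t n = 0; n < len; ++n) {
            for (std::size_t k = 0; k < m_ir.size(); ++k) {
                acc[n + k] += in[n] * m_ir[k];
            }
        }
        // Emit the first `len` samples and keep the remainder as the new tail.
        for (std::size_t n = 0; n < len; ++n) {
            out[n] = acc[n];
        }
        m_tail.assign(acc.begin() + static_cast<std::ptrdiff_t>(len), acc.end());
    }

private:
    std::vector<float> m_ir;   // impulse response
    std::vector<float> m_tail; // overlap-add accumulator (ir length - 1 samples)
};
```

Feeding a signal through `process()` in blocks of any size should give the same samples as convolving the whole signal at once, which is why the accumulator has to survive between calls.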
Uses libmysofa for SOFA HRTF file loading with bilinear interpolation between measured directions, and FFTConvolver for real-time overlap-add convolution. Per-source state is tracked via a double-buffered ConvolverSet ping-pong that crossfades over one block (~10 ms) on direction change, preserving convolver overlap-add accumulators for click-free transitions. Direction vectors are normalized at entry; zero-length vectors (source coincident with listener) fall back to forward (+Z) to avoid crossfade thrashing. Listener-local coordinates map to SOFA Cartesian as: sofaX = dirZ, sofaY = -dirX, sofaZ = dirY.
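The direction handling and the crossfade described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `Vec3`, the function names, the `1e-6f` zero-length threshold, and the linear shape of the crossfade are all assumptions. The axis mapping itself is the one stated in the commit message (SOFA Cartesian uses +X front, +Y left, +Z up).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

struct Vec3 {
    float x, y, z;
};

// Normalize a listener-local direction. A (near) zero-length vector means the
// source sits on the listener, so fall back to forward (+Z).
// The 1e-6f threshold is an assumed value.
Vec3 normalizeOrForward(const Vec3 &dir) {
    const float len = std::sqrt(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);
    if (len < 1e-6f) {
        return Vec3{0.0f, 0.0f, 1.0f};
    }
    return Vec3{dir.x / len, dir.y / len, dir.z / len};
}

// Map listener-local coordinates to SOFA Cartesian:
// sofaX = dirZ, sofaY = -dirX, sofaZ = dirY.
Vec3 toSofaCartesian(const Vec3 &dir) {
    const Vec3 d = normalizeOrForward(dir);
    return Vec3{d.z, -d.x, d.y};
}

// Blend the outgoing and incoming convolver outputs across one block.
// A linear ramp is assumed here; the last sample is fully the new output.
void crossfadeBlock(const float *oldOut, const float *newOut, float *mixed, std::size_t len) {
    assert(len > 0);
    for (std::size_t i = 0; i < len; ++i) {
        const float t = static_cast<float>(i + 1) / static_cast<float>(len);
        mixed[i] = (1.0f - t) * oldOut[i] + t * newOut[i];
    }
}
```

With this mapping a source on the listener's right (local +X) ends up at SOFA -Y, and a zero vector resolves to SOFA +X (straight ahead) rather than producing NaNs.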
CIPIC Subject 124 (MIT KEMAR mannequin), distributed under a permissive academic license. 558-sample HRIR at 44.1 kHz; libmysofa resamples to the mixer rate at load time. This is the same dataset used by Steam Audio as its default HRTF.
Test cases covering:
- SOFA file loading (default path resolution)
- Cardinal direction HRIR distinctness (catches coordinate mapping bugs)
- Impulse convolution energy bounds
- Numerical stability over 1000 random directions
- Left/right ear asymmetry for off-axis sources
- Zero direction vector stability (coincident source/listener)
- Variable frameCount crossfade direction tracking
Adds optional `hrtf` build flag (default ON). When enabled:
- Detects libmysofa via find_pkg() with pkg-config fallback
- Compiles FFTConvolver sources from the vendored submodule
- Sets USE_HRTF compile definition (guards all HRTF code)
- Bakes source-tree SOFA path as MUMBLE_HRTF_DEFAULT_SOFA for dev builds; installed builds use QStandardPaths::AppDataLocation
- Registers TestHrtfSpatializer in the test suite
Adds the infrastructure hooks needed for HRTF binaural mixing, with no change to audio behaviour. All additions are guarded by USE_HRTF.
- AudioOutput.h: include HrtfSpatializer.h and declare m_hrtfSpatializer
- removeBuffer(): capture pointer-as-sourceId before delete so the HRTF per-source convolver state can be released alongside the buffer
- initializeMixer(): construct and load the HrtfSpatializer
- mix(): lift cameraDir/cameraAxis/right into outer-scope hrtfCamera* variables so they are accessible in the per-buffer loop below

The per-buffer mixing loop is not yet modified; the HRTF convolution path is added in the following commit.
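The pointer-as-sourceId pattern from the removeBuffer() hook can be sketched like this. All types and names here (`AudioBufferSketch`, `SpatializerSketch`, `removeBufferSketch`) are hypothetical stand-ins, not the classes in this PR; the point is only the ordering: convert the pointer to an integer id first, delete second, release the per-source state last.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

struct AudioBufferSketch {}; // hypothetical stand-in for an audio output buffer

struct SourceStateSketch { // hypothetical per-source convolver state
    int blocksProcessed = 0;
};

class SpatializerSketch {
public:
    SourceStateSketch &stateFor(std::uintptr_t sourceId) { return m_states[sourceId]; }
    void releaseSource(std::uintptr_t sourceId) { m_states.erase(sourceId); }
    std::size_t sourceCount() const { return m_states.size(); }

private:
    std::unordered_map<std::uintptr_t, SourceStateSketch> m_states;
};

void removeBufferSketch(SpatializerSketch &spatializer, AudioBufferSketch *buffer) {
    assert(buffer != nullptr);
    // Capture the id while the pointer is still valid, so nothing is derived
    // from a dangling pointer after the delete below.
    const auto sourceId = reinterpret_cast<std::uintptr_t>(buffer);
    delete buffer;
    spatializer.releaseSource(sourceId);
}
```

One consequence of keying on the address is that a later buffer may be allocated at the same address, so releasing the state at removal time matters for correctness as well as for memory use.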
Settings fields bHrtf (bool, default false) and qsHrtfFile (QString, default empty = use bundled KEMAR SOFA) are added unconditionally to the POSITIONAL_AUDIO_SETTINGS macro block. The UI widgets (checkbox, line edit, browse button) are hidden at runtime when USE_HRTF is not defined, so the settings file format is stable regardless of build flags.
When USE_HRTF is defined and the user has enabled HRTF in settings,
the positional audio path replaces per-channel gain+ITD with binaural
convolution:
- Listener-local direction computed from the hrtfCamera* vectors saved
in the preceding structural commit
- Opus stereo downmixed to mono before spatialisation
- calcGain(1.0f, len) provides distance-only attenuation; the HRTF IR
encodes all directional cues (ILD/ITD)
- Gain ramped linearly across each block (matching the non-HRTF path)
to avoid clicks on rapid distance changes
- Buffer pointer address used as sourceId, unique for all source types
The existing per-channel gain+ITD block is retained verbatim as the
non-HRTF fallback inside an `#ifdef / } else { / #endif` bracket so
that the lines themselves are unchanged in this diff. Use `git diff -w`
to verify that only new code was added and no logic was modified.
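The per-buffer steps listed in this commit can be sketched as below. This is an illustration under assumptions, not the PR's code: the function names are hypothetical, the camera basis is assumed orthonormal, the downmix is assumed to be a plain average of the two channels, and the ramp is assumed to reach the target gain on the last sample of the block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 {
    float x, y, z;
};

float dot3(const Vec3 &a, const Vec3 &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Express the listener-to-source offset in the listener's own basis
// (right = +X, up = +Y, forward = +Z). Assumes an orthonormal basis.
Vec3 listenerLocalDirection(const Vec3 &listener, const Vec3 &source,
                            const Vec3 &right, const Vec3 &up, const Vec3 &forward) {
    const Vec3 rel{source.x - listener.x, source.y - listener.y, source.z - listener.z};
    return Vec3{dot3(rel, right), dot3(rel, up), dot3(rel, forward)};
}

// Average interleaved stereo (L, R, L, R, ...) down to mono.
std::vector<float> downmixToMono(const std::vector<float> &interleaved) {
    assert(interleaved.size() % 2 == 0);
    std::vector<float> mono(interleaved.size() / 2);
    for (std::size_t i = 0; i < mono.size(); ++i) {
        mono[i] = 0.5f * (interleaved[2 * i] + interleaved[2 * i + 1]);
    }
    return mono;
}

// Ramp the distance gain linearly from the previous block's gain to the
// new target so a sudden distance change does not produce a step.
void applyGainRamp(std::vector<float> &block, float prevGain, float targetGain) {
    if (block.empty()) {
        return;
    }
    const float step = (targetGain - prevGain) / static_cast<float>(block.size());
    for (std::size_t i = 0; i < block.size(); ++i) {
        block[i] *= prevGain + step * static_cast<float>(i + 1);
    }
}
```

In this sketch the ramped mono block would then be handed to the HRTF convolution along with the local direction; the caller keeps `targetGain` as the next block's `prevGain`.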
Upstream PR opened: mumble-voip#7085. Leaving this one around for now since 157be04 has the prompt/design doc.
Implements HRTF-based spatialization (mumble-voip#6597) as a separate option from the current panning and delay method.
When enabled, HRTF SOFA files are parsed and interpolated using libmysofa (with a public domain default), and the resulting impulse responses are convolved with the mono audio using FFTConvolver.
I tested this manually by talking to myself with the manual location plugin, and it seems to work. I have only tested the build on Linux so far.
Some edge cases and the double-buffered design are adapted from Steam Audio, and it uses the same default HRTF, so in theory this sounds the same as Steam Audio's binaural spatialization. However, the distance fade is still the current implementation (not Steam Audio's more complex frequency-dependent distance attenuation), and there is no reverb, geometry attenuation, or Doppler shift. Steam Audio itself has all of that, so it would also be possible to use it as a library. For just HRTF, though, it seemed easier to reimplement.
Warning: While I guided and reviewed everything, the design doc and code are all AI-generated, including the tests. I think I have pretty good software engineering taste, but I am not a C++ dev and am not familiar with Mumble.