HRTF proof of concept (warning: slop)#1

Open: hiinaspace wants to merge 9 commits into master from hrtf-poc

@hiinaspace (Owner)

Implements HRTF-based spatialization (mumble-voip#6597) as a separate option from the current panning and delay method:

[image]

When enabled, HRTF SOFA files are parsed and interpolated using libmysofa (with a public domain default), and the resulting impulse responses are convolved with the mono audio using FFTConvolver.

I tested this manually by talking to myself with the manual location plugin, and it seems to work. I have only tested the build on Linux so far.

Some edge cases and the double-buffered design are adapted from Steam Audio, and it uses the same default HRTF, so in theory this sounds the same as Steam Audio's binaural spatialization. However, the distance fade is still the current implementation (not Steam Audio's more complex frequency-dependent distance attenuation), and there is no reverb, geometry attenuation, or Doppler shift. Steam Audio itself has all of that, so it would also be possible to use it as a library. For just HRTF, though, it seemed easier to reimplement.

Warning: While I guided and reviewed everything, the design doc and code are all AI-generated, including the tests. I think I have pretty good software engineering taste, but I am not a C++ dev or familiar with Mumble.

FFTConvolver (MIT) provides real-time partitioned convolution via
overlap-add FFT. Used for HRTF impulse response convolution. Only
FFTConvolver.cpp, AudioFFT.cpp, and Utilities.cpp are compiled;
the rest of the submodule is unused.
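
To show what "overlap-add" bookkeeping means here, below is a minimal time-domain sketch. It is not FFTConvolver's API: the class name `OverlapAddConvolver` is hypothetical, and it does the per-block convolution directly (O(N*M)) instead of with FFTs and uniform partitions. The point is that carrying the spill-over tail between calls makes block-wise output identical to one long convolution, even when the IR is longer than the block.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration of block-wise overlap-add convolution.
// Each process() call convolves one input block with the impulse response and
// carries the samples that spill past the block end (the tail) into the next call.
class OverlapAddConvolver {
public:
    explicit OverlapAddConvolver(std::vector<float> ir)
        : m_ir(std::move(ir)), m_tail(m_ir.empty() ? 0 : m_ir.size() - 1, 0.0f) {}

    void process(const float *in, float *out, std::size_t n) {
        // Linear convolution of this block has length n + irLen - 1.
        std::vector<float> acc(n + m_tail.size(), 0.0f);
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t k = 0; k < m_ir.size(); ++k)
                acc[i + k] += in[i] * m_ir[k];
        // Add the tail accumulated from previous blocks.
        for (std::size_t i = 0; i < m_tail.size(); ++i)
            acc[i] += m_tail[i];
        // Emit the first n samples; the remainder becomes the new tail.
        for (std::size_t i = 0; i < n; ++i)
            out[i] = acc[i];
        for (std::size_t i = 0; i < m_tail.size(); ++i)
            m_tail[i] = acc[n + i];
    }

private:
    std::vector<float> m_ir;   // impulse response (declared first: used to size m_tail)
    std::vector<float> m_tail; // overlap-add accumulator carried between blocks
};
```

FFTConvolver replaces the inner double loop with frequency-domain multiplication, which is what makes a 558-tap HRIR cheap enough per source.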
Uses libmysofa for SOFA HRTF file loading with bilinear interpolation
between measured directions, and FFTConvolver for real-time overlap-add
convolution. Per-source state is tracked via a double-buffered
ConvolverSet ping-pong that crossfades over one block (~10 ms) on
direction change, preserving convolver overlap-add accumulators for
click-free transitions.
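
The ping-pong idea can be sketched as two IR slots and an "active" index: a direction change writes the new IR into the inactive slot, the next block is crossfaded linearly from old to new, and then the slots swap. This is a simplified, hypothetical sketch (`PingPongFir` is not a name from the PR): it filters with a direct-form FIR over a shared input history instead of FFT overlap-add, so the new IR is "warm" from its first sample and there is no per-convolver accumulator to preserve, unlike the FFTConvolver-based version described above.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical double-buffered IR swap with a one-block linear crossfade.
class PingPongFir {
public:
    PingPongFir(std::vector<float> ir, std::size_t maxIrLen)
        : m_history(maxIrLen, 0.0f) { m_slots[0] = std::move(ir); }

    // Load a new IR into the inactive slot; the next process() call crossfades to it.
    void setIr(std::vector<float> ir) {
        m_slots[1 - m_active] = std::move(ir);
        m_pending = true;
    }

    void process(const float *in, float *out, std::size_t n) {
        const std::size_t next = 1 - m_active;
        for (std::size_t i = 0; i < n; ++i) {
            pushSample(in[i]);
            float y = filter(m_slots[m_active]);
            if (m_pending) {
                // Weight of the new IR ramps from 1/n up to exactly 1 on the last sample.
                const float w = static_cast<float>(i + 1) / static_cast<float>(n);
                y = (1.0f - w) * y + w * filter(m_slots[next]);
            }
            out[i] = y;
        }
        if (m_pending && n > 0) { // swap only after a crossfade actually happened
            m_active = next;
            m_pending = false;
        }
    }

private:
    void pushSample(float x) {
        if (m_history.empty()) return;
        // m_history[0] holds the newest sample.
        for (std::size_t k = m_history.size() - 1; k > 0; --k) m_history[k] = m_history[k - 1];
        m_history[0] = x;
    }
    float filter(const std::vector<float> &ir) const {
        float acc = 0.0f;
        for (std::size_t k = 0; k < ir.size() && k < m_history.size(); ++k) acc += ir[k] * m_history[k];
        return acc;
    }

    std::array<std::vector<float>, 2> m_slots; // ping-pong IR slots
    std::vector<float> m_history;              // shared input history
    std::size_t m_active = 0;
    bool m_pending = false;
};
```

Crossfading outputs over one whole block (rather than swapping instantly) is what hides the discontinuity between the two IRs.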

Direction vectors are normalized at entry; zero-length vectors (source
coincident with listener) fall back to forward (+Z) to avoid crossfade
thrashing. Listener-local coordinates map to SOFA Cartesian as:
  sofaX = dirZ, sofaY = -dirX, sofaZ = dirY.
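
A minimal sketch of the normalization, zero-vector fallback, and axis mapping above. The helper name `listenerDirToSofa` and the `Vec3` type are hypothetical. The listener-local axes (+X right, +Y up, +Z forward) are inferred from the mapping together with the SOFA Cartesian convention (+X front, +Y left, +Z up).

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical helper: normalize a listener-local direction, fall back to
// forward (+Z) for a zero-length vector, then convert to SOFA Cartesian.
inline Vec3 listenerDirToSofa(Vec3 dir) {
    const float len = std::sqrt(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);
    if (!(len > 1e-6f)) {
        dir = {0.0f, 0.0f, 1.0f}; // coincident source (or NaN): treat as straight ahead
    } else {
        dir = {dir.x / len, dir.y / len, dir.z / len};
    }
    return {dir.z, -dir.x, dir.y}; // sofaX = dirZ, sofaY = -dirX, sofaZ = dirY
}
```

A fixed fallback direction means a source sitting exactly on the listener keeps selecting the same HRIR every block instead of jumping between arbitrary directions, which is what would cause the crossfade thrashing.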
CIPIC Subject 124 (MIT KEMAR mannequin), distributed under a
permissive academic license. 558-sample HRIR at 44.1 kHz;
libmysofa resamples to the mixer rate at load time. This is the
same dataset used by Steam Audio as its default HRTF.

Seven test cases covering:
- SOFA file loading (default path resolution)
- Cardinal direction HRIR distinctness (catches coordinate mapping bugs)
- Impulse convolution energy bounds
- Numerical stability over 1000 random directions
- Left/right ear asymmetry for off-axis sources
- Zero direction vector stability (coincident source/listener)
- Variable frameCount crossfade direction tracking

Adds optional `hrtf` build flag (default ON). When enabled:
- Detects libmysofa via find_pkg() with pkg-config fallback
- Compiles FFTConvolver sources from the vendored submodule
- Sets USE_HRTF compile definition (guards all HRTF code)
- Bakes source-tree SOFA path as MUMBLE_HRTF_DEFAULT_SOFA for
  dev builds; installed builds use QStandardPaths::AppDataLocation
- Registers TestHrtfSpatializer in the test suite
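
One plausible lookup order for the SOFA file, sketched with plain `std::string` so it builds without Qt. The order is an assumption (the user's `qsHrtfFile` when non-empty, then the baked-in dev path, then an installed copy), and the helper name `resolveSofaPath` and its `exists` callback are hypothetical. In the real code the installed path would come from `QStandardPaths::AppDataLocation`.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical resolution order (assumption, not taken from the PR code):
// 1. user-configured file, if set;
// 2. compile-time dev path (what MUMBLE_HRTF_DEFAULT_SOFA would provide), if it exists;
// 3. installed default (e.g. under the app-data location), if it exists.
// Returns an empty string when nothing usable is found.
inline std::string resolveSofaPath(const std::string &userFile,
                                   const std::string &devDefault,
                                   const std::string &installedDefault,
                                   const std::function<bool(const std::string &)> &exists) {
    if (!userFile.empty()) return userFile;
    if (!devDefault.empty() && exists(devDefault)) return devDefault;
    if (!installedDefault.empty() && exists(installedDefault)) return installedDefault;
    return std::string();
}
```

Injecting `exists` keeps the precedence logic testable without touching the filesystem.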
Adds the infrastructure hooks needed for HRTF binaural mixing, with no
change to audio behaviour. All additions are guarded by USE_HRTF.

- AudioOutput.h: include HrtfSpatializer.h and declare m_hrtfSpatializer
- removeBuffer(): capture pointer-as-sourceId before delete so the HRTF
  per-source convolver state can be released alongside the buffer
- initializeMixer(): construct and load the HrtfSpatializer
- mix(): lift cameraDir/cameraAxis/right into outer-scope hrtfCamera*
  variables so they are accessible in the per-buffer loop below

The per-buffer mixing loop is not yet modified; the HRTF convolution
path is added in the following commit.
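
The `removeBuffer()` ordering can be sketched as follows: convert the pointer to an integer id while it is still valid, delete the buffer, then erase that id's state. `SourceRegistry`, `SourceState`, and `Buffer` are hypothetical stand-ins, not Mumble types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical per-source HRTF state (e.g. convolver tails), keyed by buffer address.
struct SourceState { std::vector<float> tail; };

class SourceRegistry {
public:
    SourceState &get(std::uintptr_t id) { return m_states[id]; } // creates on first use
    void release(std::uintptr_t id) { m_states.erase(id); }
    std::size_t size() const { return m_states.size(); }
private:
    std::unordered_map<std::uintptr_t, SourceState> m_states;
};

struct Buffer { int dummy = 0; }; // stand-in for the audio output buffer type

// Capture the id before delete: the pointer value is not safe to use afterwards.
inline void removeBuffer(Buffer *buf, SourceRegistry &registry) {
    const auto id = reinterpret_cast<std::uintptr_t>(buf);
    delete buf;
    registry.release(id);
}
```

Releasing the state at the same time as the buffer also matters because the allocator may hand the same address to a later buffer, which would otherwise inherit stale convolver state.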
Settings fields bHrtf (bool, default false) and qsHrtfFile (QString,
default empty = use bundled KEMAR SOFA) are added unconditionally to
the POSITIONAL_AUDIO_SETTINGS macro block. The UI widgets (checkbox,
line edit, browse button) are hidden at runtime when USE_HRTF is not
defined, so the settings file format is stable regardless of build flags.

When USE_HRTF is defined and the user has enabled HRTF in settings,
the positional audio path replaces per-channel gain+ITD with binaural
convolution:

- Listener-local direction computed from the hrtfCamera* vectors saved
  in the preceding structural commit
- Opus stereo downmixed to mono before spatialisation
- calcGain(1.0f, len) provides distance-only attenuation; the HRTF IR
  encodes all directional cues (ILD/ITD)
- Gain ramped linearly across each block (matching the non-HRTF path)
  to avoid clicks on rapid distance changes
- Buffer pointer address used as sourceId, unique for all source types
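
The downmix and gain-ramp steps might look like this minimal sketch. Both helper names are hypothetical. Averaging L and R is an assumed downmix, and the ramp shape (reaching the new gain on the last sample) is one reasonable choice, not necessarily what the non-HRTF path does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical: average interleaved stereo frames (L, R, L, R, ...) into mono.
inline std::vector<float> downmixStereo(const float *interleaved, std::size_t frames) {
    std::vector<float> mono(frames);
    for (std::size_t i = 0; i < frames; ++i)
        mono[i] = 0.5f * (interleaved[2 * i] + interleaved[2 * i + 1]);
    return mono;
}

// Hypothetical: ramp linearly from the previous block's gain to the new one,
// so a sudden distance change does not produce a step in amplitude.
inline void applyGainRamp(std::vector<float> &samples, float prevGain, float newGain) {
    const std::size_t n = samples.size();
    for (std::size_t i = 0; i < n; ++i) {
        const float t = static_cast<float>(i + 1) / static_cast<float>(n); // reaches 1 on last sample
        samples[i] *= prevGain + (newGain - prevGain) * t;
    }
}
```

Since the HRIR already carries the ILD/ITD cues, only this scalar distance gain is applied to the mono signal before convolution.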

The existing per-channel gain+ITD block is retained verbatim as the
non-HRTF fallback inside an `#ifdef / } else { / #endif` bracket so
that the lines themselves are unchanged in this diff. Use `git diff -w`
to verify that only new code was added and no logic was modified.
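
One way to read the `#ifdef / } else { / #endif` bracket is sketched below. The function name `mixPath` and its return codes are hypothetical. With USE_HRTF undefined, the preprocessor drops the `if (...) { ... } else {` opener and its closing `}`, leaving only the original lines at their original indentation, which is why `git diff -w` shows them as untouched.

```cpp
#include <cassert>

// Defined here only so the sketch builds standalone; in the real build the
// compile definition comes from CMake.
#ifndef USE_HRTF
#define USE_HRTF 1
#endif

inline int mixPath(bool hrtfEnabled) {
    static_cast<void>(hrtfEnabled); // avoid an unused-parameter warning when USE_HRTF is off
    int path = 0;
#ifdef USE_HRTF
    if (hrtfEnabled) {
        path = 1; // new HRTF convolution path
    } else {
#endif
        path = 2; // original per-channel gain + ITD lines, unchanged in the diff
#ifdef USE_HRTF
    }
#endif
    return path;
}
```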
@hiinaspace (Owner, Author)

Upstream PR opened: mumble-voip#7085. Leaving this one around for now since 157be04 has the prompt/design doc.
