FIX(server): Handle WinAPI objects lifetime properly in Zeroconf#7133
FIX(server): Handle WinAPI objects lifetime properly in Zeroconf#7133davidebeatrici wants to merge 2 commits intomumble-voip:masterfrom
Conversation
Recently we received multiple reports of random server crashes. Turns out DnsServiceDeRegister(), more specifically svcRegDeregMulticast(), now reads the content of the DNS_SERVICE_INSTANCE object that is indirectly passed to it. I've confirmed this by comparing dnsapi.dll version 10.0.26100.7920 to version 10.0.10240.16384. This commit almost completely refactors the Windows-specific code making it more efficient and readable.
WalkthroughThis change refactors Windows DNS service registration functionality in the Zeroconf class. A new nested 🚥 Pre-merge checks | ✅ 1 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip Migrating from UI to YAML configuration.Use the |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/murmur/Zeroconf.cpp`:
- Around line 182-201: Zeroconf::Reg::callback is incorrectly taking ownership
of the DNS_SERVICE_INSTANCE via reg.m_instance.reset(instance) and also
returning early on deregistration without checking status; instead, immediately
release the API-allocated structure with DnsServiceFreeInstance(instance) (do
not use reg.m_instance.reset with the raw pointer unless you wrap it with a
custom deleter that calls DnsServiceFreeInstance), and for the deregistration
path (when reg.m_cancel is null/being reset) check the status and log or handle
non-ERROR_SUCCESS/ERROR_CANCELLED cases rather than returning silently; update
the logic around reg.m_instance and reg.m_cancel so ownership is explicit and
DNS_SERVICE_INSTANCE is freed via DnsServiceFreeInstance in all callback paths.
- Around line 90-130: The destructor Zeroconf::Reg::~Reg currently calls
cancel() which can call DnsServiceDeRegister() and return DNS_REQUEST_PENDING
(async), then immediately FreeLibrary(m_module) and destroy the object while the
async callback (callback) later static_casts the context to Reg* and accesses
m_instance/m_cancel — causing use-after-free; fix by making cancel() synchronous
or by waiting for the async deregister to complete before destroying the object
(e.g. block in ~Reg() until the callback signals completion), or convert Reg to
a reference-counted owner and have the callback hold a weak_ptr that upgrades to
a shared_ptr before accessing members; specifically ensure DnsServiceDeRegister
paths that return DNS_REQUEST_PENDING set a completion event/flag and that
Zeroconf::Reg::~Reg waits on that event (or use DnsServiceRegisterCancel
synchronous path) before calling FreeLibrary(m_module) or resetting
m_instance/m_cancel, and update callback to check/upgrade the weak reference (or
signal the event) instead of directly static_casting the raw context.
- Around line 71-97: The constructor Zeroconf::Reg::Reg currently calls
GetModuleHandle(L"dnsapi.dll") but the destructor uses FreeLibrary (and the
constructor calls FreeModule on error), causing a mismatched lifetime/manage
bug; replace the GetModuleHandle usage with a reference-counting load (either
call GetModuleHandleEx with GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS/0 to
increment the refcount or use LoadLibraryA/LoadLibraryW) and ensure the same
release function (FreeLibrary) is used everywhere (remove or replace the
incorrect FreeModule call), so that m_module is acquired with
GetModuleHandleEx/LoadLibrary and always released with FreeLibrary in both the
error path and Zeroconf::Reg::~Reg; update the checks that use m_module and
symbol resolution (DnsServiceConstructInstance, DnsServiceFreeInstance,
DnsServiceRegister, DnsServiceDeRegister, DnsServiceRegisterCancel) to behave
the same after switching to the ref-counting acquisition.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 05df230d-4273-465c-8109-65dbbd2332fc
📒 Files selected for processing (2)
src/murmur/Zeroconf.cppsrc/murmur/Zeroconf.h
| Zeroconf::Reg::Reg() : m_instance(nullptr, InstanceDeleter{ DnsServiceFreeInstance }) { | ||
| m_module = GetModuleHandle(L"dnsapi.dll"); | ||
| if (!m_module) { | ||
| return; | ||
| } | ||
|
|
||
| GET_SYMBOL(DnsServiceConstructInstance); | ||
| GET_SYMBOL(DnsServiceFreeInstance); | ||
| GET_SYMBOL(DnsServiceRegister); | ||
| GET_SYMBOL(DnsServiceDeRegister); | ||
| GET_SYMBOL(DnsServiceRegisterCancel); | ||
|
|
||
| if (!DnsServiceConstructInstance || !DnsServiceFreeInstance || !DnsServiceRegister || !DnsServiceDeRegister | ||
| || !DnsServiceRegisterCancel) { | ||
| FreeModule(m_module); | ||
| m_module = nullptr; | ||
| } | ||
| } | ||
|
|
||
| Zeroconf::Reg::~Reg() { | ||
| if (!isOk()) { | ||
| return; | ||
| } | ||
|
|
||
| cancel(); | ||
| FreeLibrary(m_module); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
GetModuleHandleEx vs GetModuleHandle reference count behavior Windows
💡 Result:
GetModuleHandle retrieves a handle to a loaded module without incrementing its reference count. Do not call FreeLibrary on the returned handle, as this could cause the module to unload prematurely. GetModuleHandleEx retrieves a module handle and increments the reference count unless the GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT flag is specified. When the flag is used, its behavior matches GetModuleHandle (no refcount increment). Without the flag (or with other flags like 0 or GET_MODULE_HANDLE_EX_FLAG_PIN), it increments the refcount, and FreeLibrary must be called to decrement it when done. Key flags for GetModuleHandleEx: - GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT (0x00000002): No refcount change, equivalent to GetModuleHandle. Do not FreeLibrary. - No flag or GET_MODULE_HANDLE_EX_FLAG_PIN (0x00000001): Increments refcount, requires FreeLibrary. - GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS (0x00000004): Treats lpModuleName as address in module; refcount behavior depends on other flags.
Citations:
- 1: https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-getmodulehandleexa
- 2: https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-getmodulehandleexw
- 3: https://stackoverflow.com/questions/38056252/do-i-need-to-close-the-handle-if-i-dont-store-the-return-value-of-getmodulehand
- 4: https://stackoverflow.com/questions/2148935/does-getmodulehandle-function-cause-any-leaks-stack-overflow-and-memory-leaks
🏁 Script executed:
cat -n src/murmur/Zeroconf.cpp | head -120Repository: mumble-voip/mumble
Length of output: 3778
🏁 Script executed:
grep -r "FreeModule" src/murmur/Zeroconf.cppRepository: mumble-voip/mumble
Length of output: 85
🏁 Script executed:
grep -r "#define FreeModule" src/Repository: mumble-voip/mumble
Length of output: 44
🏁 Script executed:
grep -r "FreeModule" src/murmur/Zeroconf.hRepository: mumble-voip/mumble
Length of output: 44
🏁 Script executed:
cat src/murmur/Zeroconf.h | grep -A 5 -B 5 "FreeModule"Repository: mumble-voip/mumble
Length of output: 44
🏁 Script executed:
rg "FreeModule" src/murmur/Repository: mumble-voip/mumble
Length of output: 109
🏁 Script executed:
# Search for FreeModule in all files
find src/ -type f \( -name "*.h" -o -name "*.cpp" \) -exec grep -l "FreeModule" {} \;Repository: mumble-voip/mumble
Length of output: 85
🏁 Script executed:
head -20 src/murmur/Zeroconf.cppRepository: mumble-voip/mumble
Length of output: 697
🏁 Script executed:
# Check Windows API documentation or references in code
rg "FreeLibrary|FreeModule" --type cppRepository: mumble-voip/mumble
Length of output: 753
🏁 Script executed:
# Check if there's any definition of FreeModule or if it's just a typo
rg "define.*FreeModule|void.*FreeModule"Repository: mumble-voip/mumble
Length of output: 44
🏁 Script executed:
# Let's see the context around line 85 more carefully
sed -n '83,88p' src/murmur/Zeroconf.cppRepository: mumble-voip/mumble
Length of output: 255
🏁 Script executed:
# Check recent git history for this line to understand if FreeModule was intentional
git log --oneline -n 20 -- src/murmur/Zeroconf.cppRepository: mumble-voip/mumble
Length of output: 135
🏁 Script executed:
git show e7e9ead2 --statRepository: mumble-voip/mumble
Length of output: 50374
🏁 Script executed:
git show e7e9ead2Repository: mumble-voip/mumble
Length of output: 50377
🏁 Script executed:
git show e7e9ead2 -- src/murmur/Zeroconf.cpp | head -200Repository: mumble-voip/mumble
Length of output: 5752
🏁 Script executed:
# Check if FreeModule is a macro or typo
grep -r "FreeModule" --include="*.h" --include="*.cpp" src/Repository: mumble-voip/mumble
Length of output: 109
🏁 Script executed:
# Check the current code more carefully around line 85
sed -n '70,100p' src/murmur/Zeroconf.cppRepository: mumble-voip/mumble
Length of output: 762
Incorrect module handle lifetime management: GetModuleHandle paired with FreeLibrary.
GetModuleHandle() at line 72 retrieves a handle to an already-loaded module without incrementing the reference count. However, FreeLibrary() at line 96 decrements the reference count. This mismatch can cause undefined behavior—potentially unloading dnsapi.dll prematurely if its reference count was already 1.
Additionally, line 85 calls FreeModule(), which is not a standard Windows API function. This should likely be FreeLibrary() as well.
Use GetModuleHandleEx with default flags (which increments the ref count) or switch to LoadLibrary/FreeLibrary pairing.
🔧 Proposed fix using GetModuleHandleEx
Zeroconf::Reg::Reg() : m_instance(nullptr, InstanceDeleter{ DnsServiceFreeInstance }) {
- m_module = GetModuleHandle(L"dnsapi.dll");
- if (!m_module) {
+ if (!GetModuleHandleEx(0, L"dnsapi.dll", &m_module)) {
+ m_module = nullptr;
return;
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/murmur/Zeroconf.cpp` around lines 71 - 97, The constructor
Zeroconf::Reg::Reg currently calls GetModuleHandle(L"dnsapi.dll") but the
destructor uses FreeLibrary (and the constructor calls FreeModule on error),
causing a mismatched lifetime/manage bug; replace the GetModuleHandle usage with
a reference-counting load (either call GetModuleHandleEx with
GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS/0 to increment the refcount or use
LoadLibraryA/LoadLibraryW) and ensure the same release function (FreeLibrary) is
used everywhere (remove or replace the incorrect FreeModule call), so that
m_module is acquired with GetModuleHandleEx/LoadLibrary and always released with
FreeLibrary in both the error path and Zeroconf::Reg::~Reg; update the checks
that use m_module and symbol resolution (DnsServiceConstructInstance,
DnsServiceFreeInstance, DnsServiceRegister, DnsServiceDeRegister,
DnsServiceRegisterCancel) to behave the same after switching to the ref-counting
acquisition.
There was a problem hiding this comment.
LoadLibrary()/FreeLibrary() is indeed the correct pair to use.
We were already using GetModuleHandle() before but without calling FreeLibrary(), which is guaranteed to work because there is at least one dependency (Qt Network) loading dnsapi.dll.
However, since we're actively using functions that are exported by the library we should indeed increment the reference count to ensure it doesn't get unloaded prematurely.
| Zeroconf::Reg::~Reg() { | ||
| if (!isOk()) { | ||
| return; | ||
| } | ||
|
|
||
| cancel(); | ||
| FreeLibrary(m_module); | ||
| } | ||
|
|
||
| bool Zeroconf::Reg::cancel() { | ||
| if (!isOk()) { | ||
| return false; | ||
| } | ||
|
|
||
| if (m_cancel) { | ||
| const auto ret = DnsServiceRegisterCancel(&m_cancel.value()); | ||
| if (ret != ERROR_SUCCESS && ret != ERROR_CANCELLED) { | ||
| qWarning("Zeroconf: DnsServiceRegisterCancel() failed with error %u!", ret); | ||
|
|
||
| m_cancel.reset(); | ||
| m_instance.reset(); | ||
|
|
||
| return false; | ||
| } | ||
| } else if (m_instance) { | ||
| DNS_SERVICE_REGISTER_REQUEST req{}; | ||
| req.Version = DNS_QUERY_REQUEST_VERSION1; | ||
| req.pServiceInstance = m_instance.get(); | ||
| req.pRegisterCompletionCallback = callback; | ||
| req.pQueryContext = this; | ||
|
|
||
| const auto ret = DnsServiceDeRegister(&req, nullptr); | ||
| if (ret != DNS_REQUEST_PENDING) { | ||
| qWarning("Zeroconf: DnsServiceDeRegister() failed with error %u!", ret); | ||
|
|
||
| return false; | ||
| } | ||
| } | ||
|
|
||
| return true; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
DnsServiceDeRegister DNS_REQUEST_PENDING callback timing synchronous
💡 Result:
The DnsServiceDeRegister function (from windns.h in Windows 10+ DNS-SD API) is asynchronous. When called successfully, it immediately returns DNS_REQUEST_PENDING (9506), indicating the deregistration request is pending. The callback specified in the original DNS_SERVICE_REGISTER_REQUEST's pRegisterCompletionCallback field (of type DNS_SERVICE_REGISTER_COMPLETE) is invoked asynchronously later, when deregistration completes. The callback receives the status, user context, and a copy of the original DNS_SERVICE_INSTANCE. There is no separate deregister-specific callback type; it reuses the register completion callback. The function call itself is not synchronous—the callback timing is asynchronous, typically after some network/mDNS propagation delay.
Citations:
- 1: https://learn.microsoft.com/en-us/windows/win32/api/windns/nf-windns-dnsservicederegister
- 2: https://learn.microsoft.com/en-us/windows/win32/api/windns/nf-windns-dnsserviceregister
- 3: https://learn.microsoft.com/en-us/windows/win32/debug/system-error-codes--9000-11999-
- 4: https://learn.microsoft.com/en-us/windows/win32/api/windns/ns-windns-dns_service_register_request
🏁 Script executed:
# First, let's find the callback function definition referenced at line 128
rg -n "callback\s*\(" src/murmur/Zeroconf.cpp -A 10 -B 2Repository: mumble-voip/mumble
Length of output: 422
🏁 Script executed:
# Look for the complete Reg class definition to understand the callback and member variables
rg -n "class.*Reg" src/murmur/Zeroconf.cpp -A 50Repository: mumble-voip/mumble
Length of output: 44
🏁 Script executed:
# Find where the callback function is defined (it's likely a static member)
rg -n "DNS_SERVICE_REGISTER_COMPLETE|void.*callback" src/murmur/Zeroconf.cpp -B 5 -A 20Repository: mumble-voip/mumble
Length of output: 706
🏁 Script executed:
# Check the Server.cpp:652-662 context mentioned in the review
sed -n '650,665p' src/murmur/Server.cppRepository: mumble-voip/mumble
Length of output: 250
🏁 Script executed:
# Look for any event or synchronization mechanisms in the Zeroconf class
rg -n "event|Event|WaitFor|condition|Condition" src/murmur/Zeroconf.cppRepository: mumble-voip/mumble
Length of output: 44
Use-after-free in async deregistration callback.
When cancel() calls DnsServiceDeRegister() at line 121 and receives DNS_REQUEST_PENDING, the operation completes asynchronously. However, ~Reg() immediately proceeds to FreeLibrary(m_module) and the object is destroyed. The callback at line 182 will later fire on the destroyed object, as it unsafely dereferences the context pointer with a bare static_cast< Reg * >(context) and accesses m_instance and m_cancel without any lifetime protection.
Server::removeZeroconf() (line 657) calls delete zeroconf without waiting for pending async operations.
Either wait for the callback to complete before destruction (using an event or condition variable), use synchronous deregistration, or ensure callbacks are protected via shared/weak pointers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/murmur/Zeroconf.cpp` around lines 90 - 130, The destructor
Zeroconf::Reg::~Reg currently calls cancel() which can call
DnsServiceDeRegister() and return DNS_REQUEST_PENDING (async), then immediately
FreeLibrary(m_module) and destroy the object while the async callback (callback)
later static_casts the context to Reg* and accesses m_instance/m_cancel —
causing use-after-free; fix by making cancel() synchronous or by waiting for the
async deregister to complete before destroying the object (e.g. block in ~Reg()
until the callback signals completion), or convert Reg to a reference-counted
owner and have the callback hold a weak_ptr that upgrades to a shared_ptr before
accessing members; specifically ensure DnsServiceDeRegister paths that return
DNS_REQUEST_PENDING set a completion event/flag and that Zeroconf::Reg::~Reg
waits on that event (or use DnsServiceRegisterCancel synchronous path) before
calling FreeLibrary(m_module) or resetting m_instance/m_cancel, and update
callback to check/upgrade the weak reference (or signal the event) instead of
directly static_casting the raw context.
There was a problem hiding this comment.
This is indeed a problem, smart pointers are the way to go to fix this.
| void WINAPI Zeroconf::Reg::callback(const DWORD status, void *context, DNS_SERVICE_INSTANCE *instance) { | ||
| auto ® = *static_cast< Reg * >(context); | ||
|
|
||
| if (instance) { | ||
| zeroconf->DnsServiceFreeInstance(instance); | ||
| } | ||
| reg.m_instance.reset(instance); | ||
|
|
||
| if (status == ERROR_CANCELLED) { | ||
| if (!reg.m_cancel) { | ||
| // No cancel handle, which means this is a de-registration. | ||
| return; | ||
| } | ||
|
|
||
| zeroconf->m_cancel.reset(); | ||
| reg.m_cancel.reset(); | ||
|
|
||
| if (status != ERROR_SUCCESS) { | ||
| qWarning("Zeroconf: DnsServiceRegister() reports status code %u, service registration probably failed", status); | ||
| switch (status) { | ||
| case ERROR_SUCCESS: | ||
| case ERROR_CANCELLED: | ||
| break; | ||
| default: | ||
| qWarning("Zeroconf: DnsServiceRegister() reports status code %u, service registration probably failed", | ||
| status); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
DnsServiceRegister callback pInstance parameter same pointer deregistration
💡 Result:
In the Windows DnsServiceRegister API, the pInstance parameter passed to the DNS_SERVICE_REGISTER_COMPLETE callback is a new allocated structure describing the registered service. It is not the same pointer as the pServiceInstance provided in the original DNS_SERVICE_REGISTER_REQUEST. The caller must free this callback pInstance using DnsServiceFreeInstance if it is not nullptr. For deregistration, use DnsServiceDeRegister with the original DNS_SERVICE_REGISTER_REQUEST structure (pRequest). The deregistration callback receives a copy of the original DNS_SERVICE_INSTANCE (from registration request), not the callback pInstance. No sources indicate the callback pInstance pointer is reused or relevant for deregistration. The same pointer from input is not returned in the callback; a new one is allocated.
Citations:
- 1: https://learn.microsoft.com/en-us/windows/win32/api/windns/nc-windns-dns_service_register_complete
- 2: https://learn.microsoft.com/en-us/windows/win32/api/windns/nf-windns-dnsserviceregister
- 3: https://learn.microsoft.com/en-us/windows/win32/api/windns/ns-windns-dns_service_register_request
- 4: https://learn.microsoft.com/en-us/windows/win32/api/windns/nf-windns-dnsservicefreeinstance
- 5: https://learn.microsoft.com/en-us/windows/win32/api/windns/nf-windns-dnsservicederegister
Improper Windows API memory management for callback pInstance; deregistration errors silently ignored.
Two concerns with this callback:
-
Line 185: According to Windows API documentation, the
pInstanceparameter passed to the DNS_SERVICE_REGISTER_COMPLETE callback is a newly allocated structure that must be freed usingDnsServiceFreeInstance(). The code callsreg.m_instance.reset(instance), which assumes theunique_ptrdeleter matches the Windows API allocation/deallocation contract—it does not. This will cause a memory leak or heap corruption. The callback must explicitly callDnsServiceFreeInstance(instance)to properly release this allocation. -
Lines 187-189: For deregistration callbacks, the function returns early without checking
status. Failed deregistrations are silently ignored.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/murmur/Zeroconf.cpp` around lines 182 - 201, Zeroconf::Reg::callback is
incorrectly taking ownership of the DNS_SERVICE_INSTANCE via
reg.m_instance.reset(instance) and also returning early on deregistration
without checking status; instead, immediately release the API-allocated
structure with DnsServiceFreeInstance(instance) (do not use reg.m_instance.reset
with the raw pointer unless you wrap it with a custom deleter that calls
DnsServiceFreeInstance), and for the deregistration path (when reg.m_cancel is
null/being reset) check the status and log or handle
non-ERROR_SUCCESS/ERROR_CANCELLED cases rather than returning silently; update
the logic around reg.m_instance and reg.m_cancel so ownership is explicit and
DNS_SERVICE_INSTANCE is freed via DnsServiceFreeInstance in all callback paths.
There was a problem hiding this comment.
- Line 185: According to Windows API documentation, the
pInstanceparameter passed to the DNS_SERVICE_REGISTER_COMPLETE callback is a newly allocated structure that must be freed usingDnsServiceFreeInstance(). The code callsreg.m_instance.reset(instance), which assumes theunique_ptrdeleter matches the Windows API allocation/deallocation contract—it does not. This will cause a memory leak or heap corruption. The callback must explicitly callDnsServiceFreeInstance(instance)to properly release this allocation.
The smart pointer has a custom deleter defined that takes care of calling DnsServiceFreeInstance().
- Lines 187-189: For deregistration callbacks, the function returns early without checking
status. Failed deregistrations are silently ignored.
This is intended and matches the previous behavior.
Recently we received multiple reports of random server crashes.
Turns out
DnsServiceDeRegister(), more specificallysvcRegDeregMulticast(), now reads the content of theDNS_SERVICE_INSTANCEobject that is indirectly passed to it.I've confirmed this by comparing
dnsapi.dllversion10.0.26100.7920to version10.0.10240.16384.This commit almost completely refactors the Windows-specific code making it more efficient and readable.