Skip to content

FIX(server): Handle WinAPI objects lifetime properly in Zeroconf#7133

Open
davidebeatrici wants to merge 2 commits intomumble-voip:masterfrom
davidebeatrici:windows-dnssapi-refactor
Open

FIX(server): Handle WinAPI objects lifetime properly in Zeroconf#7133
davidebeatrici wants to merge 2 commits intomumble-voip:masterfrom
davidebeatrici:windows-dnssapi-refactor

Conversation

@davidebeatrici
Copy link
Copy Markdown
Member

Recently we received multiple reports of random server crashes.

Turns out DnsServiceDeRegister(), more specifically svcRegDeregMulticast(), now reads the content of the DNS_SERVICE_INSTANCE object that is indirectly passed to it.
I've confirmed this by comparing dnsapi.dll version 10.0.26100.7920 to version 10.0.10240.16384.

This commit almost completely refactors the Windows-specific code making it more efficient and readable.

Recently we received multiple reports of random server crashes.

Turns out DnsServiceDeRegister(), more specifically svcRegDeregMulticast(), now reads the content of the DNS_SERVICE_INSTANCE object that is indirectly passed to it.
I've confirmed this by comparing dnsapi.dll version 10.0.26100.7920 to version 10.0.10240.16384.

This commit almost completely refactors the Windows-specific code making it more efficient and readable.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 19, 2026

Walkthrough

This change refactors Windows DNS service registration functionality in the Zeroconf class. A new nested Zeroconf::Reg class encapsulates Windows-specific DNSAPI operations, including module loading, function pointer resolution, and callback handling. The registerService() and unregisterService() methods now delegate to this new class on Windows platforms when no helper is available. The destructor behavior is simplified to remove conditional service unregistration. Additionally, isOk() is converted to a constexpr function, and the initialization pattern is updated to check registration status via m_reg.isOk().

🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description explains the bug (DnsServiceDeRegister reading DNS_SERVICE_INSTANCE content), provides version evidence, and indicates the refactoring purpose. However, it does not include the required commit guidelines checklist from the template. Add the missing commit guidelines checklist from the repository template to the pull request description.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: fixing WinAPI object lifetime handling in Zeroconf. It is specific, concise, and directly related to the primary refactoring effort described in the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

Migrating from UI to YAML configuration.

Use the @coderabbitai configuration command in a PR comment to get a dump of all your UI settings in YAML format. You can then edit this YAML file and upload it to the root of your repository to configure CodeRabbit programmatically.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/murmur/Zeroconf.cpp`:
- Around line 182-201: Zeroconf::Reg::callback is incorrectly taking ownership
of the DNS_SERVICE_INSTANCE via reg.m_instance.reset(instance) and also
returning early on deregistration without checking status; instead, immediately
release the API-allocated structure with DnsServiceFreeInstance(instance) (do
not use reg.m_instance.reset with the raw pointer unless you wrap it with a
custom deleter that calls DnsServiceFreeInstance), and for the deregistration
path (when reg.m_cancel is null/being reset) check the status and log or handle
non-ERROR_SUCCESS/ERROR_CANCELLED cases rather than returning silently; update
the logic around reg.m_instance and reg.m_cancel so ownership is explicit and
DNS_SERVICE_INSTANCE is freed via DnsServiceFreeInstance in all callback paths.
- Around line 90-130: The destructor Zeroconf::Reg::~Reg currently calls
cancel() which can call DnsServiceDeRegister() and return DNS_REQUEST_PENDING
(async), then immediately FreeLibrary(m_module) and destroy the object while the
async callback (callback) later static_casts the context to Reg* and accesses
m_instance/m_cancel — causing use-after-free; fix by making cancel() synchronous
or by waiting for the async deregister to complete before destroying the object
(e.g. block in ~Reg() until the callback signals completion), or convert Reg to
a reference-counted owner and have the callback hold a weak_ptr that upgrades to
a shared_ptr before accessing members; specifically ensure DnsServiceDeRegister
paths that return DNS_REQUEST_PENDING set a completion event/flag and that
Zeroconf::Reg::~Reg waits on that event (or use DnsServiceRegisterCancel
synchronous path) before calling FreeLibrary(m_module) or resetting
m_instance/m_cancel, and update callback to check/upgrade the weak reference (or
signal the event) instead of directly static_casting the raw context.
- Around line 71-97: The constructor Zeroconf::Reg::Reg currently calls
GetModuleHandle(L"dnsapi.dll") but the destructor uses FreeLibrary (and the
constructor calls FreeModule on error), causing a mismatched lifetime/manage
bug; replace the GetModuleHandle usage with a reference-counting load (either
call GetModuleHandleEx with GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS/0 to
increment the refcount or use LoadLibraryA/LoadLibraryW) and ensure the same
release function (FreeLibrary) is used everywhere (remove or replace the
incorrect FreeModule call), so that m_module is acquired with
GetModuleHandleEx/LoadLibrary and always released with FreeLibrary in both the
error path and Zeroconf::Reg::~Reg; update the checks that use m_module and
symbol resolution (DnsServiceConstructInstance, DnsServiceFreeInstance,
DnsServiceRegister, DnsServiceDeRegister, DnsServiceRegisterCancel) to behave
the same after switching to the ref-counting acquisition.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 05df230d-4273-465c-8109-65dbbd2332fc

📥 Commits

Reviewing files that changed from the base of the PR and between 4dc9f7b and e7e9ead.

📒 Files selected for processing (2)
  • src/murmur/Zeroconf.cpp
  • src/murmur/Zeroconf.h

Comment thread src/murmur/Zeroconf.cpp
Comment on lines +71 to +97
Zeroconf::Reg::Reg() : m_instance(nullptr, InstanceDeleter{ DnsServiceFreeInstance }) {
m_module = GetModuleHandle(L"dnsapi.dll");
if (!m_module) {
return;
}

GET_SYMBOL(DnsServiceConstructInstance);
GET_SYMBOL(DnsServiceFreeInstance);
GET_SYMBOL(DnsServiceRegister);
GET_SYMBOL(DnsServiceDeRegister);
GET_SYMBOL(DnsServiceRegisterCancel);

if (!DnsServiceConstructInstance || !DnsServiceFreeInstance || !DnsServiceRegister || !DnsServiceDeRegister
|| !DnsServiceRegisterCancel) {
FreeModule(m_module);
m_module = nullptr;
}
}

Zeroconf::Reg::~Reg() {
if (!isOk()) {
return;
}

cancel();
FreeLibrary(m_module);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

GetModuleHandleEx vs GetModuleHandle reference count behavior Windows

💡 Result:

GetModuleHandle retrieves a handle to a loaded module without incrementing its reference count. Do not call FreeLibrary on the returned handle, as this could cause the module to unload prematurely. GetModuleHandleEx retrieves a module handle and increments the reference count unless the GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT flag is specified. When the flag is used, its behavior matches GetModuleHandle (no refcount increment). Without the flag (or with other flags like 0 or GET_MODULE_HANDLE_EX_FLAG_PIN), it increments the refcount, and FreeLibrary must be called to decrement it when done. Key flags for GetModuleHandleEx: - GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT (0x00000002): No refcount change, equivalent to GetModuleHandle. Do not FreeLibrary. - No flag or GET_MODULE_HANDLE_EX_FLAG_PIN (0x00000001): Increments refcount, requires FreeLibrary. - GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS (0x00000004): Treats lpModuleName as address in module; refcount behavior depends on other flags.

Citations:


🏁 Script executed:

cat -n src/murmur/Zeroconf.cpp | head -120

Repository: mumble-voip/mumble

Length of output: 3778


🏁 Script executed:

grep -r "FreeModule" src/murmur/Zeroconf.cpp

Repository: mumble-voip/mumble

Length of output: 85


🏁 Script executed:

grep -r "#define FreeModule" src/

Repository: mumble-voip/mumble

Length of output: 44


🏁 Script executed:

grep -r "FreeModule" src/murmur/Zeroconf.h

Repository: mumble-voip/mumble

Length of output: 44


🏁 Script executed:

cat src/murmur/Zeroconf.h | grep -A 5 -B 5 "FreeModule"

Repository: mumble-voip/mumble

Length of output: 44


🏁 Script executed:

rg "FreeModule" src/murmur/

Repository: mumble-voip/mumble

Length of output: 109


🏁 Script executed:

# Search for FreeModule in all files
find src/ -type f \( -name "*.h" -o -name "*.cpp" \) -exec grep -l "FreeModule" {} \;

Repository: mumble-voip/mumble

Length of output: 85


🏁 Script executed:

head -20 src/murmur/Zeroconf.cpp

Repository: mumble-voip/mumble

Length of output: 697


🏁 Script executed:

# Check Windows API documentation or references in code
rg "FreeLibrary|FreeModule" --type cpp

Repository: mumble-voip/mumble

Length of output: 753


🏁 Script executed:

# Check if there's any definition of FreeModule or if it's just a typo
rg "define.*FreeModule|void.*FreeModule"

Repository: mumble-voip/mumble

Length of output: 44


🏁 Script executed:

# Let's see the context around line 85 more carefully
sed -n '83,88p' src/murmur/Zeroconf.cpp

Repository: mumble-voip/mumble

Length of output: 255


🏁 Script executed:

# Check recent git history for this line to understand if FreeModule was intentional
git log --oneline -n 20 -- src/murmur/Zeroconf.cpp

Repository: mumble-voip/mumble

Length of output: 135


🏁 Script executed:

git show e7e9ead2 --stat

Repository: mumble-voip/mumble

Length of output: 50374


🏁 Script executed:

git show e7e9ead2

Repository: mumble-voip/mumble

Length of output: 50377


🏁 Script executed:

git show e7e9ead2 -- src/murmur/Zeroconf.cpp | head -200

Repository: mumble-voip/mumble

Length of output: 5752


🏁 Script executed:

# Check if FreeModule is a macro or typo
grep -r "FreeModule" --include="*.h" --include="*.cpp" src/

Repository: mumble-voip/mumble

Length of output: 109


🏁 Script executed:

# Check the current code more carefully around line 85
sed -n '70,100p' src/murmur/Zeroconf.cpp

Repository: mumble-voip/mumble

Length of output: 762


Incorrect module handle lifetime management: GetModuleHandle paired with FreeLibrary.

GetModuleHandle() at line 72 retrieves a handle to an already-loaded module without incrementing the reference count. However, FreeLibrary() at line 96 decrements the reference count. This mismatch can cause undefined behavior—potentially unloading dnsapi.dll prematurely if its reference count was already 1.

Additionally, line 85 calls FreeModule(), which is not a standard Windows API function. This should likely be FreeLibrary() as well.

Use GetModuleHandleEx with default flags (which increments the ref count) or switch to LoadLibrary/FreeLibrary pairing.

🔧 Proposed fix using GetModuleHandleEx
 Zeroconf::Reg::Reg() : m_instance(nullptr, InstanceDeleter{ DnsServiceFreeInstance }) {
-	m_module = GetModuleHandle(L"dnsapi.dll");
-	if (!m_module) {
+	if (!GetModuleHandleEx(0, L"dnsapi.dll", &m_module)) {
+		m_module = nullptr;
 		return;
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/murmur/Zeroconf.cpp` around lines 71 - 97, The constructor
Zeroconf::Reg::Reg currently calls GetModuleHandle(L"dnsapi.dll") but the
destructor uses FreeLibrary (and the constructor calls FreeModule on error),
causing a mismatched lifetime/manage bug; replace the GetModuleHandle usage with
a reference-counting load (either call GetModuleHandleEx with
GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS/0 to increment the refcount or use
LoadLibraryA/LoadLibraryW) and ensure the same release function (FreeLibrary) is
used everywhere (remove or replace the incorrect FreeModule call), so that
m_module is acquired with GetModuleHandleEx/LoadLibrary and always released with
FreeLibrary in both the error path and Zeroconf::Reg::~Reg; update the checks
that use m_module and symbol resolution (DnsServiceConstructInstance,
DnsServiceFreeInstance, DnsServiceRegister, DnsServiceDeRegister,
DnsServiceRegisterCancel) to behave the same after switching to the ref-counting
acquisition.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LoadLibrary()/FreeLibrary() is indeed the correct pair to use.

We were already using GetModuleHandle() before but without calling FreeLibrary(), which is guaranteed to work because there is at least one dependency (Qt Network) loading dnsapi.dll.

However, since we're actively using functions that are exported by the library we should indeed increment the reference count to ensure it doesn't get unloaded prematurely.

Comment thread src/murmur/Zeroconf.cpp
Comment on lines +90 to +130
Zeroconf::Reg::~Reg() {
if (!isOk()) {
return;
}

cancel();
FreeLibrary(m_module);
}

bool Zeroconf::Reg::cancel() {
if (!isOk()) {
return false;
}

if (m_cancel) {
const auto ret = DnsServiceRegisterCancel(&m_cancel.value());
if (ret != ERROR_SUCCESS && ret != ERROR_CANCELLED) {
qWarning("Zeroconf: DnsServiceRegisterCancel() failed with error %u!", ret);

m_cancel.reset();
m_instance.reset();

return false;
}
} else if (m_instance) {
DNS_SERVICE_REGISTER_REQUEST req{};
req.Version = DNS_QUERY_REQUEST_VERSION1;
req.pServiceInstance = m_instance.get();
req.pRegisterCompletionCallback = callback;
req.pQueryContext = this;

const auto ret = DnsServiceDeRegister(&req, nullptr);
if (ret != DNS_REQUEST_PENDING) {
qWarning("Zeroconf: DnsServiceDeRegister() failed with error %u!", ret);

return false;
}
}

return true;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

DnsServiceDeRegister DNS_REQUEST_PENDING callback timing synchronous

💡 Result:

The DnsServiceDeRegister function (from windns.h in Windows 10+ DNS-SD API) is asynchronous. When called successfully, it immediately returns DNS_REQUEST_PENDING (9506), indicating the deregistration request is pending. The callback specified in the original DNS_SERVICE_REGISTER_REQUEST's pRegisterCompletionCallback field (of type DNS_SERVICE_REGISTER_COMPLETE) is invoked asynchronously later, when deregistration completes. The callback receives the status, user context, and a copy of the original DNS_SERVICE_INSTANCE. There is no separate deregister-specific callback type; it reuses the register completion callback. The function call itself is not synchronous—the callback timing is asynchronous, typically after some network/mDNS propagation delay.

Citations:


🏁 Script executed:

# First, let's find the callback function definition referenced at line 128
rg -n "callback\s*\(" src/murmur/Zeroconf.cpp -A 10 -B 2

Repository: mumble-voip/mumble

Length of output: 422


🏁 Script executed:

# Look for the complete Reg class definition to understand the callback and member variables
rg -n "class.*Reg" src/murmur/Zeroconf.cpp -A 50

Repository: mumble-voip/mumble

Length of output: 44


🏁 Script executed:

# Find where the callback function is defined (it's likely a static member)
rg -n "DNS_SERVICE_REGISTER_COMPLETE|void.*callback" src/murmur/Zeroconf.cpp -B 5 -A 20

Repository: mumble-voip/mumble

Length of output: 706


🏁 Script executed:

# Check the Server.cpp:652-662 context mentioned in the review
sed -n '650,665p' src/murmur/Server.cpp

Repository: mumble-voip/mumble

Length of output: 250


🏁 Script executed:

# Look for any event or synchronization mechanisms in the Zeroconf class
rg -n "event|Event|WaitFor|condition|Condition" src/murmur/Zeroconf.cpp

Repository: mumble-voip/mumble

Length of output: 44


Use-after-free in async deregistration callback.

When cancel() calls DnsServiceDeRegister() at line 121 and receives DNS_REQUEST_PENDING, the operation completes asynchronously. However, ~Reg() immediately proceeds to FreeLibrary(m_module) and the object is destroyed. The callback at line 182 will later fire on the destroyed object, as it unsafely dereferences the context pointer with a bare static_cast< Reg * >(context) and accesses m_instance and m_cancel without any lifetime protection.

Server::removeZeroconf() (line 657) calls delete zeroconf without waiting for pending async operations.

Either wait for the callback to complete before destruction (using an event or condition variable), use synchronous deregistration, or ensure callbacks are protected via shared/weak pointers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/murmur/Zeroconf.cpp` around lines 90 - 130, The destructor
Zeroconf::Reg::~Reg currently calls cancel() which can call
DnsServiceDeRegister() and return DNS_REQUEST_PENDING (async), then immediately
FreeLibrary(m_module) and destroy the object while the async callback (callback)
later static_casts the context to Reg* and accesses m_instance/m_cancel —
causing use-after-free; fix by making cancel() synchronous or by waiting for the
async deregister to complete before destroying the object (e.g. block in ~Reg()
until the callback signals completion), or convert Reg to a reference-counted
owner and have the callback hold a weak_ptr that upgrades to a shared_ptr before
accessing members; specifically ensure DnsServiceDeRegister paths that return
DNS_REQUEST_PENDING set a completion event/flag and that Zeroconf::Reg::~Reg
waits on that event (or use DnsServiceRegisterCancel synchronous path) before
calling FreeLibrary(m_module) or resetting m_instance/m_cancel, and update
callback to check/upgrade the weak reference (or signal the event) instead of
directly static_casting the raw context.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed a problem, smart pointers are the way to go to fix this.

Comment thread src/murmur/Zeroconf.cpp
Comment on lines +182 to 201
void WINAPI Zeroconf::Reg::callback(const DWORD status, void *context, DNS_SERVICE_INSTANCE *instance) {
auto &reg = *static_cast< Reg * >(context);

if (instance) {
zeroconf->DnsServiceFreeInstance(instance);
}
reg.m_instance.reset(instance);

if (status == ERROR_CANCELLED) {
if (!reg.m_cancel) {
// No cancel handle, which means this is a de-registration.
return;
}

zeroconf->m_cancel.reset();
reg.m_cancel.reset();

if (status != ERROR_SUCCESS) {
qWarning("Zeroconf: DnsServiceRegister() reports status code %u, service registration probably failed", status);
switch (status) {
case ERROR_SUCCESS:
case ERROR_CANCELLED:
break;
default:
qWarning("Zeroconf: DnsServiceRegister() reports status code %u, service registration probably failed",
status);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

DnsServiceRegister callback pInstance parameter same pointer deregistration

💡 Result:

In the Windows DnsServiceRegister API, the pInstance parameter passed to the DNS_SERVICE_REGISTER_COMPLETE callback is a new allocated structure describing the registered service. It is not the same pointer as the pServiceInstance provided in the original DNS_SERVICE_REGISTER_REQUEST. The caller must free this callback pInstance using DnsServiceFreeInstance if it is not nullptr. For deregistration, use DnsServiceDeRegister with the original DNS_SERVICE_REGISTER_REQUEST structure (pRequest). The deregistration callback receives a copy of the original DNS_SERVICE_INSTANCE (from registration request), not the callback pInstance. No sources indicate the callback pInstance pointer is reused or relevant for deregistration. The same pointer from input is not returned in the callback; a new one is allocated.

Citations:


Improper Windows API memory management for callback pInstance; deregistration errors silently ignored.

Two concerns with this callback:

  1. Line 185: According to Windows API documentation, the pInstance parameter passed to the DNS_SERVICE_REGISTER_COMPLETE callback is a newly allocated structure that must be freed using DnsServiceFreeInstance(). The code calls reg.m_instance.reset(instance), which assumes the unique_ptr deleter matches the Windows API allocation/deallocation contract—it does not. This will cause a memory leak or heap corruption. The callback must explicitly call DnsServiceFreeInstance(instance) to properly release this allocation.

  2. Lines 187-189: For deregistration callbacks, the function returns early without checking status. Failed deregistrations are silently ignored.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/murmur/Zeroconf.cpp` around lines 182 - 201, Zeroconf::Reg::callback is
incorrectly taking ownership of the DNS_SERVICE_INSTANCE via
reg.m_instance.reset(instance) and also returning early on deregistration
without checking status; instead, immediately release the API-allocated
structure with DnsServiceFreeInstance(instance) (do not use reg.m_instance.reset
with the raw pointer unless you wrap it with a custom deleter that calls
DnsServiceFreeInstance), and for the deregistration path (when reg.m_cancel is
null/being reset) check the status and log or handle
non-ERROR_SUCCESS/ERROR_CANCELLED cases rather than returning silently; update
the logic around reg.m_instance and reg.m_cancel so ownership is explicit and
DNS_SERVICE_INSTANCE is freed via DnsServiceFreeInstance in all callback paths.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Line 185: According to Windows API documentation, the pInstance parameter passed to the DNS_SERVICE_REGISTER_COMPLETE callback is a newly allocated structure that must be freed using DnsServiceFreeInstance(). The code calls reg.m_instance.reset(instance), which assumes the unique_ptr deleter matches the Windows API allocation/deallocation contract—it does not. This will cause a memory leak or heap corruption. The callback must explicitly call DnsServiceFreeInstance(instance) to properly release this allocation.

The smart pointer has a custom deleter defined that takes care of calling DnsServiceFreeInstance().

  1. Lines 187-189: For deregistration callbacks, the function returns early without checking status. Failed deregistrations are silently ignored.

This is intended and matches the previous behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Server: Crash when calling DnsServiceDeRegister() on recent Windows versions

2 participants