The only safe, consistent, reliable approach is not to close DLLs.
As this article details, there are so many circumstances that preclude DLLs from being unloaded completely that I was surprised their design works at all. So many language constructs do not play nicely with the idea that code and static data can just disappear at runtime.
That doesn't address the need of some DLLs to malloc() resources in the context of the applications linking to them.
This problem _cannot_ be solved _generically_. Any solutions are extremely API-specific and impose restrictions on their users (the linking applications) which, if violated, will lead to Undefined Behavior.
Edit: as an example of cases which must bind resources in the application's context, see the Classloading in C++ paper at <https://wanderinghorse.net/computing/papers/index.html#class...> (disclosure: I wrote that article).
Of course, C/C++ applications written in the traditional model, with static data everywhere, would have a hard time not leaking tokens and thereby holding the DLL open, but it's still far from impossible to write such a safe API.
> That doesn't address the need of some DLLs to malloc() resources in the context of the applications linking to them.
If there is a context boundary and really such a need, then the DLL can keep a list of all such resources, and destroy all those resources once closed. Access to them would similarly have to be protected by a token.
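For the sake of illustration, here is a minimal C sketch of such an API; the names lib_token, lib_acquire, and lib_shutdown are hypothetical, not from any real library. The DLL keeps a registry of every resource it hands out, so shutting it down can reclaim anything the application forgot to release.

    /* Hypothetical plugin API sketch: the DLL hands out opaque tokens and
     * keeps its own registry, so lib_shutdown() can reclaim everything
     * even if the application forgot to release some tokens. */
    #include <stdlib.h>

    typedef struct lib_token lib_token;   /* opaque to the application */

    struct lib_token {
        void      *resource;              /* whatever the DLL allocated */
        lib_token *next;
    };

    static lib_token *g_registry = NULL;  /* all live tokens, owned by the DLL */

    lib_token *lib_acquire(size_t n) {
        lib_token *t = malloc(sizeof *t);
        if (!t) return NULL;
        t->resource = malloc(n);
        if (!t->resource) { free(t); return NULL; }
        t->next = g_registry;
        g_registry = t;
        return t;
    }

    void lib_shutdown(void) {
        /* Destroy every outstanding resource before the DLL is unloaded,
         * so nothing allocated by this module outlives its code. */
        while (g_registry) {
            lib_token *t = g_registry;
            g_registry = t->next;
            free(t->resource);
            free(t);
        }
    }

In a real design you would also validate tokens against this registry before touching the underlying resource, so a stale token can be rejected instead of dereferenced.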
The only safe, consistent, reliable approach is not to deallocate memory.
That's true, but those approaches are only viable if you trust the DLL in question. External libraries are fundamentally opaque/could contain anything, and if you're in a tinfoil-hat mood, it's quite easy to make new libraries that emulate the ABI of the intended library but do different (maybe malicious, maybe just LD_PRELOAD tricksy) things.
Consider: an evil wrapper library could put the thinnest possible shim around the "real" version of the library and just not properly account for resources, exposing library (un)loaders to use-after-free without much work, even if the library loaders relied upon the approaches proposed.
Since there aren't good cross-platform and race-condition-free ways of saying "authenticate this external library via checksum/codesigning, then load it", there are some situations where the proposed approaches aren't good enough.
Sure, most situations probably don't need that paranoia level (or control the code/provenance of the target library implicitly). But the number of situations where that security risk does come up is larger than you'd think, especially given automatic look-up-library-by-name-via-LD_LIBRARY_PATH-ish behavior.
The measures I suggested before were all in the context of buggy users that can't resist the urge to keep references to the library's resources lying around all over the place. But untrusted code can never be made safe with anything short of a strong sandbox.
Sign your libraries with Ed25519, embed the public key in your app, and verify before loading. How is this not cross-platform enough?
Of course you still introduce a TOCTOU (time of check, time of use) race condition, which is why oftentimes you want to first check, load, then check again.
A common solution, however, is to open the library file once, verify its checksum or signature against a trusted key, and, if it is valid, create a private, unlinked temporary file (O_TMPFILE on Linux), write the verified contents into it, and dlopen() (or LoadLibrary()) that temporary copy; on Linux the copy is loaded via its /proc/self/fd/<n> path, since dlopen() takes a path rather than a file descriptor. Because the file is unlinked after creation (or opened with O_TMPFILE), nobody else can swap it out, and the TOCTOU window disappears: you only ever read and load the exact bytes you verified. This is how container runtimes and some plugin systems avoid these races.

BTW, on Linux you can also use memfd_create(), which creates an anonymous, in-memory file descriptor, and Windows and macOS have equivalents (a file mapping on Windows). Verify the library's signature/hash, copy the verified contents into the memfd (Linux) or file mapping (Windows), and load directly from that memory-backed handle.
TL;DR: never load from a mutable path after verification. Copying the verified bytes into a sealed memfd and loading from that, for example, is race-safe.
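As a rough Linux-only sketch of the memfd route (verify_signature() here is a hypothetical placeholder for whatever Ed25519 or hash check you actually use): the verified bytes go into a sealed memfd, and dlopen() loads them through the /proc/self/fd/<n> path.

    /* Linux-only sketch: verify untrusted bytes, copy them into a sealed
     * memfd, and dlopen() the memfd via /proc/self/fd so nobody can swap
     * the file between check and use. Link with -ldl on older glibc. */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Hypothetical: stands in for your real Ed25519/hash verification. */
    extern int verify_signature(const unsigned char *buf, size_t len);

    void *load_verified(const unsigned char *buf, size_t len) {
        if (!verify_signature(buf, len))
            return NULL;                          /* refuse unverified bytes */

        int fd = memfd_create("plugin", MFD_CLOEXEC | MFD_ALLOW_SEALING);
        if (fd < 0) return NULL;

        if (write(fd, buf, len) != (ssize_t)len) { close(fd); return NULL; }

        /* Seal the memfd so its contents can no longer be modified. */
        fcntl(fd, F_ADD_SEALS,
              F_SEAL_WRITE | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_SEAL);

        char path[64];
        snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
        void *handle = dlopen(path, RTLD_NOW);    /* loads the sealed copy */
        close(fd);                                /* the mapping keeps it alive */
        return handle;
    }

Sealing with F_SEAL_WRITE and friends means even code in the same process cannot alter the bytes between verification and loading.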
FWIW, I use firejail (not bubblewrap) for all applications such as my browser, Discord, LibreOffice, mupdf, etc., and I recommend everyone do the same. No way in hell I will give my browser access to files it does not need. Each application only gets access to what it needs (PulseAudio, the Downloads directory, etc.); no way I will give Discord access to my downloaded files (or my browser history) or anything else, really, apart from a directory where I put files I want to send.
Why is tracking the lifetime of a function pointer different from tracking the lifetime of any other pointer?
FreeLibrary on Windows unloads libraries when the reference count is zero.
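Roughly (example.dll is just a placeholder name), the counting behaves like this:

    /* Sketch of Win32 reference counting: the module stays mapped until
     * FreeLibrary has been called as many times as LoadLibrary succeeded. */
    #include <windows.h>

    int main(void) {
        HMODULE a = LoadLibraryA("example.dll");   /* refcount -> 1 */
        HMODULE b = LoadLibraryA("example.dll");   /* same handle, refcount -> 2 */

        FreeLibrary(a);                            /* refcount -> 1, still mapped */
        FreeLibrary(b);                            /* refcount -> 0, now unloaded */
        return 0;
    }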
The init function could return a status that indicates "already initialized."
The calling code could observe this and passively warn of the unusual condition but otherwise proceed.
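A small sketch of that pattern, with a hypothetical plugin_init() guarded by a static flag:

    /* Idempotent init sketch: plugin_init() reports whether it actually
     * initialized or was already initialized, so the caller can warn
     * and continue rather than fail. */
    #include <stdio.h>

    enum init_status { INIT_OK, INIT_ALREADY_DONE };

    static int g_initialized = 0;

    enum init_status plugin_init(void) {
        if (g_initialized)
            return INIT_ALREADY_DONE;
        g_initialized = 1;
        /* ... real one-time setup would go here ... */
        return INIT_OK;
    }

    int main(void) {
        if (plugin_init() == INIT_ALREADY_DONE)
            fprintf(stderr, "warning: plugin_init called twice; continuing\n");
        return 0;
    }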
A memory leak caused by a library isn't necessarily something that should prevent it from being unloaded. Unless it is the leaked objects that are holding a reference! (Then we need a full-blown GC system to detect cycles.)