BwackNinja (No.43552804)
There is no distinction between system and program libraries on Linux. We used to pretend there was one before usrmigration, but it was never worth taking seriously.

The distro-as-packager model ensures that everything is mixed together in the filesystem, which is actively hostile to external packaging. Vendoring dependencies or static linking improves compatibility by pinning known-working versions, but it reduces the incentive and the ability of downstream packagers (or users) to upgrade those dependencies.

The libc issues in this article are mostly glibc-specific; you'd have fewer of them targeting musl. Mixing static linking and dlopen doesn't make much sense, as argued in an interesting thread here[1]. Even DNS resolution on glibc implies dynamic linking because of nsswitch.

Solutions like Snap, Flatpak, and AppImage contain the problem by reusing the same abstractions internally rather than introducing anything that directly addresses it. We won't have a clean solution until we collectively abandon the FHS for a decentralized filesystem layout, one where adding an application (not just a program binary) is as easy as extracting a package into a folder, and where that folder still integrates with the rest of the system. I've worked on this off and on for a while, but being so opinionated makes everything an uphill battle, while accepting the current reality is easy.

[1] https://musl.openwall.narkive.com/lW4KCyXd/static-linking-an...

mananaysiempre (No.43553118)
> Even dns resolution on glibc implies dynamic linking due to nsswitch.

Because, as far as I’ve heard, it borrowed that wholesale from Sun, who desperately needed an application to show off their new dynamic-linking toy. There’s no reason they couldn’t have done a godsdamned daemon (one that potentially dynamically loaded plugins) instead, and in fact writing some sort of NSS compatibility shim that works that way (either by linking the daemon against glibc, or, more ambitiously, by reimplementing the NSS module APIs on top of a different libc) has been on my potential-projects list for years. (Long enough that musl apparently did a different, less powerful NSS shim in the meantime?)
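To make the daemon idea concrete, here is a rough sketch of what the client side could look like. The socket path and one-line wire protocol are invented for illustration (a real shim would have to cover the actual getaddrinfo/getpwnam surface); the point is only that the daemon can dlopen whatever backends it likes in its own address space while the caller stays fully static:

    /* Hypothetical client of a name-service daemon.  Nothing here is a
     * real protocol; it just shows that the calling process never has to
     * dlopen an NSS module itself. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    static int lookup_via_daemon(const char *name, char *out, size_t outlen)
    {
        struct sockaddr_un addr = { .sun_family = AF_UNIX,
                                    .sun_path   = "/run/nslookupd.sock" }; /* made-up path */
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            close(fd);
            return -1;
        }
        dprintf(fd, "hosts %s\n", name);        /* one-line request... */
        ssize_t n = read(fd, out, outlen - 1);  /* ...one-line reply */
        close(fd);
        if (n <= 0)
            return -1;
        out[n] = '\0';
        return 0;
    }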

The same applies to PAM word for word.

> Mixing static linking and dlopen doesn't make much sense, as said [in an oft-cited thread on the musl mailing list].

It’s a meh argument, I think.

It’s true that there’s something of a problem in that two copies of a libc can’t coexist in a process, and that this entails the pull-in-the-whole-libc problem mentioned in the thread, but to me that seems more a poorly drawn abstraction boundary than anything else. Witness Windows, which has little to no problem with multiple libcs in a process; you might say that’s because most of the difficult-to-share stuff lives in KERNEL32 instead, and I’d say that is exactly my point.

The host app would need to pull in a full copy of the dynamic loader? Well duh, but also (again) meh. The dynamic loader is not a trivial program, but it isn’t a huge program, either, especially if we cut down SysV/GNU’s (terrible) dynamic-linking ABI a bit and also only support dlopen()ing ELFs (elves?) that have no DT_NEEDED deps (having presumably been “statically” linked themselves).
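For a sense of how cheap that restriction would be to enforce, here is a rough 64-bit-only sketch (nobody's real loader, just an illustration) that rejects a shared object if its dynamic section names any DT_NEEDED dependency:

    /* Illustration only: accept an object for dlopen() solely if it has no
     * DT_NEEDED entries, i.e. it was "statically" linked apart from being
     * a loadable object.  64-bit ELF, minimal error handling. */
    #include <elf.h>
    #include <fcntl.h>
    #include <unistd.h>

    static int has_needed_deps(const char *path)
    {
        Elf64_Ehdr eh;
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;
        if (pread(fd, &eh, sizeof(eh), 0) != (ssize_t)sizeof(eh)) {
            close(fd);
            return -1;
        }
        for (int i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            if (pread(fd, &ph, sizeof(ph),
                      eh.e_phoff + (off_t)i * eh.e_phentsize) != (ssize_t)sizeof(ph))
                break;
            if (ph.p_type != PT_DYNAMIC)
                continue;
            /* Walk the dynamic section as stored in the file. */
            for (Elf64_Off off = 0; off + sizeof(Elf64_Dyn) <= ph.p_filesz;
                 off += sizeof(Elf64_Dyn)) {
                Elf64_Dyn d;
                if (pread(fd, &d, sizeof(d), ph.p_offset + off) != (ssize_t)sizeof(d)
                    || d.d_tag == DT_NULL)
                    break;
                if (d.d_tag == DT_NEEDED) {
                    close(fd);
                    return 1;               /* depends on another shared object */
                }
            }
        }
        close(fd);
        return 0;
    }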

So that thread, to me, feels like it has the same fundamental problem as Drepper’s standard rant[1] against static linking in general: it mixes up the problems arising from one libc’s particular implementation with problems inherent to the task of being a libc. (Drepper’s has much more of an attitude problem, of course.)

As for why you’d actually want to dlopen from a static executable, there’s one killer app: exokernels, loading (parts of) system-provided drivers into your process for speed. You might think this an academic fever dream, except that is how talking to the GPU works. Because of that, there’s basically no way to make a statically linked Linux GUI app that makes adequate use of a modern computer’s resources. (Even on a laptop with integrated graphics, using the CPU to shuttle pixels around is patently stupid and wasteful—by which I don’t mean you should never do it, just that there should be an alternative to doing it.)
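A trivial sketch of what that looks like from the application's side. The library and symbol names are the real EGL ones; whether your libc lets a statically linked binary do this at all is exactly the problem under discussion:

    /* The "exokernel" case in miniature: map the system's GPU userspace
     * driver stack into this process.  In practice libglvnd/libvulkan sit
     * in between and themselves dlopen the vendor ICD; this shows only the
     * first hop. */
    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        void *egl = dlopen("libEGL.so.1", RTLD_NOW | RTLD_LOCAL);
        if (!egl) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return 1;
        }
        /* From here on, driver code runs in-process, feeding the kernel's
         * DRM interface instead of shuttling pixels around on the CPU. */
        void *sym = dlsym(egl, "eglGetDisplay");
        printf("eglGetDisplay %s\n", sym ? "resolved" : "missing");
        dlclose(egl);
        return 0;
    }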

Stretching the definitions a little, the in-proc part of a GPU driver is a very very smart RPC shim, and that’s not the only useful kind: medium-smart RPC shims like KERNEL32 and dumb ones like COM proxy DLLs and the Linux kernel’s VDSO are useful to dynamically load too.

And then there are plugins for stuff that doesn’t really want to pass through a bytestream interface (at all or efficiently), like media format support plugins (avoided by ffmpeg through linking in every media format ever), audio processing plugins, and so on.

Note that all of these intentionally have a very narrow waist[2] of an interface, and when done right they don’t even require both sides to share a malloc implementation. (Not a problem on Windows where there’s malloc at home^W^W^W a shared malloc in KERNEL32; the flip side is the malloc in KERNEL32 sucks ass and they’re stuck with it.) Hell, some of them hardly require wiring together arbitrary symbols and would be OK receiving and returning well-known structs of function pointers in an init function called after dlopen.
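Sketched as a header, with every name invented, that kind of narrow-waist ABI might look like the following: the host dlopen()s the plugin, dlsym()s the single entry point, and hands over its allocator explicitly so neither side needs the other's malloc.

    /* Invented example of a narrow plugin interface: one well-known symbol,
     * plain structs of function pointers in both directions, allocator
     * passed across the boundary explicitly. */
    #include <stddef.h>

    struct host_api {                 /* services the host lends the plugin */
        void *(*alloc)(size_t size);
        void  (*release)(void *ptr);
        void  (*log)(const char *msg);
    };

    struct plugin_api {               /* what the plugin hands back */
        int  (*process)(const float *in, float *out, size_t frames);
        void (*shutdown)(void);
    };

    /* The only symbol the host ever looks up after dlopen(): */
    int plugin_init(const struct host_api *host, struct plugin_api *out);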

[1] https://www.akkadia.org/drepper/no_static_linking.html

[2] https://www.oilshell.org/blog/2022/02/diagrams.html

BwackNinja (No.43553812)
> The same applies to PAM word for word.

That's one of the reasons OpenBSD is rather compelling. BSDAuth doesn't load arbitrary libraries into your process to execute code; it forks and execs binaries, so it doesn't pollute your program's namespace in unpredictable ways.
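A cartoon of that model (not OpenBSD's actual login_<style> back-channel protocol, and the helper path is made up): the checker runs in its own process, and the only thing that crosses back is an exit status, so whatever it links or loads can't touch the caller's symbols, allocator, or signal handlers.

    /* Fork/exec instead of dlopen: authentication policy lives in a
     * separate binary.  Path and protocol are illustrative only. */
    #include <sys/wait.h>
    #include <unistd.h>

    static int authenticate(const char *user)
    {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            execl("/usr/libexec/auth/checkpw", "checkpw", user, (char *)NULL);
            _exit(127);                      /* exec failed */
        }
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
    }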

> It's true that there's something of a problem where two copies of a libc can't coexist in a process...

That's the meat of this article. It goes beyond complaining about a relatable issue and describes the work and research the author has done to see how it can be mitigated. I think it's a neat exercise to wonder how you could restructure a libc to allow multi-libc compatibility, but I question why anyone would even want to statically link libc in a program that dlopens other libraries. If you're worried about a stable ABI with your libc, yet acknowledge that other libraries you use link against a potentially different and incompatible libc, making the problem even more complicated, you should probably go the BSDAuth route instead of introducing both additional complexity and incompatibility with existing systems. I think almost everything should be suitable for static linking, and that Drepper's clarification is much more interesting than the rant. Polluting the global lib directory with a bunch of your private dependencies should be frowned upon; it also hides the real scale of applications. Installing an application shouldn't make the rest of your system harder to understand, especially when it doesn't do any special integration. When you have to dynamically link anyway:

> As for why you’d actually want to dlopen from a static executable, there’s one killer app: exokernels, loading (parts of) system-provided drivers into your process for speed.

If you're dealing with system resources like GPU drivers, those should be opaque implementations loaded by intermediaries like libglvnd.[1] That merge request comes to mind as even more evidence that dynamic dependencies of even static binaries are terrible. The symbol resolution works, but it would be better if no zlib symbols leaked from Mesa at all (by linking zlib statically and using --exclude-libs), so that a compiled dependency cannot break the program that depends on it. So yes, I agree that dynamic dependencies of static binaries should be static themselves (though enforcing that is questionable), but I don't agree that libc should be considered part of that problem and statically linked as well. That leads us to:

> ... when done right they don't even require both sides to share a malloc implementation

Better API design for libraries can eliminate a lot of these issues, but enforcing that is a much harder problem in the current landscape, where both sides are casually expected to share a malloc implementation -- hence the complication described in the article. "How can we force everything that exists into a better paradigm?" is a much less practical question than "what are the fewest changes we'd need so this works with just a recompile?". I agree with the idea of a "narrow waist of an interface", but it's not useful in practice until people agree on where the boundary should be and you can get everyone to abide by it.
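As a small illustration of the kind of API shape that sidesteps the shared-malloc assumption (all names invented): keep allocation and release on the same side of the boundary instead of handing the caller raw malloc'd memory to free.

    /* "Leaky" style: the caller must free() the result, so both sides have
     * to share a malloc implementation. */
    char *render_message_leaky(const char *name);

    /* "Paired" style: whoever allocates also frees; the caller never needs
     * the library's allocator. */
    struct message;                                /* opaque handle */
    struct message *message_create(const char *name);
    const char     *message_text(const struct message *m);
    void            message_destroy(struct message *m);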

[1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28...