107 points joouha | 3 comments

I've invented a new alternative to forking / vendoring / monkey-patching packages in Python.

It's a bit like OverlayFS for Python modules - it allows you to write modifications for a target module (lower) in a new module (upper), and have these combined in a new virtual module (mount).

It works by rewriting imports using AST transformations, then running both the lower and upper module's code in the new Python module.
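The import-rewriting step can be illustrated roughly as follows. This is only a sketch of the general technique using the standard `ast` module, not modshim's actual code; the class name `RewriteImports` and the module names `my_textwrap` and `textwrapper` are made up for the example.

```python
import ast


class RewriteImports(ast.NodeTransformer):
    """Redirect absolute imports of `lower` (and its submodules) to `mount`."""

    def __init__(self, lower, mount):
        self.lower = lower
        self.mount = mount

    def _rename(self, name):
        # Match the package itself or a dotted submodule, never a mere prefix.
        if name == self.lower or name.startswith(self.lower + "."):
            return self.mount + name[len(self.lower):]
        return name

    def visit_Import(self, node):
        for alias in node.names:
            renamed = self._rename(alias.name)
            if renamed != alias.name:
                # Keep the local binding for plain `import pkg`.
                # Dotted imports without `as` are left unaliased in this sketch.
                if alias.asname is None and "." not in alias.name:
                    alias.asname = alias.name
                alias.name = renamed
        return node

    def visit_ImportFrom(self, node):
        # Relative imports (level > 0) need no renaming here.
        if node.level == 0 and node.module is not None:
            node.module = self._rename(node.module)
        return node


source = "import textwrap\nfrom textwrap.sub import thing\nimport textwrapper\n"
tree = RewriteImports("textwrap", "my_textwrap").visit(ast.parse(source))
rewritten = ast.unparse(ast.fix_missing_locations(tree))  # Python 3.9+
print(rewritten)
```

The rewritten source can then be compiled and executed in the namespace of the new module, so the lower module's own internal imports resolve to the combined module rather than the original.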

This avoids polluting the global namespace the way monkey-patching does. It also means that if you want to change a third-party package, you don't have to take on the maintenance burden of a fork: you can package and distribute just your changes.

nbadg ◴[] No.45666046[source]
For context: one of the several projects I'm working on right now is an automated extraction system for literate-code-style documentation in python. This isn't the place nor time to talk about the why of it (especially compared to other existing similar solutions). The important thing is the how: it uses a temporary import hook to stub out all module imports, allowing the docs generator to process each module independently at runtime, track imports between them, etc. At the end of the process, it also cleans itself up nicely.
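For readers unfamiliar with the mechanism, a temporary stubbing hook of the kind described might look roughly like this. It is a generic sketch built on `sys.meta_path`, not the commenter's implementation; `StubFinder`, `stubbed_imports`, the `__stub__` attribute and the module name `hypothetical_heavy_dep` are all invented for illustration.

```python
import importlib.abc
import importlib.util
import sys
from contextlib import contextmanager


class StubFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve empty stub modules for selected names and record each request."""

    def __init__(self, names):
        self.names = set(names)
        self.seen = []

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.names:
            self.seen.append(fullname)
            return importlib.util.spec_from_loader(fullname, self)
        return None  # let the normal finders handle everything else

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        module.__stub__ = True  # mark it so callers can tell it is fake


@contextmanager
def stubbed_imports(*names):
    finder = StubFinder(names)
    # Set aside any real modules already imported under these names.
    saved = {n: sys.modules.pop(n) for n in names if n in sys.modules}
    sys.meta_path.insert(0, finder)
    try:
        yield finder
    finally:
        # Clean up: remove the hook, drop the stubs, restore the originals.
        sys.meta_path.remove(finder)
        for n in names:
            sys.modules.pop(n, None)
        sys.modules.update(saved)


with stubbed_imports("hypothetical_heavy_dep") as finder:
    import hypothetical_heavy_dep
    stub_flag = hypothetical_heavy_dep.__stub__

recorded = finder.seen
```

Even this small version has to touch both `sys.meta_path` and `sys.modules` and restore them in the right order, which hints at why the real thing gets finicky.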

Point being, it's a lot of really complicated fiddling with the python import system. And a lesson I have learned is that messing around with import internals in python is extremely tricky to get right. Furthermore, coordinating correctly between modules that do and don't get modified by the hook is very finicky. Not to mention that supply-chain attacks on the import system itself could be a terrifying attack vector that would be absurdly difficult to detect.

All this to say, I'm not a big fan of monkeypatching, but I know exactly how it behaves, its edge cases, and what to expect if I do it. It is, after all, pretty standard practice to patch things during python unit tests. And even with all its warts, I would prefer patching to import fiddling any day of the week and twice on Sunday.

Feedback for the author: you need to explain the "why" of your project more thoroughly. I'm sure you had a good reason to strike out in this direction, and maybe this is a super elegant solution. But you've failed to explain to me under what circumstances I might also encounter the same problems with patching that you've encountered, in order to explain to me why the risk of an import hook is justified.

replies(4): >>45666315 #>>45666523 #>>45669215 #>>45673083 #
BiteCode_dev ◴[] No.45669215[source]
Monkey patching an object attribute, such as a method or a module-level function, may affect third-party library code that uses said object.
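A small sketch of that leak, using the standard library's `textwrap`. The function `third_party_format` stands in for library code you don't control, and `shouting_wrap` is a made-up patch; both are assumptions for the example.

```python
import textwrap
from unittest import mock


def third_party_format(text):
    """Hypothetical library code that happens to rely on textwrap."""
    return textwrap.wrap(text, width=10)


original_wrap = textwrap.TextWrapper.wrap


def shouting_wrap(self, text):
    # The patch: same wrapping, but upper-cased.
    return [line.upper() for line in original_wrap(self, text)]


before = third_party_format("hello there world")

# While the patch is active, every caller in the process sees it,
# including code that never asked for it.
with mock.patch.object(textwrap.TextWrapper, "wrap", shouting_wrap):
    during = third_party_format("hello there world")

after = third_party_format("hello there world")
```

Here the patch is scoped with a context manager, as in unit tests; a patch applied at import time and never undone would affect `third_party_format` for the life of the process.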

This solution is interesting, as it provides the patched code as if it were a new package, independent of the existing one you have installed, like vendoring, but without the burden of it.

In case you want to be the only one seeing your patch, this is great. It also makes the whole maintenance easier, as you don't have to wonder if you patch it at the right time or in the right way. Monkey patching can fail in many subtle edge cases.

Inheritance, particularly, is a great monkey-patching pitfall that I expect this method to handle transparently.

replies(1): >>45669873 #
1. nbadg ◴[] No.45669873[source]
If you only want your own code to see the patch, then why not just wrap it?

I mean if you really need super strong isolation, you can always create a copy of the library object; metaprogramming, dynamic classes, etc, all make it really easy to even, say, create a duplicate class object with references to the original method implementations. Or decorated ones. Or countless other approaches.
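One way to read "a duplicate class object with references to the original method implementations" is sketched below. The helper `clone_class` and the names `LoudWrapper` and `upper_wrap` are hypothetical, and the approach assumes the original class does not rely on zero-argument `super()`, since copied methods would still refer to the original class there.

```python
import textwrap


def clone_class(cls, name, **overrides):
    """Build a sibling of `cls` that reuses its functions but is a distinct type."""
    namespace = {
        key: value
        for key, value in vars(cls).items()
        # These descriptors are tied to the original class; type() recreates them.
        if key not in ("__dict__", "__weakref__")
    }
    namespace.update(overrides)
    return type(cls)(name, cls.__bases__, namespace)


def upper_wrap(self, text):
    # Delegate to the original implementation, then post-process.
    return [line.upper() for line in textwrap.TextWrapper.wrap(self, text)]


LoudWrapper = clone_class(textwrap.TextWrapper, "LoudWrapper", wrap=upper_wrap)

loud = LoudWrapper(width=10).wrap("hello there world")
plain = textwrap.TextWrapper(width=10).wrap("hello there world")
```

The clone shares every untouched method with the original (`LoudWrapper.fill is textwrap.TextWrapper.fill`), yet the original class is never modified, so other users of `textwrap` are unaffected.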

My point isn't that I don't see problems that could be solved by this; my point is that I can't think of any problems that this solves, that wouldn't be better solved by things that don't do any innards-fiddling in what is arguably the most sharply-edged part of python: packaging and imports.

And speaking from experience... if you think patching can fail in subtle edge cases, then I've got some bad news for you re: import hooks.

At the end of the day, people who might use this library are looking for a solution to a particular problem. When documenting things, it's really important to be explicit about the pros and cons of your solution, from the perspective of someone with a particular problem, and not from the perspective of someone who's built a particular solution. If I need to drive a nail, and you're selling wrenches, I don't want to hear about all of the features of your wrenches; I want to know if your wrench can drive my nail, and why I would ever want to choose it instead of a hammer.

I can think of a lot of differently-shaped metaphorical nails that fall under the broad umbrella of "I need to change some upstream code but don't want to maintain a fork". And I can think of a whole lot of python-specific specialty hammers that can accomplish that task. But I still can't think of a single situation where using import hooks to solve the problem is doing anything other than throwing a wrench into a very delicate gearbox. That is the explanation I would need, if I were in the market for such a solution, to evaluate modshim as a potential approach.

replies(1): >>45675636 #
2. Izkata ◴[] No.45675636[source]
> I mean if you really need super strong isolation, you can always create a copy of the library object; metaprogramming, dynamic classes, etc, all make it really easy to even, say, create a duplicate class object with references to the original method implementations. Or decorated ones. Or countless other approaches.

> My point isn't that I don't see problems that could be solved by this; my point is that I can't think of any problems that this solves, that wouldn't be better solved by things that don't do any innards-fiddling in what is arguably the most sharply-edged part of python: packaging and imports.

All these examples have the dependency order wrong, and you're right on those - it's simpler to wrap them somehow. But this is doing something different, that is either much harder or outright impossible with those methods: Tweaking something internal to the module while leaving its interface alone. This is shown in both their examples where they modify the TextWrapper object but then use it through the library's wrap() function, and modify the Session object but then just use the standard get() interface to requests.
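The dependency-order problem can be seen in a few lines. This is just an illustration with the standard library; `LoudWrapper` is a made-up subclass.

```python
import textwrap


class LoudWrapper(textwrap.TextWrapper):
    """A wrapper-style customization: subclass and override."""

    def wrap(self, text):
        return [line.upper() for line in super().wrap(text)]


# Using the subclass directly picks up the change...
via_subclass = LoudWrapper(width=10).wrap("hello there world")

# ...but the module's own wrap() looks up TextWrapper in textwrap's globals,
# so it never sees the subclass.
via_module = textwrap.wrap("hello there world", width=10)
```

Getting `textwrap.wrap()` itself to use the modified class requires either patching the module's global (visible to everyone) or running the module's code against a different set of globals.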

replies(1): >>45679590 #
3. nbadg ◴[] No.45679590[source]
In that case I'd opt for dynamic module creation using metaprogramming instead of an import hook. And I personally would argue that grabbing the code objects from module members and re-execing them into a new module object to re-bind their globals is simpler than an AST transformation.
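A rough sketch of that re-binding idea, under the assumption that only top-level functions need new globals. The helper `rebind_module` and the names `loud_textwrap` and `LoudWrapper` are invented for the example; methods on classes copied into the clone still use the original module's globals.

```python
import textwrap
import types


def rebind_module(source, name, **overrides):
    """Copy `source` into a new module whose functions use the copy's globals."""
    clone = types.ModuleType(name)
    clone.__dict__.update(vars(source))
    clone.__name__ = name
    for key, value in list(vars(clone).items()):
        # Only rebuild plain functions that were defined in the source module.
        if isinstance(value, types.FunctionType) and value.__module__ == source.__name__:
            rebound = types.FunctionType(
                value.__code__, clone.__dict__, value.__name__,
                value.__defaults__, value.__closure__,
            )
            rebound.__kwdefaults__ = value.__kwdefaults__
            rebound.__doc__ = value.__doc__
            setattr(clone, key, rebound)
    # Apply the caller's replacements last so rebound functions see them.
    clone.__dict__.update(overrides)
    return clone


class LoudWrapper(textwrap.TextWrapper):
    def wrap(self, text):
        return [line.upper() for line in super().wrap(text)]


loud_textwrap = rebind_module(textwrap, "loud_textwrap", TextWrapper=LoudWrapper)

patched = loud_textwrap.wrap("hello there world", width=10)
untouched = textwrap.wrap("hello there world", width=10)
```

The clone's `wrap()` shares its code object with the original but resolves `TextWrapper` in the clone's namespace, and nothing is registered in `sys.meta_path` or `sys.modules`.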

But regardless of the transformation methodology: the import hook itself is just a delivery mechanism for the modified code. There's nothing stopping the library from using the same transformation mechanism but accessing it with dynamic programming techniques instead of an import hook. And there's nothing you can't do that way.
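As a sketch of that separation, the same kind of AST transformation can be applied and the result executed into a fresh module object with no hook involved. The helper `module_from_source`, the toy transformer `UppercaseStrings` and the sample source are all assumptions for illustration.

```python
import ast
import sys
import types


class UppercaseStrings(ast.NodeTransformer):
    """Toy transformation standing in for whatever rewrite is needed."""

    def visit_Constant(self, node):
        if isinstance(node.value, str):
            return ast.copy_location(ast.Constant(value=node.value.upper()), node)
        return node


def module_from_source(name, source, transformer):
    """Transform `source` and execute it in a new, unregistered module."""
    tree = ast.fix_missing_locations(transformer.visit(ast.parse(source)))
    module = types.ModuleType(name)
    exec(compile(tree, filename=f"<{name}>", mode="exec"), module.__dict__)
    return module


hooks_before = list(sys.meta_path)
sample_source = 'def greet():\n    return "hello"\n'
generated = module_from_source("generated_mod", sample_source, UppercaseStrings())
greeting = generated.greet()
```

The caller receives the module as an ordinary object and decides where to bind it; the import system is left exactly as it was.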