Even more fun: allocating memory could trigger Python's garbage collector which would also run `__del_-` functions. So every allocation was also a possible (but rare) thread switch.
The GIL was only ever intended to protect Python's internal state (esp. the reference counts themselves); any extension modules assuming that their own state would also be protected were likely already mistaken.
> A global interpreter lock (GIL) is used internally to ensure that only one thread runs in the Python VM at a time. In general, Python offers to switch among threads only between bytecode instructions; how frequently it switches can be set via sys.setswitchinterval(). Each bytecode instruction and therefore all the C implementation code reached from each instruction is therefore atomic from the point of view of a Python program.
https://docs.python.org/3/faq/library.html#what-kinds-of-glo...
If this is not the case please let the official python team know their documentation is wrong. It indeed does state that if Py_DECREF is invoked the bets are off. But a ton of operations never do that.