I wonder how useful this will be for the modest but still multicore systems used for firewalls.
I have a retired mid-2010s Celeron platform which managed about 300 Mbit/s on OpenBSD 7.1/7.2. With OpenBSD 7.6 it reached well over 700 Mbit/s. I also have an early 2020s Atom platform which saturates its 2.5GbE interface without any problems. Not all of the network drivers perform equally, but the network stack improvements have made them take pretty big leaps all the same.
Then again, the sentence "tcp is outside of global lock" is a broad generalization. So many parts came out from under the kernel lock piece by piece, like IP input, routing lookups and device packet handling, that it is hard to talk about it as one single thing where you just flip a switch to make it MP-performant.
You could make the filesystem code MP and the disk device drivers MP, and then still run on an IDE disk which forces all I/O to be serialized, one request at a time, first come first served, at which point all the work was for 'nothing'.
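To put a rough number on the IDE-disk point, here is a minimal sketch using Amdahl's law. The function name and the serial fractions are my own illustration, not measurements from any OpenBSD system.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Upper bound on speedup when serial_fraction of the work cannot run in parallel."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # Amdahl's law: the serial part takes the same time on any core count,
    # only the remaining part is divided across the cores.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


# Hypothetical serial fractions, chosen only for illustration.
for serial in (1.0, 0.5, 0.1):
    bound = amdahl_speedup(serial, 4)
    print(f"serial fraction {serial:.1f}: 4 cores give at most {bound:.2f}x")
```

With a serial fraction of 1.0, which is roughly the case where every request has to queue behind one device, the bound stays at 1x no matter how many cores run the MP-safe code above it.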
The same goes for networking. There are many layers and places that all need code that actually allows MP processing before performance improves. First you add fine-grained locks (which reduce performance at this stage), then you prove that the fine-grained locks are sufficient for ALL use cases, including all kinds of layering violations that could possibly happen. Only then can you unlock this single layer, and move on to the next if nothing acts up on any machine.
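As a toy illustration of the difference between one big lock and fine-grained locks, here is a sketch in Python. The class and function names are hypothetical and this is not how the OpenBSD kernel is structured; Python's interpreter lock also means it will not show a real speedup. It only shows the shape of the change and why the fine-grained version has more to get right.

```python
import threading


class BigLockCounters:
    """Every update takes the same lock, loosely like one global kernel lock."""

    def __init__(self, buckets: int) -> None:
        self._lock = threading.Lock()
        self._counts = [0] * buckets

    def add(self, flow_id: int) -> None:
        with self._lock:
            self._counts[flow_id % len(self._counts)] += 1

    def total(self) -> int:
        with self._lock:
            return sum(self._counts)


class PerBucketCounters:
    """Each bucket has its own lock, so unrelated flows do not contend."""

    def __init__(self, buckets: int) -> None:
        self._locks = [threading.Lock() for _ in range(buckets)]
        self._counts = [0] * buckets

    def add(self, flow_id: int) -> None:
        i = flow_id % len(self._counts)
        with self._locks[i]:
            self._counts[i] += 1

    def total(self) -> int:
        # A consistent snapshot now needs every lock, always taken in the
        # same order so two callers cannot deadlock against each other.
        for lock in self._locks:
            lock.acquire()
        try:
            return sum(self._counts)
        finally:
            for lock in reversed(self._locks):
                lock.release()


def hammer(counters, threads: int = 4, per_thread: int = 2000) -> int:
    """Update the counters from several threads and return the final total."""

    def worker(tid: int) -> None:
        for k in range(per_thread):
            counters.add(tid * per_thread + k)

    pool = [threading.Thread(target=worker, args=(t,)) for t in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counters.total()


print(hammer(BigLockCounters(8)), hammer(PerBucketCounters(8)))
```

Note that `total()` became more expensive and more delicate in the fine-grained version. That is the kind of extra cost and extra proof work that has to be paid for each layer before the big lock around it can go away.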
) https://www.youtube.com/live/wEM-E-IJ6sY?si=X3lLX9tEIO2mcEJl...