218 points chmaynard | 15 comments

amelius ◴[] No.41886918[source]
> A higher level of preemption enables the system to respond more quickly to events; whether an event is the movement of a mouse or an "imminent meltdown" signal from a nuclear reactor, faster response tends to be more gratifying. But a higher level of preemption can hurt the overall throughput of the system; workloads with a lot of long-running, CPU-intensive tasks tend to benefit from being disturbed as little as possible. More frequent preemption can also lead to higher lock contention. That is why the different modes exist; the optimal preemption mode will vary for different workloads.

Why isn't the level of preemption a property of the specific event, rather than of some global mode? Some events need to be handled with less latency than others.

replies(7): >>41887042 #>>41887125 #>>41887151 #>>41887348 #>>41887690 #>>41888316 #>>41889749 #
1. btilly ◴[] No.41887151[source]
You need CPU time to evaluate the priority of the event. That can't happen until after you've interrupted whatever process is currently on the CPU. And so the best possible response latency for any event is bounded by how long a time slice a program gets before it has to go through a context switch.

To stand ready to reliably respond to any one kind of event with low latency, every CPU intensive program must suffer a performance penalty all the time. And this is true no matter how rare those events may be.

replies(3): >>41887336 #>>41887387 #>>41887493 #
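The trade-off btilly describes can be made concrete with a toy cost model (the numbers and function names below are hypothetical, purely for illustration): shortening the time slice bounds worst-case event latency, but the per-switch cost is paid constantly, whether or not a high-priority event ever arrives.

```python
def overhead_fraction(slice_us: float, switch_cost_us: float) -> float:
    """Fraction of CPU time lost if every time slice ends in a
    context switch (toy model, not real scheduler accounting)."""
    return switch_cost_us / (slice_us + switch_cost_us)

def worst_case_latency_us(slice_us: float, switch_cost_us: float) -> float:
    """An event arriving just after a slice begins waits out the
    rest of the slice plus one context switch."""
    return slice_us + switch_cost_us

# Shorter slices: lower worst-case latency, higher constant overhead.
for slice_us in (10_000, 1_000, 100):
    print(slice_us,
          worst_case_latency_us(slice_us, 5),
          round(overhead_fraction(slice_us, 5), 4))
```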
2. zeusk ◴[] No.41887336[source]
That is not true of quite a few multi-core systems. Many of them, especially those that really care about performance, pin all interrupts to core 0 and interrupt the other cores only via IPI when necessary.
replies(2): >>41888253 #>>41889577 #
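On Linux this pinning is done through the real `/proc/irq/<n>/smp_affinity` interface, which takes a hex CPU bitmask. A minimal sketch (the `pin_irq` helper is illustrative and needs root; only the mask encoding is exercised here):

```python
def cpu_mask(cpus) -> str:
    """Hex bitmask in the format /proc/irq/<n>/smp_affinity expects:
    bit i set means CPU i may service the interrupt."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

def pin_irq(irq: int, cpus) -> None:
    """Illustrative only: restrict an IRQ line to the given cores
    (requires root, and a kernel that honors the affinity hint)."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(cpu_mask(cpus))
```

For example, `cpu_mask([0])` is `"1"` (core 0 only, the strategy described above), and `cpu_mask([1, 3])` is `"a"`.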
3. Someone ◴[] No.41887387[source]
> You need CPU time to evaluate the priority of the event.

Not necessarily. The CPU can do it in hardware. As a simple example, the 6502 had separate “interrupt request” (IRQ) and “non-maskable interrupt” (NMI) pins, supporting two interrupt levels. The former could be disabled; the latter could not.

A programmable interrupt controller (https://en.wikipedia.org/wiki/Programmable_interrupt_control...) also could ‘know’ that it need not immediately handle some interrupts.

replies(1): >>41887806 #
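The behavior Someone attributes to a programmable interrupt controller can be sketched as a toy model (class and field names invented for illustration): delivery is decided by comparing fixed priority levels against a mask, with no software in the loop, and an NMI bypasses the mask entirely.

```python
class TinyPIC:
    """Toy priority interrupt controller: lines at or below the
    current mask level are held pending rather than delivered, so
    no CPU time is spent deciding -- hardware compares levels."""

    def __init__(self):
        self.mask_level = 0   # 0 = accept every line
        self.pending = []     # (priority, line) held back for later

    def raise_line(self, line: int, priority: int, nmi: bool = False) -> bool:
        """Return True if the CPU is interrupted immediately."""
        if nmi or priority > self.mask_level:
            return True
        self.pending.append((priority, line))
        return False
```

Usage: with `mask_level = 5`, a priority-2 line is deferred, a priority-7 line is delivered at once, and an NMI is delivered regardless of the mask.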
4. amluto ◴[] No.41887493[source]
Linux runs actual C code when an event occurs — this is how it queues up a wake-up of the target task and optionally triggers preemption.
5. themulticaster ◴[] No.41887806[source]
The user you replied to likely means something different: the priority of the event often depends on the exact contents of the event and not just the hardware event source. For example, say you receive a "read request completed" interrupt from a storage device. The kernel now needs to pass on the data to the process which originally requested it. In order to know how urgent the original request (and thus the handling of the interrupt) is, the kernel needs to check which sector was read and associate it with a process. Merely knowing that it came from a specific storage device is not sufficient.

By the way, NMIs still exist on x86 to this day, but AFAIK they're only used for serious machine-level issues and watchdog timeouts.

replies(1): >>41888323 #
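The lookup themulticaster describes can be sketched in a few lines (all names and numbers below are invented for illustration): the interrupt only says which device finished, so the handler must map the completed sector back to the waiting task to learn how urgent the wakeup is.

```python
# The IRQ itself carries no urgency: "disk 0 finished a read".
# Urgency comes from *who* was waiting on that sector.
pending_reads = {  # sector -> (waiting task, that task's priority)
    4096: ("cursor-theme-loader", 1),
    8192: ("database-fsync", 9),
}

def handle_read_complete(sector: int) -> int:
    """Look up the requester of a completed read and return the
    priority its wakeup should get."""
    task, prio = pending_reads.pop(sector)
    return prio
```

Two completions from the same device thus get very different treatment: sector 8192 wakes a priority-9 task, sector 4096 a priority-1 task.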
6. xedrac ◴[] No.41888253[source]
I learned this when I pegged core 0 with an intensive process on a little quad core arm device, and all of my interrupts started behaving erratically.
7. wizzwizz4 ◴[] No.41888323{3}[source]
This, too, can be done in hardware (if nothing else, with a small coprocessor).
replies(1): >>41888464 #
8. refulgentis ◴[] No.41888464{4}[source]
This doesn't shed much light.

Generally, anything done in software can be done in hardware.

Specifically, we could attach small custom coprocessors to everything the Linux kernel manages, and Linux could rely on them for any sort of multitasking.

In practice, software allows us to customize these things and upgrade them and change them without tightly coupling us to a specific kernel and hardware design.

replies(2): >>41888695 #>>41889802 #
9. wizzwizz4 ◴[] No.41888695{5}[source]
We already have specialised hardware for register mapping (which could be done in software, by the compiler, but generally isn't) and resolving instruction dependency graphs (which, again, could be done by a compiler). Mapping interrupts to a hardware priority level feels like the same sort of task, to me.
replies(1): >>41889832 #
10. btilly ◴[] No.41889577[source]
This strategy minimizes the impact by confining it to a single core. But it does not eliminate it.
replies(1): >>41890033 #
11. btilly ◴[] No.41889802{5}[source]
Exactly the point. We can compile any piece of software that we want into hardware, but after that it is easier to change in software. Given the variety of unexpected ways in which hardware is used, in practice we wound up moving some of what we expected to do in hardware back into software.

This doesn't mean that moving logic into hardware can't be a win. It often is. But we should also expect that what has tended to wind up in software, will continue to do so in the future. And that includes complex decisions about the priority of interrupts.

12. sroussey ◴[] No.41889832{6}[source]
> We already have specialised hardware for register mapping (which could be done in software, by the compiler, but generally isn't)

Wait, what? I’ve been out of compiler design for a couple decades, but that definitely used to be a thing.

replies(2): >>41890021 #>>41898207 #
13. namibj ◴[] No.41890021{7}[source]
They're probably referring to AMD Zen's speculative lifting of stack slots into physical registers (an x86-specific optimization, phased out with Zen 3), and more generally to OoO cores with far more physical than architectural registers.
14. zeusk ◴[] No.41890033{3}[source]
Sure, which is a perfectly fine trade-off; almost all recent CPUs have enough multicore capacity to make this trade favorable.
15. wizzwizz4 ◴[] No.41898207{7}[source]
We do register allocation in compilers, yes, but that has surprisingly little bearing on the actual microarchitectural register allocation. The priority when allocating registers these days is, iirc, avoiding false dependencies, not anything else.