Is this about kernel tasks, user tasks or both?
Why isn't the level of preemption a property of the specific event, rather than of some global mode? Some events need to be handled with less latency than others.
Though when it comes to gaming there is a delicate balance: game performance should be prioritized, but not allowed to lock up the system, so that multitasking still works.
Either way, since this is mostly about idle tasks, there is little value in automating it beyond giving users a simple command, suitable for scripting, that toggles the various behaviors.
> There is also, of course, the need for extensive performance testing; Mike Galbraith has made an early start on that work, showing that throughput with lazy preemption falls just short of that with PREEMPT_VOLUNTARY.
Sounds promising. Just like EEVDF, this both simplifies and improves the status quo. Does not get better than that.
To stand ready to respond reliably to any one kind of event with low latency, every CPU-intensive program must suffer a performance penalty all the time. And this is true no matter how rare those events may be.
But yeah thanks for making that distinction. Forgot to touch on the differences
There are two different notions which are easy to get confused about here: when a process can be preempted and when a process will actually be preempted.
A potential preemption point is a property of the kernel, and that is what the global mode discussed here controls. More preemption points obviously mean more chances for a process to be preempted at an inconvenient time, but they also mean more chances to prioritise properly.
What you call the level of preemption, which is to say the priority given by the scheduler, absolutely is a property of the process and can definitely be set. The default Linux scheduler will indeed do its best to allocate more time slices to higher-priority processes and to preempt them less often.
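A quick sketch of how per-process priority is set from userspace with the standard `nice`/`renice` tools (the niceness values here are just illustrative; a higher niceness means a lower priority):

```shell
# Start a long-running job at niceness 10, then lower its
# priority further while it runs, and print the result.
nice -n 10 sleep 30 &
pid=$!
renice -n 19 -p "$pid" >/dev/null   # raising niceness needs no privileges
ps -o ni= -p "$pid" | tr -d ' '     # prints the final niceness: 19
kill "$pid"
```

Lowering niceness (raising priority) is the direction that requires root or a suitable `RLIMIT_NICE`.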
Not necessarily. The CPU can do it in hardware. As a simple example, the 6502 had separate “interrupt request” (IRQ) and “non-maskable interrupt” (NMI) pins, supporting two interrupt levels. The former could be disabled; the latter could not.
A programmable interrupt controller (https://en.wikipedia.org/wiki/Programmable_interrupt_control...) also could ‘know’ that it need not immediately handle some interrupts.
If one wanted to drastically simplify the scheduler, say for a scientific application that doesn't care about preemption at all, could that be done in a clean, modular way? And would there be any benefit?
Even if you do no syscalls, the timer tick still fires; the kernel takes over and does whatever it wants anyway.
NO_HZ_FULL, isolated CPU cores, interrupts routed to some other core, and you can spin using 100% CPU forever on a core. Do games do anything like this?
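For reference, a sketch of the setup being described (the parameter names are the real kernel command-line options; the core numbers are illustrative):

```shell
# Appended to the kernel command line, this keeps the tick and
# IRQs off cores 2-3 on a 4-core machine:
#
#   nohz_full=2,3 isolcpus=2,3 irqaffinity=0,1
#
# After boot, pin the busy thread onto an isolated core:
taskset -c 2 ./spin_forever   # hypothetical CPU-bound program
```

With that in place the pinned thread can run essentially untouched, since nothing else is scheduled there and the tick is suppressed while a single task runs.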
I haven't heard of it being done with PC games. I doubt the environment would be predictable enough. On consoles tho..?
I guess it's nice to keep Linux relevant to older single CPU architectures, especially with regards to embedded systems.
But if Linux is going to target modern CPU architectures primarily, why not basically assume that there is a single CPU available to evaluate priority, and leave the CPU-intensive tasks bound to the other cores?
I mean, this has to be what heterogeneous high/low-power core designs are for, outside of mobile efficiency.
You've got to balance precision (targeted preemption) with efficiency so everything runs smoothly overall.
The standard way is to set interrupt masks so they don't go to "work" cpus and use cpusets to only allow specific cgroup to execute on given cpuset.
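A rough sketch of that standard setup (the `/proc` and cgroup-v2 paths are the real interfaces; the CPU lists are illustrative, and all of it needs root):

```shell
# Keep interrupts on housekeeping CPUs 0-1 by default
# (the value is a CPU bitmask: 0x3 = CPUs 0 and 1).
echo 3 > /proc/irq/default_smp_affinity

# Carve out CPUs 2-3 for the workload with a cpuset cgroup (v2).
echo "+cpuset" > /sys/fs/cgroup/cgroup.subtree_control
mkdir /sys/fs/cgroup/work
echo 2-3 > /sys/fs/cgroup/work/cpuset.cpus
echo $$  > /sys/fs/cgroup/work/cgroup.procs   # move this shell in
```

Individual IRQs can also be steered with the per-interrupt `/proc/irq/<N>/smp_affinity` files if the default mask isn't enough.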
edit: That having been said, I may be misinterpreting what you described; there's a comment in another thread by @zeusk which says to me that more or less this (single core used/reserved for making priority decisions) is already the case on many multi-core systems anyway, thanks to IPI (inter-processor interrupts). So, presumably, the prioritization core handles the preemption interrupts, then runs decision logic on what threads actually need to be preempted, and sends those decisions out to the respective core(s) using IPI, which causes the kernel code on those cores to unconditionally preempt the running thread.
However, I'd wonder still about the risk of memory barriers or locks starving out the kernel scheduler in this kind of architecture. Maybe the CPU can arbitrate the priority for these in hardware? Or maybe the kernel scheduler always runs for a small portion of every time slice, but only takes action if an interrupt handler has set a flag?
By the way, NMI still exist on x86 to this day, but AFAIK they're only used for serious machine-level issues and watchdog timeouts.
Whatever the scheduler does should be pretty low impact, because the run queue will be very short. If your application doesn't do much I/O, you won't get many interrupts either. If you can run a tickless kernel (is that still a thing, or is it normal now?), you might not get any interrupts for long periods.
How do you know which thread is needed to "handle" this particular "event" though? I mean, maybe you're about to start a high priority video with low latency requirements[1]. And due to a design mess your video player needs to contact some random auth server to get a DRM cookie for the stream.
How does the KERNEL know that the auth server is on the critical path for the backup camera? That's a human-space design issue, not a scheduler algorithm.
[1] A backup camera in a vehicle, say.
But the reason for drastically simplifying it would be to avoid bugs; there isn't much performance to gain compared to a well-tuned default one (though there are plenty of settings). And there haven't been many bugs there. Most naive simplifications will lose performance, not gain it.
If you are running a non-interactive system, the easiest change to make is to increase the size of the process time quantum.
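If you want to experiment with that, the slice-length knobs live in debugfs on recent kernels (root required, and the exact file names vary by kernel version; `base_slice_ns` is the EEVDF-era name, older CFS kernels used `latency_ns` and friends):

```shell
# Inspect and enlarge the scheduler's base time slice
# (values in nanoseconds; 10000000 = ~10 ms, illustrative only).
cat /sys/kernel/debug/sched/base_slice_ns
echo 10000000 > /sys/kernel/debug/sched/base_slice_ns
```

A longer quantum means fewer involuntary context switches for batch work, at the cost of interactive latency.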
Generally, any given software can be done in hardware.
Specifically, we could attach small custom coprocessors to everything for the Linux kernel, and Linux could rely on them for any sort of multitasking.
In practice, software allows us to customize these things and upgrade them and change them without tightly coupling us to a specific kernel and hardware design.
Even for non-rendering systems, those still usually run at game tick rates, since running them full-tilt can starve adjacent cores through false sharing, cache misses, bus bandwidth limits, and the like.
I can't think of a single title I worked on that did what you describe. Embedded stuff, for sure, but that's a whole different class that is likely not even running a kernel.
From what I recall, we mostly did it for predictability, so that things that might run long wouldn't interrupt deadline-sensitive things (audio, physics, etc.).
Software PLCs will bind to a core that is not exposed to the OS environment, making a dual-core appear as a single-core, or a quad-core as a tri-core.
> SCHED_IDLE, SCHED_BATCH and SCHED_NORMAL/OTHER get the lazy thing, FIFO, RR and DEADLINE get the traditional Full behaviour.
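The quoted policy split can be poked at from userspace with `chrt` from util-linux (a sketch; `SCHED_BATCH` and `SCHED_IDLE` can typically be set without privileges, while FIFO/RR need root or `RLIMIT_RTPRIO`):

```shell
chrt -p $$            # show this shell's policy, e.g. SCHED_OTHER
chrt -b 0 sleep 1 &   # run under SCHED_BATCH -> lazy preemption
chrt -i 0 sleep 1 &   # run under SCHED_IDLE  -> lazy preemption
wait
# chrt -f 10 ./rt_task   # SCHED_FIFO -> full preemption (needs root)
```

So a mixed workload gets lazy preemption for its ordinary and batch tasks while realtime-class tasks keep the traditional immediate behaviour.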
This doesn't mean that moving logic into hardware can't be a win. It often is. But we should also expect that what has tended to wind up in software, will continue to do so in the future. And that includes complex decisions about the priority of interrupts.
Wait, what? I’ve been out of compiler design for a couple decades, but that definitely used to be a thing.