
204 points WithinReason | 7 comments
mrweasel ◴[] No.40715746[source]
And once that becomes generally available, operating systems will eat the bandwidth in an instant, and any speed-up to be gained on a desktop will be completely negated.

It seems like we're stuck at a pre-set level of latency, which is just within what people tolerate. I was watching a video of someone running Windows 3.11 and noticed that windows close instantly, whereas on Windows 10 and 11 I've never seen there NOT be a small delay between the user clicking close and the window disappearing.

replies(5): >>40715815 #>>40716021 #>>40716089 #>>40716389 #>>40717169 #
vladvasiliu ◴[] No.40715815[source]
> which on Windows 10 and 11 I've never seen there NOT be a small delay between the user clicking close and the window disappearing.

Isn't that delay related to the default animations? On my particular machine with animations disabled, if I click the minimize button, the window disappears instantly. This is your standard win11 on a shitty enterprise laptop running some kind of 11th gen i7u with the integrated graphics and a 4k external display.

Maximization is sometimes janky, but I guess it's because the window needs to redraw its contents at the new size.

replies(1): >>40715875 #
PlutoIsAPlanet ◴[] No.40715875[source]
Modern operating systems render to buffers on the GPU and then composite them, which I would guess adds some latency (although likely unnoticeable).
replies(1): >>40716455 #
LoganDark ◴[] No.40716455[source]
It's not unnoticeable. Ever notice how on Windows, when you start to drag a window, your cursor disappears for a frame? That's Windows replacing your cursor with a software-rendered one so it doesn't appear ahead of the window. But drag anything else (e.g. browser tabs, text highlighting) and you'll quickly notice it lagging behind the cursor. Why? Because the cursor is a hardware overlay that can be moved before the composition is actually complete. The composition lags one frame behind. In other words, the price of the compositor is lagging one frame behind. That may not sound like much, but it is, especially when most displays only refresh at 60 Hz.

Of course, it's only one of the contributing factors to the total latency of things like keystrokes: https://danluu.com/input-lag/
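
For scale, here is a back-of-the-envelope sketch of what "one frame" costs at a few refresh rates. The function name is made up for illustration and nothing here touches a real Windows API; it is just the arithmetic.

```python
def frame_period_ms(refresh_hz: float) -> float:
    """Duration of one refresh interval in milliseconds."""
    return 1000.0 / refresh_hz


# One frame of compositor lag expressed in milliseconds.
for hz in (60, 120, 240):
    print(f"{hz} Hz: one frame of lag = {frame_period_ms(hz):.2f} ms")
```

At 60 Hz that is roughly 16.7 ms added on top of every other source of input lag.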

replies(2): >>40717055 #>>40717112 #
1. arghwhat ◴[] No.40717055[source]
Unless the issue is that your setup cannot composite at 60 fps (don’t get me wrong, not pretending that Windows isn’t at fault if that’s the case), then neither double buffering nor software cursors introduce delay.

Unless your goal is tearing updates (a whole other discussion), then your only cause of latency is missed frame deadlines due to slow or badly scheduled rendering.

There is no need to switch to software cursor rendering unless you want to render something incompatible with the cursor plane, e.g. massive buffers or underlaying the cursor under another surface. Synchronization with primary plane updates is not at all an issue.

replies(1): >>40717350 #
2. LoganDark ◴[] No.40717350[source]
> Synchronization with primary plane updates is not at all an issue.

While I wouldn't be surprised if this is technically true in a hardware sense, software-wise, Windows knows where the cursor is before it's finished rendering the rest of the screen, and updates the hardware layer that contains the cursor before rendering has finished.

replies(1): >>40718470 #
3. arghwhat ◴[] No.40718470[source]
> While I wouldn't be surprised if this is technically true in a hardware sense, software-wise, Windows knows where the cursor is before it's finished rendering the rest of the screen

The earlier you sample the cursor position and update the cursor plane, the more the position is out of date once the next scanout comes around, increasing the perceived input delay.

The approach that leads to the smallest possible input latency is to sample the cursor position just before issuing the transaction that updates the cursor position and swaps in the new primary plane buffer (within Linux, this is called an atomic commit), whereas you maximize content consistency with still very good input latency by sampling just before the composition starts.

Note that "composition" does not involve rendering "content" as the user perceives it, but just placing and blending already rendered window content, possibly with a color transform applied as the pixels hit the screen. Unless Microsoft is doing something weird, this should be extremely fast. <1ms fast.
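
A rough sketch of that trade-off, assuming a 60 Hz display, a 1 ms composition and 0.2 ms of slack between the commit and vblank. All three numbers, and the strategy names, are assumptions picked for illustration and not measurements of any real compositor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    refresh_hz: float = 60.0   # assumed refresh rate
    composite_ms: float = 1.0  # assumed composition time
    margin_ms: float = 0.2     # assumed slack between commit and vblank


def cursor_age_ms(strategy: str, t: Timing = Timing()) -> float:
    """Age of the sampled cursor position at the moment scanout begins."""
    if strategy == "right_after_vblank":
        # Sampled at the start of the frame period: a full period old.
        return 1000.0 / t.refresh_hz
    if strategy == "before_composite":
        # Sampled just before composition: cursor and content stay consistent.
        return t.composite_ms + t.margin_ms
    if strategy == "before_commit":
        # Sampled just before the commit: freshest cursor, may lead the content.
        return t.margin_ms
    raise ValueError(f"unknown strategy: {strategy}")


for s in ("right_after_vblank", "before_composite", "before_commit"):
    print(f"{s}: cursor position is {cursor_age_ms(s):.1f} ms old at scanout")
```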

replies(1): >>40719705 #
4. LoganDark ◴[] No.40719705{3}[source]
> The earlier you sample the cursor position and update the cursor plane, the more the position is out of date once the next scanout comes around, increasing the perceived input delay.

No, the cursor position is more up-to-date than the rest of the screen because it doesn't need to wait for a GPU pipeline to finish after it's moved.

> Unless Microsoft is doing something weird, this should be extremely fast. <1ms fast.

Look, I'm saying this is what's going on. (not to scale)

    ... | vsync                                                         ...
    ...  | cursor updated for frame 0                                   ...
    ...   | frame 0 scanout                                             ...
    ...     | frame 1 ready                                             ...
    ...                                   | vsync                       ...
    ...                                    | cursor updated for frame 1 ...
    ...                                     | frame 1 scanout           ...
    ...                                       | frame 2 ready           ...

Frames are extremely fast to render, but they arrive the frame after they were originally scheduled, because GPU pipelines are asynchronous. However, the cursor position arrives immediately because the position of the hardware layer can be synchronously updated immediately before scanout. The effect is that updates to the cursor position are (essentially) displayed 1 frame sooner than updates to the rest of the screen. If you actually try any of the tests I mentioned in my original comment you'll see this for yourself.
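
To put a number on how visible that would be, here is a minimal sketch that takes the one-frame offset described above as given (it models the claim, it does not measure Windows). The function name and the drag speed are hypothetical.

```python
def drag_gap_px(speed_px_per_s: float, refresh_hz: float = 60.0,
                frames_behind: int = 1) -> float:
    """Distance between the hardware cursor and content that trails it
    by `frames_behind` frames, for a pointer moving at constant speed."""
    return speed_px_per_s * frames_behind / refresh_hz


# Assumed drag speed of 1000 px/s; content one frame behind the cursor.
for hz in (60, 144):
    print(f"{hz} Hz: dragged content trails by {drag_gap_px(1000, hz):.1f} px")
```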
replies(2): >>40720660 #>>40727219 #
5. Dylan16807 ◴[] No.40720660{4}[source]
> Unless Microsoft is doing something weird, this should be extremely fast. <1ms fast.

And it should also be scheduled for near the end of the frame period, not happening right at the start.

But all this stuff is hard to do right and higher refresh rates make it simpler to do a good job.

6. arghwhat ◴[] No.40727219{4}[source]
Pekka Paalanen wrote a nice blog post about the concept of repaint scheduling with graphs: https://ppaalanen.blogspot.com/2015/02/weston-repaint-schedu... (note that the Weston example gives a whopping 7ms for composition).

I'm making some assumptions about your chart as it is not to scale, but it looks like the usual worst-case strategy. Given a 60Hz refresh rate and a 1ms composition time, an example of an optimized composition strategy would look something like this:

    +0ms      vblank, frame #-1 starts scanout

    +15.4ms   read cursor position #0, initiate composite #0
    +16.4ms   composition buffer #0 ready
    +16.5ms   update cursor plane position #0 and attach primary plane buffer #0
    +16.6ms   vblank, frame #0 starts scanout

    +32.1ms   read cursor position #1, initiate composite #1
    +33.1ms   composition buffer #1 ready
    +33.2ms   update cursor plane position #1 and attach primary plane buffer #1
    +33.3ms   vblank, frame #1 starts scanout

In this case, both the composite and the cursor position are only 1.2ms old at the time the GPU starts scanning them out, and hardware vs. software cursor has no effect on latency. Moving the cursor update closer would make the cursor out of sync with the displayed content, which is not really worth it.
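
The schedule above can be derived by working backwards from each vblank. A minimal sketch, assuming the same 60Hz refresh and 1ms composition, plus 0.1ms for the commit and 0.1ms of margin before vblank; the function and parameter names are made up for illustration and this is not how any particular compositor is implemented.

```python
def repaint_schedule(frame: int, refresh_hz: float = 60.0,
                     composite_ms: float = 1.0, commit_ms: float = 0.1,
                     margin_ms: float = 0.1) -> dict:
    """Work backwards from the vblank that starts scanout of `frame`."""
    period = 1000.0 / refresh_hz
    vblank = (frame + 1) * period      # frame #-1 starts scanout at +0ms
    commit = vblank - margin_ms        # update cursor plane, attach buffer
    ready = commit - commit_ms         # composition buffer ready
    sample = ready - composite_ms      # read cursor, initiate composite
    return {"sample": sample, "ready": ready, "commit": commit,
            "vblank": vblank, "age": vblank - sample}


for n in (0, 1):
    s = repaint_schedule(n)
    print(f"frame #{n}: sample +{s['sample']:.1f}ms, ready +{s['ready']:.1f}ms, "
          f"commit +{s['commit']:.1f}ms, vblank +{s['vblank']:.1f}ms, "
          f"age {s['age']:.1f}ms")
```

The printed times are rounded to the nearest 0.1ms, so they can differ from the hand-written table by a tenth, but the 1.2ms age comes out the same for every frame.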

(Games and other fullscreen applications can have their render buffer directly scanned out to remove the composition delay and read input at their own pace for simulation reasons, and those applications tend to be the subject at hand when discussing single or sub-millisecond input latency optimizations.)

> Frames are extremely fast to render, but they arrive the frame after they were originally scheduled, because GPU pipelines are asynchronous.

The display block is synchronous. While render pipelines are asynchronous, that is not a problem - as long as the render task completes before the scanout deadline, the resulting buffer can be included in that immediate scanout. Synchronization primitives are also there when you need them, and high-priority and compute queues can be used if you are concerned that the composition task ends up delayed by other things.
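
As a sketch of that deadline logic: a buffer that is finished before a vblank goes out at that vblank, and one that misses it slips a whole frame. This assumes a 60Hz display, ignores the small commit margin, and uses a hypothetical helper name.

```python
import math


def scanout_vblank_ms(submit_ms: float, render_ms: float,
                      refresh_hz: float = 60.0) -> float:
    """Time of the first vblank at or after the render task completes."""
    period = 1000.0 / refresh_hz
    done = submit_ms + render_ms
    return math.ceil(done / period) * period


# Same start time, two assumed composition costs.
print(f"1ms composite -> scanout at +{scanout_vblank_ms(15.4, 1.0):.1f}ms")
print(f"2ms composite -> scanout at +{scanout_vblank_ms(15.4, 2.0):.1f}ms")
```

The second case finishes after the first vblank, so the latency comes from a missed deadline and not from the pipeline being asynchronous.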

Also note that the scanout deadline is entirely virtual - the display block honors whatever framebuffer you point a plane to at any point, we just try to only do that during vblank to avoid tearing.

> If you actually try any of the tests I mentioned in my original comment you'll see this for yourself.

While it might be fun to see if Microsoft screwed up their composition and paint scheduling, that does not change that it is not related to GPUs or the graphics stack itself. Working in the Linux display server space makes me quite comfortable in my understanding of GPUs' display controllers.

replies(1): >>40734753 #
7. LoganDark ◴[] No.40734753{5}[source]
> that does not change that it is not related to GPUs or the graphics stack itself. Working in the Linux display server space makes me quite comfortable in my understanding of GPU's display controllers.

I didn't mean to suggest some sort of fundamental limitation in GPUs that makes it impossible to synchronize this. If you take a look at my previous comments, you'll see me explicitly pointing out that I'm talking about Windows, specifically, and I'm only using it as an example of how short a latency is still perceptible. How exactly that latency happens is almost certainly not a hardware issue, however, and I never meant to imply such.