204 points by WithinReason | 15 comments
mrweasel No.40715746
And once that becomes generally available, operating systems will eat the bandwidth in an instant, and any speed-up to be gained on a desktop will be completely negated.

It seems like we're stuck at a pre-set level of latency, which is just within what people tolerate. I was watching a video of someone running Windows 3.11 and noticed that windows close instantly; on Windows 10 and 11 I've never seen there NOT be a small delay between the user clicking close and the window disappearing.

1. eqvinox No.40716089
> It seems like we're stuck at a pre-set level of latency,

Bandwidth isn't latency, and PCIe 7.0 running as fast as 128 GT/s is no statement at all about its latency. I remember this great analogy from university: a truck carrying a full load of backup tapes across a country has amazing bandwidth but atrocious latency.

(I still agree with your sentiment, just PCIe is not one of the problems in this regard. The connection between bandwidth becoming available and being eaten up vs. latency is a red herring; it's all about properly engineering software for responsiveness.)

2. szundi No.40716402
If your Win27k startup is an 8K 120fps video of a butterfly transforming into a Windows logo - then it is latency

Btw, all bandwidth is built to reduce latency, isn't it? Bit of philosophy, heh

3. ZiiS No.40716632
GT/s is a measure of latency (not total system latency, but the bus itself only adds a 128-billionth of a second). In fact it does not say anything about bandwidth if you don't know how many bits are in a transfer.
4. eqvinox No.40716686
No, neither of these is true. If the Win27k startup is an 8K 120fps video, it shows up as either latency or stutter when you don't have enough bandwidth. You can absolutely design a system with priorities set such that low latency ranks above stutter-/drop-free playback, and if you do, the startup time will be unaffected by that bandwidth.

And, no, not all bandwidth is built to reduce latency. There is a lot of bulk, best-effort traffic - for example, YouTube and Netflix proactively distributing videos between datacenters across the world. (They totally do that before anyone ever clicks play, they have enough data to know what is likely to be needed where.)

The same applies to your YouTube/Netflix playback at home. It doesn't need to be low latency. The only effect of latency is a longer time between you clicking play and playback actually starting. From there onwards, you just need enough bandwidth to keep the buffer filled, and you can do that quite a bit ahead of reaching playback position. Latency is a real non-issue there.
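
To put rough numbers on that (the bitrate, buffer size, latency, and link speed below are all made up for illustration), a minimal sketch in Python:

    # Illustrative numbers only: how latency vs. bandwidth affect
    # buffered video playback (units: seconds, bits, bits/sec).
    bitrate     = 5e6   # 5 Mbit/s video stream (assumed)
    buffer_secs = 10    # client buffers 10 s before playing (assumed)
    latency     = 0.2   # 200 ms to the server (assumed)
    bandwidth   = 25e6  # 25 Mbit/s link (assumed)

    # Latency is paid once, before the first byte arrives.
    startup_delay = latency + (buffer_secs * bitrate) / bandwidth
    print(f"time until playback starts: {startup_delay:.2f} s")  # ~2.2 s

    # After that, playback never stalls as long as bandwidth >= bitrate;
    # latency no longer appears anywhere in the steady state.
    print("steady-state ok:", bandwidth >= bitrate)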

Same locally for bulk copying files around. If your OS & FS are designed well, latency only shows up at the beginning of the operation. Most file systems were designed when data was on rotating rust, and that's dealt with by readahead and the like.

5. out_of_protocol No.40716746
Throughput != latency, and there is often a tradeoff between the two (e.g. if you send stuff in big batches, a database can process 100k tx/sec, but one by one it's 1k tx/sec at most).
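
A toy model of that tradeoff; the 1 ms round trip and 9 µs of per-transaction work are assumptions picked to land near the numbers above:

    # Toy throughput model: each round trip costs a fixed overhead,
    # each transaction costs a small amount of processing time.
    round_trip = 1e-3  # 1 ms per request (assumed)
    per_tx     = 9e-6  # 9 us of work per transaction (assumed)

    def tx_per_sec(batch_size):
        # One round trip carries `batch_size` transactions.
        return batch_size / (round_trip + batch_size * per_tx)

    print(f"one by one:    {tx_per_sec(1):8.0f} tx/sec")     # ~990
    print(f"batch of 1000: {tx_per_sec(1000):8.0f} tx/sec")  # ~100000
    # Throughput went up ~100x, but each transaction now waits up to
    # 10 ms for its batch to complete -- higher latency per item.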
6. retrac No.40716931
Latency and bandwidth are often in tension. (And guaranteeing low latency can eat up a big chunk of theoretically available bandwidth, due to overhead.)

The canonical example is probably a dial-up modem or other slow link between two locations. The latency is under 1 second to send one byte over the modem. But it's probably faster to just ship a hard disk if you want to send 100 gigabytes from one location to the other, even though the latency might be hours or even days until the first byte arrives.
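
Back-of-envelope, assuming a 56 kbit/s modem and a one-day courier:

    # 100 GB over a dial-up modem vs. shipping a hard disk.
    payload   = 100e9 * 8  # 100 gigabytes, in bits
    modem_bps = 56e3       # 56 kbit/s modem (assumed)

    modem_days = payload / modem_bps / 86400
    print(f"modem transfer: {modem_days:.0f} days")  # ~165 days

    # The courier's first byte arrives ~1 day later (atrocious latency),
    # but all 100 GB arrive with it (enormous bandwidth).
    courier_days = 1  # overnight-ish shipping (assumed)
    print(f"shipped disk:   {courier_days} day")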

In practice, you can send lots of tiny little packets with lots of overhead (but low latency) or you can send lots of big heavily buffered packets with low overhead (but with high latency).

This is why multiplayer game protocols often consist of a constant stream of tiny UDP packets containing events like "character moved 40 units east at game time ..." or "character fired weapon at game time ...". Even a 10 kilobyte bulk state update is going to cost at least a few milliseconds, and probably tens or even hundreds of milliseconds over some wireless connection. And that's a very noticeable lag.
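
A sketch of what such a tiny event packet might look like; the field layout is invented for illustration, not any real game's protocol:

    import socket
    import struct

    # Hypothetical move event: (game_time_ms, entity_id, dx, dy)
    # packs into 12 bytes -- vs. a multi-kilobyte state snapshot.
    event = struct.pack("!IIhh", 123456, 42, 40, 0)  # "moved 40 units east"
    print(len(event), "bytes")  # 12

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(event, ("127.0.0.1", 9999))  # fire-and-forget, no handshake

    # Even over a slow 1 Mbit/s link, 12 bytes serialize in ~0.1 ms,
    # while a 10 kB snapshot takes ~80 ms -- very noticeable in-game.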

7. eqvinox No.40716981
I'm sorry, but you're wrong on multiple counts. First, a "transfer" is not a term in the PCIe spec; if anything, there's "transaction". But GT/s does not refer to transactions, as you seem to be implying, and in fact "GT" does not have an assigned long form in the PCIe base specification. The term is introduced / defined like this:

| The primary Link attributes for PCI Express Link are:

| · The basic Link – PCI Express Link consists of dual unidirectional differential Links, implemented as a Transmit pair and a Receive pair. A data clock is embedded using an encoding scheme (see Chapter 4) to achieve very high data rates.

| · Signaling rate – Once initialized, each Link must only operate at one of the supported signaling levels. For the first generation of PCI Express technology, there is only one signaling rate defined, which provides an effective 2.5 Gigabits/second/Lane/direction of raw bandwidth. The second generation provides an effective 5.0 Gigabits/second/Lane/direction of raw bandwidth. The third generation provides an effective 8.0 Gigabits/second/Lane/direction of raw bandwidth. The data rate is expected to increase with technology advances in the future.

| · Lanes – A Link must support at least one Lane – each Lane represents a set of differential signal pairs (one pair for transmission, one pair for reception). To scale bandwidth, a Link may aggregate multiple Lanes denoted by xN where N may be any of the supported Link widths. A x8 Link operating at the 2.5 GT/s data rate represents an aggregate bandwidth of 20 Gigabits/second of raw bandwidth in each direction. This specification describes operations for x1, x2, x4, x8, x12, x16, and x32 Lane widths.

(from PCIe 4.0 base specification)

So, GT/s is used to be less ambiguous on multi-lane links.
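
The spec's x8 example, as arithmetic (raw bandwidth only, before line coding):

    # Raw (pre-encoding) bandwidth of a multi-lane link, per direction,
    # matching the spec's x8 example quoted above.
    gt_per_sec = 2.5e9  # 2.5 GT/s per lane (first-gen PCIe signaling)
    lanes      = 8

    raw_bits_per_sec = gt_per_sec * lanes
    print(f"{raw_bits_per_sec / 1e9:.0f} Gbit/s raw")  # 20 Gbit/s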

Next,

> the bus itself only adds a 128-billionth of a second

no, the bus does actually add more latency, since almost all receivers need to reassemble the whole transaction (generally tens to hundreds of bytes) to validate its checksum before dispatching it further. This latency can show up multiple times if you have PCIe switches, but (unlike endpoints) these are frequently cut-through.

However, that latency is seriously negligible compared to anything else in your system.
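
For a rough feel of that store-and-forward delay (the TLP size and link configuration below are assumed, not from the spec):

    # Time to clock one whole transaction onto the wire before a
    # store-and-forward receiver can validate and dispatch it.
    tlp_bytes  = 256        # payload + headers (assumed)
    lanes      = 16
    gt_per_sec = 16e9       # PCIe 4.0 signaling rate (assumed)
    encoding   = 128 / 130  # 128b/130b line coding

    serialize = (tlp_bytes * 8) / (lanes * gt_per_sec * encoding)
    # ~8.1 ns -- negligible next to the microseconds a whole
    # device round trip takes elsewhere in the system.
    print(f"store-and-forward delay: {serialize * 1e9:.1f} ns")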

> In fact it does not say anything about bandwidth if you don't know how many bits are in a transfer.

How many bits are in a transaction does in fact influence that latency mentioned right above, but has no impact on bandwidth. What does have an impact on available end-user bandwidth is how small you chunk longer transactions since each of them has per-transaction overhead.

And finally —

> GT/s is a measure of latency

— absolutely not. It is a measure of raw bandwidth. It indirectly influences minimum and maximum latency, but those are complicated relationships, especially on multi-lane links; maximum latency in particular depends on a whole host of factors, from hardware capabilities, to BIOS and OS settings in PCIe config, to driver behavior.

8. vinay_ys No.40717622
PCIe 1.0 to 5.0 used NRZ (non-return-to-zero) electrical signaling for transmission. With NRZ signaling at high rates and over long distances, there are challenges w.r.t. clock recovery, DC balance, and error correction. To deal with this, encoding is used.

Encoding basically means that a block of data (a sequence of zeros and ones) is represented as a sequence of electrical voltage changes (a block of symbols).

GT/s does stand for gigatransfers per second. Here, the transfers refer to the number of symbols transferred per second, not the number of actual usable data bits per second.

We say GT/s instead of Gbps, because the actual usable bits/sec is determined by the encoding scheme used.

PCIe 1.0 and 2.0 encoded 8 data bits in 10 symbols (NRZ electrical signals). That's 20% overhead.

PCIe 3.0 to 5.0 encoded 128 bits of data in 130 symbols. That's a much lower overhead of 1.54%.

PCIe 6.0 (& the yet-to-be-standardized PCIe 7.0) use PAM4 signaling and don't require any encoding on top – hence it is written as 1b/1b. (Btw, in PAM4, each symbol is 2 bits.)
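
Working out the effective per-lane data rates those encodings imply (this deliberately ignores packet framing and PCIe 6.0's FLIT/FEC overhead):

    # Effective per-lane, per-direction data rate after line coding.
    gens = {
        "PCIe 1.0": (2.5e9, 8 / 10),     # NRZ, 8b/10b
        "PCIe 3.0": (8e9,   128 / 130),  # NRZ, 128b/130b
        "PCIe 6.0": (64e9,  1 / 1),      # PAM4, 1b/1b
    }
    for name, (gt, eff) in gens.items():
        print(f"{name}: {gt * eff / 1e9:.2f} Gbit/s per lane")
    # PCIe 1.0: 2.00, PCIe 3.0: 7.88, PCIe 6.0: 64.00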

You can see similar NRZ signaling with encoding in SATA, Ethernet, Fibre Channel, etc. Btw, PAM4 shows up alongside NRZ there as well!

Coming to latency: latency is the time it takes for a single bit of usable data to get from A to B. Many factors affect it: the signaling medium's propagation speed (a fraction of the speed of light), the medium's length, the signaling frequency (the MHz, GHz etc. of voltage switching), and the encoding scheme (encoding overhead, clock recovery or its failure and hence retransmissions, error detection/correction quality or its failure and hence retransmissions) - each of these affects the latency of usable data.

GT/s = Signal Frequency x Bits per cycle.

Remember, PAM4 encoding in PCIe 6.0 has 2 bits per cycle (2 bits per symbol).

9. Dylan16807 No.40719656
> PCIe 6.0 (& the yet-to-be-standardized PCIe 7.0) use PAM4 signaling and don't require any encoding on top – hence it is written as 1b/1b. (Btw, in PAM4, each symbol is 2 bits.)

Nothing on top if you exclude the error correction bits, which I don't think you should.

10. eqvinox No.40720056
You're confusing GT and Gbaud. Gbaud is the symbol rate; bit rate = symbol rate × bits per symbol.

GT/s is in fact G"bit"/s, before line coding (I left that can of worms unopened because line coding wasn't relevant to the bandwidth vs. latency discussion). PCIe 6.0 is "64 GT/s", but only 32 Gbaud since, as you correctly point out, it uses PAM-4.

> GT/s does stand for Giga Transfers per second.

If you have a citable source for this, that'd be nice — it's not in the PCIe spec, and AFAIK the term is not used elsewhere.

11. ZiiS No.40720198
I do totally agree about the relative merits of bandwidth vs. latency. However, I also still think GT/s is generally accepted as an abbreviation for gigatransfers per second, and that the PCIe spec assumes it as such. I also note you had to pull in lots of additional specifications to describe the bandwidth of the complete Link, supporting my assertion that it is not a pure function of the GT/s.
12. Dylan16807 No.40720712
Another good example is the memory in your computer. DDR is much lower latency, and GDDR is much higher bandwidth.
13. vinay_ys No.40722921
Here's a pcisig article talking about symbol rate in terms of GT/s:

https://pcisig.com/blog/pci-express®-50-architecture-channel...

14. eqvinox No.40726178
That's great, but…

https://pcisig.com/pci-express-6.0-specification "64 GT/s raw data rate and up to 256 GB/s via x16 configuration"

The symbol rate for 6.0 is only 32 Gsym/s, so GT/s can't be symbol rate. (And references to PCIe 6.0 putting it at "64 GT/s" seem to be far more common; in particular, the PCIe specification (4.0, the newest I have access to) explicitly equates GT/s with data rate.)

My takeaway (even before this discussion) is to avoid "GT/s" as much as possible since the unit is really not well defined.

(And, still, I don't even know if there is a definition of it anywhere. I can't find one. The PCIe spec uses it without defining it, but it is not a "textbook common" unit IMHO. If you are aware of an actual definition, or maybe even just a place¹ that can confirm the T is supposed to mean "transfer", I'd appreciate that!)

¹ yes I know wikipedia says that too, but their sources are… very questionable.

P.S.: I really don't even disagree with you, because ultimately I'm saying "GT/s is confusing and can be interpreted in different ways". The links from each of us just straight up conflict with each other in their use of GT/s. Yours uses it for symbol rate, mine uses it for data rate. ⇒ why I try to avoid using this unit at all.

15. eqvinox No.40726329
I think we're having some communication/understanding issues, but that's OK. To be clear, my main issue with GT/s is that even the PCI-SIG doesn't agree with itself and uses the term in conflicting ways (see the discussion in the sibling thread).

As far as I can research, GT/s is a "commoner's unit" that someone invented and started using at some point, but there is no hard reliable definition of it. Nowadays it seems to be used for RAM and PCIe (and nothing else really), though some search results I found claim it was also used for SCSI.