
SIMD < SIMT < SMT: Parallelism in Nvidia GPUs (2011)

(yosefk.com)

138 points shipp02 | 1 comments | 10 Jun 24 06:05 UTC


narrowbyte ◴[11 Jun 24 16:37 UTC] No.40648378[source]▶

>>40630676 (OP) #

Quite interesting framing. A couple of things have changed since 2011:

- SIMD (at least Intel's AVX-512) does have usable gather/scatter, so "Single instruction, multiple addresses" is no longer a flexibility win for SIMT vs SIMD

- likewise for pervasive masking support and "Single instruction, multiple flow paths"
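To make the two points above concrete, here is a minimal scalar sketch in Python of what a masked gather does per lane. It is a model of the semantics only, not real SIMD code; the names `masked_gather` and `lookup_positive` are made up for illustration. The idea is that a compare produces a per-lane mask, and the gather then loads from a different address in each active lane while inactive lanes keep a fallback value and never touch memory.

```python
def masked_gather(table, indices, mask, fallback):
    """Model of a masked gather over one vector register.

    Per lane i: load table[indices[i]] if mask[i] is set,
    otherwise keep fallback[i]. Inactive lanes do not read memory,
    so their index may even be out of range.
    """
    return [table[idx] if on else old
            for idx, on, old in zip(indices, mask, fallback)]


def lookup_positive(values, indices, table):
    """Per-lane equivalent of: y = table[idx] if x > 0 else 0.

    "Multiple flow paths" becomes a mask from a vector compare;
    "multiple addresses" becomes a gather under that mask.
    """
    mask = [v > 0 for v in values]      # vector compare -> mask register
    zeros = [0] * len(values)           # value kept by inactive lanes
    return masked_gather(table, indices, mask, zeros)


if __name__ == "__main__":
    # Lane 1 has an out-of-range index, but it is masked off.
    print(lookup_positive([3, -1, 5, 0], [2, 99, 0, 1], [10, 20, 30]))
```

With the inputs in the usage line, lanes 0 and 2 are active and load `table[2]` and `table[0]`, so the result is `[30, 0, 10, 0]`.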

In general, I think of SIMD as more flexible than SIMT, not less, in line with this other post (https://news.ycombinator.com/item?id=40625579). SIMT requires staying more towards the "embarrassingly" parallel end of the spectrum, whereas SIMD can be applied in cases where understanding the opportunity for parallelism is very non-trivial.

replies(3): >>40648477 #>>40648581 #>>40656815 #

1. ribit ◴[12 Jun 24 11:25 UTC] No.40656815[source]▶

Modern GPUs are exposing the SIMD behind the SIMT model and investing heavily in SIMD features such as shuffles, votes, and reductions. This leads to an interesting programming model. One interesting challenge is that flow control is done very differently on different hardware. AMD has a separate scalar instruction pipeline which can set the SIMD mask. Apple uses an interesting per-lane stack counter approach, where a value of zero means that the lane is active and a non-zero value indicates how many blocks need to be exited for the thread to become active again. I'm not really sure how Nvidia does it.
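The per-lane counter idea can be sketched as a small Python model. This is only an illustration built from the description above, not documented hardware behavior: the class `LaneState`, its method names, and in particular the way `else` is handled are assumptions. Each lane holds a counter, 0 means active, and k > 0 means the lane must exit k more blocks before it runs again.

```python
class LaneState:
    """Hypothetical model of per-lane block counters for divergent control flow."""

    def __init__(self, n_lanes):
        # 0 = lane is active; k > 0 = inactive until k blocks have been exited
        self.depth = [0] * n_lanes

    def active(self):
        return [d == 0 for d in self.depth]

    def enter_if(self, cond):
        for i, d in enumerate(self.depth):
            if d > 0:
                self.depth[i] = d + 1   # already inactive: one more block to exit
            elif not cond[i]:
                self.depth[i] = 1       # failed the condition: wait for this block to end

    def enter_else(self):
        # Assumed handling: lanes that ran the then-branch go to sleep,
        # lanes waiting on exactly this block wake up.
        for i, d in enumerate(self.depth):
            if d == 0:
                self.depth[i] = 1
            elif d == 1:
                self.depth[i] = 0

    def exit_block(self):
        for i, d in enumerate(self.depth):
            if d > 0:
                self.depth[i] = d - 1


def classify(xs):
    """Runs a nested if/else over all lanes in lockstep using the counters."""
    lanes = LaneState(len(xs))
    out = [None] * len(xs)

    def store(value):
        # Only active lanes commit results.
        for i, on in enumerate(lanes.active()):
            if on:
                out[i] = value

    lanes.enter_if([x > 0 for x in xs])
    lanes.enter_if([x > 10 for x in xs])
    store("big")
    lanes.enter_else()
    store("small")
    lanes.exit_block()
    lanes.enter_else()
    store("non-positive")
    lanes.exit_block()
    return out, lanes.depth


if __name__ == "__main__":
    print(classify([5, -3, 12, 0]))
```

For the input `[5, -3, 12, 0]` the model returns `['small', 'non-positive', 'big', 'non-positive']` with all counters back at zero. A single integer per lane is enough here because a lane only needs to know how deep it is "parked", not which conditions it failed along the way.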