←back to thread

138 points shipp02 | 1 comments | | HN request time: 0.21s | source
Show context
narrowbyte ◴[] No.40648378[source]
quite interesting framing. A couple things have changed since 2011

- SIMD (at least intel's AVX512) does have usable gather/scatter, so "Single instruction, multiple addresses" is no longer a flexibility win for SIMT vs SIMD

- likewise for pervasive masking support and "Single instruction, multiple flow paths"

In general, I think of SIMD as more flexible than SIMT, not less, in line with this other post https://news.ycombinator.com/item?id=40625579. SIMT requires staying more towards the "embarrassingly" parallel end of the spectrum, SIMD can be applied in cases where understanding the opportunity for parallelism is very non-trivial.

replies(3): >>40648477 #>>40648581 #>>40656815 #
1. ribit ◴[] No.40656815[source]
Modern GPUs are exposing the SIMD behind the SIMT model and heavily investing into SIMD features such as shuffles, votes, and reduces. This leads to an interesting programming model. One interesting challenge is that flow control is done very differently on different hardware. AMD has a separate scalar instruction pipeline which can set the SIMD mask. Apple uses an interesting per-lane stack counter approach where value of zero means that the lane is active and non-zero value indicates how many blocks need to be exited for the thread to become active again. Not really sure how Nvidia does it.