←back to thread

138 points shipp02 | 1 comments | | HN request time: 0.286s | source
Show context
narrowbyte ◴[] No.40648378[source]
quite interesting framing. A couple things have changed since 2011

- SIMD (at least intel's AVX512) does have usable gather/scatter, so "Single instruction, multiple addresses" is no longer a flexibility win for SIMT vs SIMD

- likewise for pervasive masking support and "Single instruction, multiple flow paths"

In general, I think of SIMD as more flexible than SIMT, not less, in line with this other post https://news.ycombinator.com/item?id=40625579. SIMT requires staying more towards the "embarrassingly" parallel end of the spectrum, SIMD can be applied in cases where understanding the opportunity for parallelism is very non-trivial.

replies(3): >>40648477 #>>40648581 #>>40656815 #
majke ◴[] No.40648477[source]
Last time i looked at intel scatter/gather I got the impression it only works for a very narrow use case, and getting it to perform wasn’t easy. Did I miss something?
replies(2): >>40648681 #>>40655793 #
1. majke ◴[] No.40655793[source]
vpgatherdd - I think that for newer CPUs it is faster than many loads + inserts, but if you are going to fault a lot, then it becomes slow.

> The VGATHER instructions are implemented as micro-coded flow. Latency is ~50 cycles.

https://www.intel.com/content/www/us/en/content-details/8141...