138 points shipp02 | 1 comments
Remnant44 ◴[] No.40649679[source]
I think the principal change since this article was written is that each category has taken inspiration from the others.

For example, SIMD instruction sets gained gather/scatter and even per-lane masking of instructions for divergent flow (in AVX-512, which consumers never get to play with). These can really simplify writing explicit SIMD and make it more GPU-like.
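
To make the masking and gather idea concrete, here is a minimal sketch in plain Python that emulates an 8-lane vector unit with lists. It is not AVX-512 code: `gather`, `masked_select`, and `branchy_kernel` are hypothetical names made up for illustration, and a real implementation would use the corresponding intrinsics. The point is only that a per-lane `if/else` becomes "compute both sides, then blend by mask".

```python
# Scalar emulation of an 8-lane masked SIMD "if/else" with a gather load.
# All names are hypothetical; this only illustrates the execution model.

LANES = 8

def gather(table, indices, mask, fallback):
    """Load table[indices[i]] for active lanes; inactive lanes keep fallback[i]."""
    return [table[idx] if m else fb
            for idx, m, fb in zip(indices, mask, fallback)]

def masked_select(mask, if_true, if_false):
    """Per-lane blend: take if_true[i] where mask[i] is set, else if_false[i]."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]

def branchy_kernel(x, table):
    """Vector form of: y = table[x] * 2 if x < len(table) else -1."""
    mask = [v < len(table) for v in x]            # the compare yields a lane mask
    loaded = gather(table, x, mask, [0] * LANES)  # masked-off lanes do not touch memory
    then_side = [v * 2 for v in loaded]           # both sides run across all lanes
    else_side = [-1] * LANES
    return masked_select(mask, then_side, else_side)

result = branchy_kernel([0, 1, 2, 3, 9, 1, 10, 2], [10, 20, 30, 40])
```

Lanes 4 and 6 hold out-of-range indices, so the mask both suppresses their loads and routes them to the `else` value, with no branch in the kernel itself.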

Conversely, GPUs gained a much higher emphasis on caching, sustained divergent flow via independent program counters, and subgroup instructions which are essentially explicit SIMD in disguise.

SMT on the other hand... seems like it might be on the way out completely. While still quite effective for some workloads, it seems like quite a lot of complexity for only situational improvements in throughput.

replies(2): >>40650596 #>>40652977 #
yosefk ◴[] No.40650596[source]
The basic architecture still matters. GPUs still lose throughput on divergence, even though separate PCs let them run more kinds of divergent flow correctly, and SIMD still has more trouble with instruction latency (including latency from bank-conflict resolution in scatter/gather) than barrel-threaded machines do, etc. This is not to detract from the importance of the improvements made to the base architectures over time.
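
A toy model of the divergence cost, as a rough sketch rather than a description of any real GPU: assume one warp executes an `if/else`, each side costs one pass of equal length, and a side is executed whenever at least one lane takes it. The function names are hypothetical, and reconvergence details and scheduling are ignored.

```python
def simd_passes(branch_taken):
    """Passes one warp needs for an if/else, given per-lane branch outcomes.

    Assumes a non-empty list; a side costs one pass if any lane takes it.
    """
    takes_then = any(branch_taken)
    takes_else = not all(branch_taken)
    return int(takes_then) + int(takes_else)

def lane_utilization(branch_taken):
    """Fraction of lane-slots doing useful work across all passes."""
    lanes = len(branch_taken)
    return lanes / (simd_passes(branch_taken) * lanes)

uniform = [True] * 32              # every lane takes the same side
divergent = [True] * 31 + [False]  # a single lane disagrees
```

Under these assumptions a single disagreeing lane is enough to halve utilization: separate PCs make the divergent flow run correctly, but they do not make the second pass free.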
replies(1): >>40650660 #
1. Remnant44 ◴[] No.40650660[source]
Agreed! The basic categories remain, just blurring a bit at the edges.