←back to thread

138 points shipp02 | 2 comments | | HN request time: 0s | source
Show context
narrowbyte ◴[] No.40648378[source]
quite interesting framing. A couple things have changed since 2011

- SIMD (at least intel's AVX512) does have usable gather/scatter, so "Single instruction, multiple addresses" is no longer a flexibility win for SIMT vs SIMD

- likewise for pervasive masking support and "Single instruction, multiple flow paths"

In general, I think of SIMD as more flexible than SIMT, not less, in line with this other post https://news.ycombinator.com/item?id=40625579. SIMT requires staying more towards the "embarrassingly" parallel end of the spectrum, SIMD can be applied in cases where understanding the opportunity for parallelism is very non-trivial.

replies(3): >>40648477 #>>40648581 #>>40656815 #
raphlinus ◴[] No.40648581[source]
One of the other major things that's changed is that Nvidia now has independent thread scheduling (as of Volta, see [1]). That allows things like individual threads to take locks, which is a pretty big leap. Essentially, it allows you to program each individual thread as if it's running a C++ program, but of course you do have to think about the warp and block structure if you want to optimize performance.

I disagree that SIMT is only for embarrassingly parallel problems. Both CUDA and compute shaders are now used for fairly sophisticated data structures (including trees) and algorithms (including sorting).

[1]: https://developer.nvidia.com/blog/inside-volta/#independent_...

replies(3): >>40649270 #>>40650486 #>>40651532 #
1. narrowbyte ◴[] No.40650486{3}[source]
I intentionally said "more towards embarrassingly parallel" rather than "only embarrassingly parallel". I don't think there's a hard cutoff, but there is a qualitative difference. One example that springs to mind is https://github.com/simdjson/simdjson - afaik there's no similarly mature GPU-based JSON parsing.
replies(1): >>40651010 #
2. raphlinus ◴[] No.40651010[source]
I'm not aware of any similarly mature GPU-based JSON parser, but I believe such a thing is possible. My stack monoid work [1] contains a bunch of ideas that may be helpful for building one. I've thought about pursuing that, but have kept focus on 2D graphics as it's clearer how that will actually be useful.

[1]: https://arxiv.org/abs/2205.11659