
78 points ibobev | 2 comments | | HN request time: 0s | source
whizzter ◴[] No.43557881[source]
I wrote a non-RTX on-GPU raytracer a while back (naive compared to this) and it's super interesting to read about the advances in compressing BVH structures.

But the changes also highlight a shift in focus, from implementing this naively (RDNA3 is technically not too far removed from the naive raytracer I wrote) to something carefully engineered and optimized for memory bandwidth (with bandwidth-saving circuits even built into silicon?).

replies(1): >>43563214 #
1. ahartmetz ◴[] No.43563214[source]
Seems very likely that the hardware decompresses the data more or less on the fly. The acceleration structures exist for the hardware's benefit, arithmetic hardware is cheap (compared to memory access), and they could have used the compressed structures on older hardware with new drivers if hardware support weren't necessary.
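To make "decompresses on the fly" concrete: a common BVH compression scheme stores child bounding boxes as small fixed-point offsets inside the parent's box, and the traversal hardware expands them with a few multiply-adds per axis. A minimal sketch, assuming a hypothetical node layout (not AMD's actual RDNA format):

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical compressed BVH node: a child AABB stored as 8-bit
 * offsets relative to the parent's AABB instead of six 32-bit floats.
 * Illustrative sketch only, not any vendor's real node format. */
typedef struct {
    float parent_min[3];
    float parent_extent[3];   /* parent_max - parent_min */
    uint8_t child_min_q[3];   /* quantized child min, 0..255 */
    uint8_t child_max_q[3];   /* quantized child max, 0..255 */
} CompressedNode;

/* Quantize a child AABB into the parent's grid. Rounding is conservative:
 * min rounds down, max rounds up, so the box can only grow, never shrink. */
void compress_child(CompressedNode *n, const float cmin[3], const float cmax[3]) {
    for (int i = 0; i < 3; i++) {
        float scale = 255.0f / n->parent_extent[i];
        n->child_min_q[i] = (uint8_t)((cmin[i] - n->parent_min[i]) * scale);
        n->child_max_q[i] = (uint8_t)((cmax[i] - n->parent_min[i]) * scale + 0.999f);
    }
}

/* On-the-fly decompression: one multiply and one add per axis — far
 * cheaper than the memory bandwidth the smaller node saves. */
void decompress_child(const CompressedNode *n, float cmin[3], float cmax[3]) {
    for (int i = 0; i < 3; i++) {
        float scale = n->parent_extent[i] / 255.0f;
        cmin[i] = n->parent_min[i] + n->child_min_q[i] * scale;
        cmax[i] = n->parent_min[i] + n->child_max_q[i] * scale;
    }
}
```

The round trip is lossy (up to one quantization step per bound), but because the rounding is conservative the decompressed box always contains the original, so traversal stays correct and only culls slightly less tightly.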
replies(1): >>43580881 #
2. whizzter ◴[] No.43580881[source]
Right, the point of raytracing extensions is that there can definitely be wins thanks to specialized circuitry.

What I do wonder is: given that, as you mention, older chips could probably use the more optimized structures via software (after all, my naive-ish raytracer is fully in OpenGL and could be modified to use these structures instead), and with memory being the big pain point, which hardware optimizations/specializations bring the big gains compared to what can be done in "microcode"? Triangle-intersection and bit-unpacking circuitry seem likely candidates, but considering stack management there are probably other parts left to microcode.
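The triangle-intersection test mentioned above is a good example of what dedicated circuitry buys: it is a short, fixed sequence of multiply-adds that a software traversal would run on the shader ALUs for every candidate triangle. A sketch of the standard Möller–Trumbore algorithm (one common choice; real RT units may implement a different watertight variant):

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

typedef struct { float x, y, z; } Vec3;

static Vec3 sub(Vec3 a, Vec3 b) { return (Vec3){a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return (Vec3){a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Möller–Trumbore ray/triangle intersection. Returns true and writes the
 * hit distance to *t if the ray (orig, dir) hits triangle (v0, v1, v2). */
bool ray_triangle(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2, float *t) {
    const float EPS = 1e-7f;
    Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    Vec3 p = cross(dir, e2);
    float det = dot(e1, p);
    if (fabsf(det) < EPS) return false;       /* ray parallel to triangle plane */
    float inv = 1.0f / det;
    Vec3 s = sub(orig, v0);
    float u = dot(s, p) * inv;                /* first barycentric coordinate */
    if (u < 0.0f || u > 1.0f) return false;
    Vec3 q = cross(s, e1);
    float v = dot(dir, q) * inv;              /* second barycentric coordinate */
    if (v < 0.0f || u + v > 1.0f) return false;
    *t = dot(e2, q) * inv;                    /* distance along the ray */
    return *t > EPS;
}
```

Two cross products, four dots, and one reciprocal per triangle: trivial to pipeline in fixed-function hardware, whereas the stack-driven control flow around it (which node to visit next, when to pop) is branchy and state-heavy, which is plausibly why it stays in microcode.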