Right, the point of raytracing extensions is that there can definitely be wins thanks to specialized circuitry.
What I do wonder: as you mention, older chips could probably use the more optimized structures in software (after all, my naive-ish raytracer is written entirely in OpenGL and could be modified to use these structures instead), with memory being the big pain point. So which hardware optimizations/specializations matter most for getting big gains over what can be done in "microcode"? Circuitry for triangle intersections and bit-unpacking seems like the obvious candidate, but considering stack management, there are probably other parts that are still left to microcode.
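
To make the question concrete, here is a rough sketch of the kind of software traversal I mean, written in Python rather than GLSL for readability. Everything in it (`Node`, `hit_box`, `hit_triangle`, `trace`, the tiny `scene`) is a hypothetical name made up for illustration, not any vendor's actual design. The comments mark my guess at the split: the box and triangle tests are the fixed-function candidates, while the stack handling is the part I would expect to stay in "microcode".

```python
import math
from dataclasses import dataclass

EPS = 1e-9

def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

@dataclass
class Node:
    lo: tuple          # AABB minimum corner
    hi: tuple          # AABB maximum corner
    left: int = -1     # child indices into the flat node list
    right: int = -1
    tris: tuple = ()   # non-empty only for leaf nodes

def hit_box(orig, d, lo, hi, t_max):
    # Slab test: candidate for dedicated circuitry (assumption).
    t0, t1 = 0.0, t_max
    for i in range(3):
        if d[i] == 0.0:
            # Ray parallel to this slab: must already be inside it.
            if orig[i] < lo[i] or orig[i] > hi[i]:
                return False
            continue
        ta = (lo[i] - orig[i]) / d[i]
        tb = (hi[i] - orig[i]) / d[i]
        if ta > tb:
            ta, tb = tb, ta
        t0, t1 = max(t0, ta), min(t1, tb)
    return t0 <= t1

def hit_triangle(orig, d, tri):
    # Moeller-Trumbore test: also a candidate for dedicated circuitry.
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < EPS:
        return None            # ray parallel to the triangle plane
    inv = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(d, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > EPS else None

def trace(nodes, orig, d):
    # Traversal loop with an explicit stack: the "microcode" part (assumption).
    best = math.inf
    stack = [0]                # start at the root node
    while stack:
        node = nodes[stack.pop()]
        if not hit_box(orig, d, node.lo, node.hi, best):
            continue           # prune the whole subtree
        if node.tris:
            for tri in node.tris:
                t = hit_triangle(orig, d, tri)
                if t is not None and t < best:
                    best = t
        else:
            stack.append(node.left)
            stack.append(node.right)
    return None if best == math.inf else best

# Tiny two-leaf scene: one triangle at z=1, one at z=3.
scene = [
    Node((0, 0, 1), (1, 1, 3), left=1, right=2),
    Node((0, 0, 1), (1, 1, 1), tris=(((0, 0, 1), (1, 0, 1), (0, 1, 1)),)),
    Node((0, 0, 3), (1, 1, 3), tris=(((0, 0, 3), (1, 0, 3), (0, 1, 3)),)),
]
nearest = trace(scene, (0.25, 0.25, 0.0), (0.0, 0.0, 1.0))
```

Even in this toy version, the inner tests are pure arithmetic on a fixed amount of data, while the loop around them is all branching and memory traffic, which is why I suspect the stack part is harder to bake into circuitry.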