This is neat, and it's great that Go provides such simple access to low-level primitives.
But for the particular case of SIMD operations, wouldn't it make more sense to use the GPU instead of the CPU? GPUs excel at parallelism and matrix operations, so the performance difference would be even greater. I suppose the lack of well-maintained GPU packages, and of a community around them, doesn't make Go particularly well suited for this.