
Basic Facts about GPUs

(damek.github.io)
338 points ibobev | 1 comment
1. saagarjha No.44395271
> The “Peak Compute” roof of 19.5 TFLOPS is an ideal, achievable only with highly optimized instructions like Tensor Core matrix multiplications and high enough power limits.

As mentioned below, 19.5 TFLOPS is the FP32 compute roofline, and that figure doesn't involve Tensor Cores. If you want to use those, you need to use FP16, and the peak you get is substantially higher.
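
To make the difference between the two roofs concrete, here is a rough roofline sketch. Only the 19.5 TFLOPS FP32 figure comes from the quoted article. The FP16 Tensor Core peak (312 TFLOPS) and the memory bandwidth (1.555 TB/s) are my assumptions, taken from the A100 40GB datasheet on the guess that this is the GPU being discussed; swap in your own numbers. The function and constant names are made up for illustration.

```python
def attainable_tflops(intensity_flop_per_byte, peak_tflops, bandwidth_tb_per_s):
    """Roofline model: throughput is capped by compute or by memory traffic.

    FLOP/byte multiplied by TB/s gives TFLOP/s, so the units line up.
    """
    return min(peak_tflops, intensity_flop_per_byte * bandwidth_tb_per_s)


def ridge_point(peak_tflops, bandwidth_tb_per_s):
    """Arithmetic intensity (FLOP/byte) above which a kernel is compute-bound."""
    return peak_tflops / bandwidth_tb_per_s


FP32_PEAK_TFLOPS = 19.5          # from the quoted article
FP16_TENSOR_PEAK_TFLOPS = 312.0  # assumption: A100 datasheet, dense FP16 Tensor Core
BANDWIDTH_TB_PER_S = 1.555       # assumption: A100 40GB HBM2 bandwidth

if __name__ == "__main__":
    for name, peak in [("FP32", FP32_PEAK_TFLOPS),
                       ("FP16 Tensor Core", FP16_TENSOR_PEAK_TFLOPS)]:
        ridge = ridge_point(peak, BANDWIDTH_TB_PER_S)
        print(f"{name}: roof {peak} TFLOPS, ridge at {ridge:.1f} FLOP/byte")
        # Sweep a few arithmetic intensities to see where each roof kicks in.
        for intensity in (1, 10, 100, 1000):
            perf = attainable_tflops(intensity, peak, BANDWIDTH_TB_PER_S)
            print(f"  intensity {intensity:>4} FLOP/byte -> {perf:.2f} TFLOPS")
```

Under these assumed numbers the memory-bound part of the curve is the same for both precisions; only the flat roof moves. A kernel has to reach a much higher arithmetic intensity before it can actually hit the Tensor Core roof.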