(damek.github.io)

338 points ibobev | 1 comment | 24 Jun 25 12:15 UTC

1. saagarjha [27 Jun 25 09:30 UTC] No.44395271

> The “Peak Compute” roof of 19.5 TFLOPS is an ideal, achievable only with highly optimized instructions like Tensor Core matrix multiplications and high enough power limits.

As mentioned below, 19.5 TFLOPS is the FP32 compute roofline, which doesn't support Tensor Cores. If you want to use those you need to use FP16 and you can get substantially improved performance.
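
For anyone who wants to see the arithmetic, here is a rough roofline sketch of that distinction. Only the 19.5 TFLOPS FP32 figure comes from the quote. The 312 TFLOPS FP16 Tensor Core peak and the 1.5 TB/s memory bandwidth are assumptions (roughly A100-class numbers) that are not stated in this thread, and the function name is made up for illustration.

```python
def attainable_tflops(peak_tflops, bandwidth_tb_s, flops_per_byte):
    """Roofline model: throughput is capped by compute or by memory traffic.

    TB/s times FLOP/byte gives TFLOP/s, so both arguments to min() share units.
    """
    return min(peak_tflops, bandwidth_tb_s * flops_per_byte)


FP32_PEAK_TFLOPS = 19.5           # from the quoted article
FP16_TENSOR_PEAK_TFLOPS = 312.0   # ASSUMPTION: A100-class dense FP16 Tensor Core peak
BANDWIDTH_TB_S = 1.5              # ASSUMPTION: approximate HBM bandwidth

# Ridge point: the arithmetic intensity at which a kernel stops being memory-bound.
fp32_ridge = FP32_PEAK_TFLOPS / BANDWIDTH_TB_S
fp16_ridge = FP16_TENSOR_PEAK_TFLOPS / BANDWIDTH_TB_S

if __name__ == "__main__":
    print(f"FP32 ridge point: {fp32_ridge:.0f} FLOP/byte")
    print(f"FP16 Tensor Core ridge point: {fp16_ridge:.0f} FLOP/byte")
    for intensity in (1, 13, 100, 400):
        fp32 = attainable_tflops(FP32_PEAK_TFLOPS, BANDWIDTH_TB_S, intensity)
        fp16 = attainable_tflops(FP16_TENSOR_PEAK_TFLOPS, BANDWIDTH_TB_S, intensity)
        print(f"{intensity:>4} FLOP/byte: FP32 {fp32:6.1f} TFLOPS, FP16 TC {fp16:6.1f} TFLOPS")
```

Under these assumed numbers the two roofs coincide for memory-bound kernels and only diverge once the arithmetic intensity passes the FP32 ridge point, which is where switching to FP16 on Tensor Cores starts to pay off.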

Basic Facts about GPUs