When running an int8 matmul through ONNX, performance is ~0.6 TF.
https://github.com/usefulsensors/qc_npu_benchmark
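For context on how a figure like this is typically derived, here is a minimal sketch of the arithmetic. It assumes the common convention of counting one multiply and one add per inner-product term (2·M·N·K operations for an M×K by K×N matmul); the linked benchmark may count differently. The function names and the example shape are hypothetical and not taken from the repository.

```python
def matmul_ops(m: int, n: int, k: int) -> int:
    """Operation count for an (m x k) @ (k x n) matmul.

    Assumption: each of the m*n*k multiply-accumulates counts as 2 ops.
    """
    return 2 * m * n * k


def throughput_tera_ops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput in tera-operations per second for one timed matmul."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return matmul_ops(m, n, k) / seconds / 1e12


def expected_seconds(m: int, n: int, k: int, tera_ops_per_s: float) -> float:
    """Time one matmul would take at a given sustained throughput."""
    if tera_ops_per_s <= 0:
        raise ValueError("tera_ops_per_s must be positive")
    return matmul_ops(m, n, k) / (tera_ops_per_s * 1e12)


if __name__ == "__main__":
    # Hypothetical shape, chosen only to illustrate the calculation.
    m = n = k = 1024
    t = expected_seconds(m, n, k, 0.6)
    print(f"{matmul_ops(m, n, k)} ops, about {t * 1e3:.2f} ms at 0.6 tera-ops/s")
```

To turn a measured wall-clock time into the same unit, pass the shape and the elapsed seconds to `throughput_tera_ops`.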