
186 points nserrino | 1 comment
turbo_wombat ◴[] No.45119741[source]
They are comparing unoptimized PyTorch inference, something you would never deploy on a device, to a model with custom kernels.

Yes, of course the model with custom kernels is faster, whether it's written by a human or an AI.

Generally, PyTorch inference is meant to be used during training and when running metrics, not in deployment. For deployment, you should export the model to ONNX and then compile the ONNX model to the native format of the device.

If you aren't familiar with the pipeline for ML deployment, this is the equivalent of comparing interpreted code to compiled code.
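
To make the interpreted-vs-compiled analogy concrete, here is a toy sketch in plain Python. The graph format, the op names, and the `run_interpreted` / `compile_graph` helpers are all made up for illustration; this is not how PyTorch, ONNX, or any real compiler works internally. It only shows the general idea: one path looks up and dispatches every op on every call, the other translates the graph once into a single function.

```python
import operator

# Hypothetical toy "graph": each node is (output_name, op, input_names).
GRAPH = [
    ("h", "mul", ("x", "w")),
    ("y", "add", ("h", "b")),
    ("out", "relu", ("y",)),
]

# Op table used by the interpreter path.
OPS = {
    "mul": operator.mul,
    "add": operator.add,
    "relu": lambda v: max(v, 0.0),
}

def run_interpreted(graph, inputs):
    """Walk the graph node by node on every call, looking up each op by name."""
    env = dict(inputs)
    for out_name, op, in_names in graph:
        env[out_name] = OPS[op](*(env[n] for n in in_names))
    return env[graph[-1][0]]

# Source templates used by the "compiler" path.
TEMPLATES = {
    "mul": "({0} * {1})",
    "add": "({0} + {1})",
    "relu": "max({0}, 0.0)",
}

def compile_graph(graph, arg_names):
    """Translate the graph once into a single Python expression and compile it."""
    exprs = {name: name for name in arg_names}
    for out_name, op, in_names in graph:
        exprs[out_name] = TEMPLATES[op].format(*(exprs[n] for n in in_names))
    source = "lambda {}: {}".format(", ".join(arg_names), exprs[graph[-1][0]])
    return eval(compile(source, "<compiled_graph>", "eval"))

compiled = compile_graph(GRAPH, ["x", "w", "b"])
inputs = {"x": 2.0, "w": 3.0, "b": -1.0}

# Both paths compute relu(x * w + b) and agree on the result.
print(run_interpreted(GRAPH, inputs))
print(compiled(**inputs))
```

Both calls return the same value; the difference is only in how much per-call bookkeeping happens. Real deployment compilers go much further (operator fusion, memory planning, device-specific kernels), which this sketch does not attempt.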

replies(7): >>45119755 #>>45120488 #>>45120646 #>>45121096 #>>45121128 #>>45121957 #>>45132362 #
airforce1 ◴[] No.45121128[source]
> and then compile the ONNX to the native format of the device.

I'm assuming you are talking about https://github.com/onnx/onnx-mlir?

In your experience, how much faster is a "compiled" ONNX model than one run through an ONNX runtime?

replies(1): >>45121510 #
1. dapperdrake ◴[] No.45121510[source]
For other people reading this:

Back in the day, TensorFlow had tfdeploy, which compiled TensorFlow terms into NumPy matrix operations. Our synthetic tests saw speedups of a factor of 50.
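
For readers wondering what "NumPy matrix operations" means here, a minimal sketch follows. It is not tfdeploy's API or output; the weights, shapes, and the `forward` function are invented for illustration. It only shows that a small network's forward pass reduces to a handful of array operations once the framework is out of the picture.

```python
import numpy as np

# Hypothetical weights standing in for a trained two-layer network.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 8)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 3)), np.zeros(3)

def forward(x):
    """Forward pass as plain NumPy ops: matmul, bias add, ReLU, softmax."""
    h = np.maximum(x @ W1 + b1, 0.0)
    logits = h @ W2 + b2
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

batch = rng.standard_normal((5, 4))
probs = forward(batch)
print(probs.shape)
```

Each row of `probs` is a probability distribution over the three hypothetical output classes. Any speedup from this kind of translation depends heavily on the model and the baseline, so the factor quoted above should be read as one data point from synthetic tests.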