(gimletlabs.ai)

186 points nserrino | 1 comments | 03 Sep 25 17:03 UTC

turbo_wombat ◴[03 Sep 25 19:48 UTC] No.45119741[source]▶

They are comparing unoptimized PyTorch inference, something you would never deploy on a device, to a model with custom kernels.

Yes, of course the model with custom kernels is faster, whether it's written by a human or an AI.

Generally, PyTorch inference is meant to be used during the training process, and when running metrics, not when deploying. When deployed, you should export to ONNX, and then compile the ONNX to the native format of the device.

If you aren't familiar with the pipeline for ML deployment, this is the equivalent of comparing interpreted code to compiled code.

1. nserrino ◴[03 Sep 25 21:15 UTC] No.45120488[source]▶

>>45119741 #

PyTorch is the baseline because that's what people prototype in, and it's the most common reference point. The aim here is to show that you can start from prototype code and automatically produce lower-level kernels (in this case Metal) that are more usable in real deployments, without additional work from the developer. Frontier models are capable of generating efficient Metal kernels automatically/immediately, and will only get better. We expect to see significant improvements as we refine the approach, but it's enough to show this seems to be a tractable problem for AI.

Speeding up PyTorch inference on Apple devices with AI-generated Metal kernels