
186 points by darkolorin | 1 comment

We wrote our inference engine in Rust; it is faster than llama.cpp in all of the use cases. Your feedback is very welcome. It was written from scratch with the idea that you can add support for any kernel and platform.
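A minimal sketch of what such a pluggable kernel/platform design could look like in Rust. This is an assumption for illustration only, not the project's actual API: the Kernel trait, KernelRegistry, and CpuScale types are hypothetical.

  // Hypothetical sketch (not this project's actual API): one way a Rust
  // inference engine could let users plug in kernels per backend/platform.

  /// One operation (e.g. a matmul) implemented for a specific backend.
  trait Kernel {
      /// Backend this kernel targets, e.g. "cpu", "metal", "ane".
      fn backend(&self) -> &'static str;
      fn run(&self, input: &[f32], output: &mut [f32]);
  }

  /// Trivial CPU kernel, included only so the sketch runs end to end.
  struct CpuScale(f32);

  impl Kernel for CpuScale {
      fn backend(&self) -> &'static str { "cpu" }
      fn run(&self, input: &[f32], output: &mut [f32]) {
          for (o, i) in output.iter_mut().zip(input) {
              *o = i * self.0;
          }
      }
  }

  /// Registry that dispatches to the first kernel whose backend is available.
  struct KernelRegistry {
      kernels: Vec<Box<dyn Kernel>>,
  }

  impl KernelRegistry {
      fn dispatch(&self, available: &[&str], input: &[f32], out: &mut [f32]) -> bool {
          self.kernels
              .iter()
              .find(|k| available.contains(&k.backend()))
              .map(|k| k.run(input, out))
              .is_some()
      }
  }

  fn main() {
      let kernels: Vec<Box<dyn Kernel>> = vec![Box::new(CpuScale(2.0))];
      let registry = KernelRegistry { kernels };
      let input = [1.0f32, 2.0, 3.0];
      let mut out = [0.0f32; 3];
      assert!(registry.dispatch(&["cpu"], &input, &mut out));
      println!("{out:?}"); // [2.0, 4.0, 6.0]
  }

Adding a new platform in a design like this would mean implementing the trait for that backend and registering it, without touching the dispatch logic.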
dcreater ◴[] No.44572592[source]
Somewhat faster on small models. Requires a new format.

Not sure what the goal is for this project. Not seeing how it offers enough of a benefit to get adopted by the community.

replies(2): >>44573065 #>>44573715 #
1. worldsavior ◴[] No.44573715[source]
It's utilizing the Apple Neural Engine (ANE) and probably other optimizations provided by Apple's frameworks. Not sure whether llama.cpp uses them, but if it doesn't, then the benchmark on GitHub says it all.
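For context, a rough sketch of how a Rust engine might prefer an ANE-backed path when built for Apple Silicon. The backend_preference function and the backend names are assumptions; real ANE access would go through Apple's frameworks (e.g. Core ML), which this sketch does not call.

  // Hedged sketch (assumption, not the engine's actual code): prefer an
  // ANE-backed kernel on Apple Silicon, then Metal, then CPU.
  fn backend_preference() -> Vec<&'static str> {
      let mut prefs = Vec::new();
      // Compile-time gate on the target platform; actual ANE use would go
      // through Apple frameworks such as Core ML rather than this check.
      if cfg!(all(target_os = "macos", target_arch = "aarch64")) {
          prefs.push("ane");
          prefs.push("metal");
      }
      prefs.push("cpu");
      prefs
  }

  fn main() {
      // This ordering could feed the `available` list of a kernel registry
      // like the one sketched earlier in the thread.
      println!("backend order: {:?}", backend_preference());
  }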