
186 points by darkolorin | 1 comment

We wrote our inference engine in Rust; it is faster than llama.cpp in all of the use cases we tested. Your feedback is very welcome. It is written from scratch with the idea that you can add support for any kernel and platform (sketched below).
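To illustrate what a pluggable kernel/platform design can look like, here is a minimal Rust sketch. The trait and all names are hypothetical, for illustration only; the thread does not describe the engine's actual API.

    // Hypothetical sketch: each backend implements a common trait, so a new
    // kernel or platform is added by writing one impl, not by touching core.
    trait MatmulKernel {
        /// Human-readable backend name, e.g. "cpu-naive" or "metal".
        fn name(&self) -> &'static str;
        /// Whether this kernel can run on the current machine.
        fn is_supported(&self) -> bool;
        /// C = A (m x k) * B (k x n), all row-major f32.
        fn matmul(&self, a: &[f32], b: &[f32], c: &mut [f32],
                  m: usize, k: usize, n: usize);
    }

    /// Reference CPU implementation: always available, never fast.
    struct NaiveCpuKernel;

    impl MatmulKernel for NaiveCpuKernel {
        fn name(&self) -> &'static str { "cpu-naive" }
        fn is_supported(&self) -> bool { true }
        fn matmul(&self, a: &[f32], b: &[f32], c: &mut [f32],
                  m: usize, k: usize, n: usize) {
            for i in 0..m {
                for j in 0..n {
                    let mut acc = 0.0;
                    for p in 0..k {
                        acc += a[i * k + p] * b[p * n + j];
                    }
                    c[i * n + j] = acc;
                }
            }
        }
    }

    /// Pick the first registered kernel the current platform supports.
    fn select_kernel(kernels: &[Box<dyn MatmulKernel>]) -> Option<&dyn MatmulKernel> {
        kernels.iter().map(|k| k.as_ref()).find(|k| k.is_supported())
    }

    fn main() {
        let kernels: Vec<Box<dyn MatmulKernel>> = vec![Box::new(NaiveCpuKernel)];
        let k = select_kernel(&kernels).expect("no usable kernel");
        println!("selected backend: {}", k.name());
    }

In a design like this, platform-specific backends (SIMD, GPU, etc.) would register ahead of the naive fallback and win selection when supported.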
rnxrx No.44572115
I'm curious about why the performance gains mentioned were so substantial for Qwen vs Llama?
replies(1): >>44572574
1. AlekseiSavin No.44572574
It looks like llama.cpp has some performance issues with bf16.
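For background (not from the thread): bf16 (bfloat16) keeps the sign bit and 8-bit exponent of an IEEE 754 f32 but only a 7-bit mantissa, i.e. it is the upper 16 bits of an f32. One plausible source of slowness is a scalar per-element widening path instead of a vectorized native bf16 kernel. A minimal Rust sketch of the conversions:

    /// Widen bf16 to f32: bf16 is the top 16 bits of an f32,
    /// so widening is just a left shift of the bit pattern.
    fn bf16_to_f32(bits: u16) -> f32 {
        f32::from_bits((bits as u32) << 16)
    }

    /// Narrow f32 to bf16 by truncation
    /// (round-to-nearest-even omitted for brevity).
    fn f32_to_bf16(x: f32) -> u16 {
        (x.to_bits() >> 16) as u16
    }

    fn main() {
        let x = 3.14159_f32;
        let b = f32_to_bf16(x);
        println!("{} -> bf16 bits {:#06x} -> {}", x, b, bf16_to_f32(b));
    }

The conversion itself is cheap, which is why doing it per element in an inner loop, rather than fusing it into a SIMD matmul kernel, can dominate inference time for bf16 checkpoints.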