
186 points | darkolorin | 1 comment

We wrote our inference engine in Rust; it is faster than llama.cpp across all of our use cases. Your feedback is very welcome. It is written from scratch with the idea that you can add support for any kernel and platform.
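
To sketch what "add support for any kernel and platform" could look like in Rust (an illustrative design only, not the actual trymirai API): kernels sit behind a backend trait, and each platform (CPU, Metal, CUDA, ...) plugs in by implementing it, so the rest of the engine stays generic.

    /// Minimal tensor handle; a real engine also tracks dtype and device.
    pub struct Tensor {
        pub data: Vec<f32>,
        pub shape: Vec<usize>,
    }

    /// Kernels every backend must provide; a new platform is added by
    /// implementing this trait (names here are hypothetical).
    pub trait Backend {
        fn name(&self) -> &'static str;
        fn matmul(&self, a: &Tensor, b: &Tensor) -> Tensor;
    }

    /// Naive reference CPU backend, useful as a fallback and for testing kernels.
    pub struct CpuBackend;

    impl Backend for CpuBackend {
        fn name(&self) -> &'static str {
            "cpu"
        }

        fn matmul(&self, a: &Tensor, b: &Tensor) -> Tensor {
            let (m, k, n) = (a.shape[0], a.shape[1], b.shape[1]);
            let mut out = vec![0.0f32; m * n];
            for i in 0..m {
                for j in 0..n {
                    let mut acc = 0.0f32;
                    for p in 0..k {
                        acc += a.data[i * k + p] * b.data[p * n + j];
                    }
                    out[i * n + j] = acc;
                }
            }
            Tensor { data: out, shape: vec![m, n] }
        }
    }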
homarp:
Can you explain the type of quantization you support?

Would https://docs.unsloth.ai/basics/kimi-k2-how-to-run-locally be faster with mirai?

AlekseiSavin:
Right now we support AWQ, but we are working on various quantization methods in https://github.com/trymirai/lalamo
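
For intuition, AWQ keeps weights as 4-bit integers with a per-group scale and zero point, so a rough, non-fused dequantization path in Rust might look like the sketch below; the types and packing layout are illustrative, not lalamo's actual format.

    /// One AWQ-style group: two 4-bit weights packed per byte, plus a
    /// per-group scale and zero point (layout is illustrative only).
    pub struct QuantGroup {
        pub packed: Vec<u8>,
        pub scale: f32,
        pub zero: f32,
    }

    /// Reference dequantization: w = (q - zero) * scale for each 4-bit q.
    pub fn dequantize(group: &QuantGroup) -> Vec<f32> {
        let mut out = Vec::with_capacity(group.packed.len() * 2);
        for &byte in &group.packed {
            let lo = (byte & 0x0F) as f32;
            let hi = (byte >> 4) as f32;
            out.push((lo - group.zero) * group.scale);
            out.push((hi - group.zero) * group.scale);
        }
        out
    }

In practice a fast engine fuses this dequantization into the matmul kernel instead of materializing f32 weights, which is where per-platform kernels matter.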