Show HN: We made our own inference engine for Apple Silicon
(github.com)
186 points | darkolorin | 1 comment | 15 Jul 25 11:29 UTC
We wrote our inference engine in Rust; it is faster than llama.cpp in all of our use cases. Your feedback is very welcome. It is written from scratch with the idea that you can add support for any kernel and platform.
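A minimal sketch of what such a pluggable-kernel design could look like in Rust; the trait and type names here are hypothetical illustrations, not the engine's actual API:

    // Hypothetical sketch: each backend implements the same kernel trait,
    // so supporting a new platform means adding an implementation rather
    // than changing the engine core. Names are illustrative only.
    trait MatMulKernel {
        fn matmul(&self, a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32>;
    }

    // A naive CPU reference backend.
    struct CpuKernel;

    impl MatMulKernel for CpuKernel {
        fn matmul(&self, a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
            let mut out = vec![0.0f32; m * n];
            for i in 0..m {
                for j in 0..n {
                    let mut acc = 0.0f32;
                    for p in 0..k {
                        acc += a[i * k + p] * b[p * n + j];
                    }
                    out[i * n + j] = acc;
                }
            }
            out
        }
    }

Under this kind of design, the runtime can dispatch to a Metal, CPU, or other implementation without touching the model code.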
homarp | 15 Jul 25 15:36 UTC | No. 44572261
>>44570048 (OP)
Can you explain the types of quantization you support? Would https://docs.unsloth.ai/basics/kimi-k2-how-to-run-locally be faster with mirai?
replies(1): >>44572552
AlekseiSavin | 15 Jul 25 16:01 UTC | No. 44572552
>>44572261
Right now we support AWQ, but we are working on various quantization methods in https://github.com/trymirai/lalamo
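For reference, AWQ-style schemes store 4-bit weights in groups that share a scale and zero point, and dequantize as w = (q - z) * s. A minimal Rust sketch of that step; the packing layout and names are illustrative assumptions, not lalamo's actual format:

    // Illustrative group-wise 4-bit dequantization (not lalamo's real API).
    // Each byte packs two 4-bit quantized weights; the whole group shares
    // one scale and one zero point.
    fn dequantize_awq_group(packed: &[u8], scale: f32, zero: f32, out: &mut Vec<f32>) {
        for &byte in packed {
            let lo = (byte & 0x0F) as f32;
            let hi = (byte >> 4) as f32;
            // w = (q - z) * s recovers the approximate floating-point weight.
            out.push((lo - zero) * scale);
            out.push((hi - zero) * scale);
        }
    }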