
186 points by darkolorin | 1 comment

We wrote our inference engine in Rust; it is faster than llama.cpp in all of our use cases. Your feedback is very welcome. It is written from scratch with the idea that you can add support for any kernel and any platform.
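
To make the "any kernel, any platform" idea concrete, here is a minimal Rust sketch of one way such a pluggable design can look. The trait and type names are illustrative assumptions, not the engine's actual API:

    // Hypothetical sketch of a pluggable-kernel design; names are
    // assumptions for illustration, not this engine's real API.

    /// A compute kernel that a platform backend can implement.
    trait Kernel {
        fn name(&self) -> &'static str;
        /// Runs the kernel over raw f32 buffers.
        fn run(&self, input: &[f32], output: &mut [f32]);
    }

    /// A platform backend (CPU, Metal, CUDA, ...) holding its kernels.
    struct Backend {
        platform: &'static str,
        kernels: Vec<Box<dyn Kernel>>,
    }

    impl Backend {
        fn new(platform: &'static str) -> Self {
            Self { platform, kernels: Vec::new() }
        }
        /// Adding a new kernel is just registering an implementation.
        fn register(&mut self, k: Box<dyn Kernel>) {
            self.kernels.push(k);
        }
    }

    /// Example kernel: SiLU activation on the CPU.
    struct CpuSilu;

    impl Kernel for CpuSilu {
        fn name(&self) -> &'static str { "silu" }
        fn run(&self, input: &[f32], output: &mut [f32]) {
            // SiLU(x) = x * sigmoid(x) = x / (1 + e^-x)
            for (o, &x) in output.iter_mut().zip(input.iter()) {
                *o = x / (1.0 + (-x).exp());
            }
        }
    }

    fn main() {
        let mut cpu = Backend::new("cpu");
        cpu.register(Box::new(CpuSilu));
        let input = [1.0_f32, -1.0, 0.5];
        let mut output = [0.0_f32; 3];
        cpu.kernels[0].run(&input, &mut output);
        println!("{} on {}: {:?}", cpu.kernels[0].name(), cpu.platform, output);
    }

Swapping in a Metal or CUDA backend would then mean implementing the same trait against that platform's buffers, without touching the rest of the engine.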
nodesocket No.44574806
I just spun up an AWS EC2 g6.xlarge instance to do some LLM work. The GPU is an NVIDIA L4 with 24GB, and it costs $0.8048 per hour. I'm starting to think about switching to an Apple mac2-m2.metal instance at $0.878 per hour. The big question is that the Mac instance only has 24GB of unified memory.
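
For scale, a quick back-of-envelope of what those hourly rates add up to per month (730 hours, the usual cloud approximation), using only the prices quoted above:

    // Monthly cost comparison from the hourly prices quoted above.
    fn main() {
        const HOURS_PER_MONTH: f64 = 730.0;

        let g6_xlarge = 0.8048; // USD/hour, NVIDIA L4 24GB
        let mac2_m2 = 0.878;    // USD/hour, Apple M2, 24GB unified memory

        println!("g6.xlarge:     ${:.2}/month", g6_xlarge * HOURS_PER_MONTH);
        println!("mac2-m2.metal: ${:.2}/month", mac2_m2 * HOURS_PER_MONTH);
        println!(
            "difference:    ${:.2}/month",
            (mac2_m2 - g6_xlarge) * HOURS_PER_MONTH
        );
    }

That comes to roughly $587 vs. $641 per month, a difference of about $53.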
replies(1): >>44576572 #
1. khurs No.44576572
Unified memory doesn't compare to an Nvidia GPU; the latter is much better.

It just depends on what performance level you need.
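
One way to put rough numbers on that: single-token LLM decoding is typically memory-bandwidth-bound, so peak tokens/s is roughly memory bandwidth divided by model size. A minimal sketch assuming the published bandwidth specs (L4: ~300 GB/s; base M2: ~100 GB/s) and a hypothetical 4 GB quantized model:

    // Rough decode-speed ceiling: tokens/s ~= bandwidth / model size.
    // Bandwidth figures are published specs; the model size is an
    // assumption (e.g. a 7B model at ~4-bit quantization).
    fn main() {
        let model_gb = 4.0;

        for (chip, bandwidth_gb_s) in [("NVIDIA L4", 300.0), ("Apple M2", 100.0)] {
            let tokens_per_s = bandwidth_gb_s / model_gb;
            println!("{chip}: ~{tokens_per_s:.0} tokens/s ceiling");
        }
    }

Real throughput lands well below these ceilings, but the ratio (~75 vs. ~25 tokens/s here) is why a discrete GPU usually wins on decode speed even when the memory capacity is the same.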