So... it's a language model? As in, not "large"? I'm a bit unsure of the magnitudes here, but surely "nano" and "large" cancel out.
vLLM is optimized to serve many requests concurrently.
If you were to fine-tune a model and wanted to serve it to many users, you would use vLLM, not llama.cpp.
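As a minimal sketch of what that looks like, here is vLLM's offline Python API (the model path is a placeholder for your own fine-tuned checkpoint; swap in whatever you actually trained):

```python
from vllm import LLM, SamplingParams

# "./my-finetuned-model" is a placeholder path, not a real checkpoint.
# vLLM loads the weights once and then schedules incoming prompts with
# continuous batching, which is what makes it efficient under many
# simultaneous requests.
llm = LLM(model="./my-finetuned-model")
params = SamplingParams(temperature=0.7, max_tokens=128)

# These prompts are batched together rather than run one at a time.
outputs = llm.generate(["Hello!", "Summarize vLLM in one line."], params)
for out in outputs:
    print(out.outputs[0].text)
```

For actually exposing the model to users over HTTP, recent vLLM versions also ship an OpenAI-compatible server (e.g. `vllm serve ./my-finetuned-model`), which is the more typical deployment path than the offline API above.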