So... it's a language model? As in, not "large"? I'm a bit unsure of the magnitudes here, but surely "nano" and "large" cancel out.
vLLM is optimized to serve many requests concurrently.
If you were to fine-tune a model and wanted to serve it to many users, you would use vLLM, not llama.cpp.
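As a minimal sketch of what that looks like, here is vLLM's offline Python API (the model path is a placeholder for your own fine-tuned checkpoint; swap in whatever you actually trained):

```python
from vllm import LLM, SamplingParams

# "./my-finetuned-model" is a placeholder path, not a real checkpoint.
# vLLM loads the weights once and then schedules incoming prompts with
# continuous batching, which is what makes it efficient under many
# simultaneous requests.
llm = LLM(model="./my-finetuned-model")
params = SamplingParams(temperature=0.7, max_tokens=128)

# These prompts are batched together rather than run one at a time.
outputs = llm.generate(["Hello!", "Summarize vLLM in one line."], params)
for out in outputs:
    print(out.outputs[0].text)
```

For actually exposing the model to users over HTTP, recent vLLM versions also ship an OpenAI-compatible server (e.g. `vllm serve ./my-finetuned-model`), which is the more typical deployment path than the offline API above.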