
281 points GabrielBianconi | 1 comment
1. cootsnuck No.45067901
Super helpful to see actual examples of what it can (roughly) look like to deploy production inference workloads, along with the latest optimization efforts.

I consult in this space, and clients still don't fully understand how complex it can get to just "run your own LLM".