Super helpful to see concrete examples of what it can (roughly) look like to deploy production inference workloads, along with the latest optimization efforts.
I consult in this space, and clients still don't fully understand how complex it can get just to "run your own LLM".