
169 points by hunvreus | 1 comment
NitpickLawyer:
Cool article! The stack (and results) are impressive, but I also appreciate the writing itself: it starts from the basics and builds up to the point in a clear, gradually expanding way. Easy to follow.

On a bit of a tangent rant: this kind of writing is slowly going away, taken over by LLM slop (and I'm a huge fan of LLMs, just not of the people who publish those kinds of articles). I was recently looking for real-world benchmarks for vLLM/SGLang deployments of DeepSeek3 on an 8x 96GB pod: does the model fit in that amount of RAM once you account for KV cache and context length, what numbers do people actually get, and so on.

Of the ~20 articles that Google surfaced across various keyword attempts, none were what I was looking for. The excerpts looked promising, and some even offered tables about ds3 and RAM usage, but all of them were LLM crap, written in that same intro / filler / conclusion template. Some even quoted RAM requirements that made no sense (e.g. running a model trained in FP8 at 16-bit precision, something no one would do).
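For reference, the memory check is just arithmetic, which is what makes the nonsensical numbers so jarring. A minimal sketch in Python, assuming the commonly cited ~671B total parameter count for DeepSeek3 (my assumption, not a figure from any of those articles):

    # Rough fit check: weight footprint = parameter count x bytes per parameter.
    # The 671B figure and the pod size are illustrative assumptions.
    GB = 1e9

    def weights_gb(params_billion: float, bytes_per_param: int) -> float:
        return params_billion * 1e9 * bytes_per_param / GB

    pod_gb = 8 * 96               # the 8x 96GB pod
    fp8 = weights_gb(671, 1)      # native FP8 checkpoint: ~671 GB
    bf16 = weights_gb(671, 2)     # same weights naively run in 16-bit: ~1342 GB

    print(f"pod: {pod_gb} GB, fp8 weights: {fp8:.0f} GB "
          f"(~{pod_gb - fp8:.0f} GB left for KV cache), bf16: {bf16:.0f} GB (does not fit)")

Whatever is left after weights goes to KV cache, which scales with context length and concurrent requests, so the FP8 case is tight but plausible, while the 16-bit case fails the sanity check immediately.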

thomasjudge:
You are describing this as a writing problem, but it sounds more like a search-results/search-engine problem.