
507 points martinald | 1 comment
1. smcleod | No.45058408
A few things:

1. Your token count per day seems quite low ("2M input tokens, ~30k output tokens/day") - that's FAR less than I'd expect. For comparison, I average 330M-850M combined tokens per day, and I'm on the higher side of my peers, who average 150M-600M combined tokens per day.

2. It doesn't seem you're taking prompt caching into account. This generally reduces the inference required for agentic coding by 85-95%.
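To make the caching point concrete, here's a back-of-envelope sketch. The numbers (token volume, $/Mtok price, the ~10x discount on cached input tokens) are illustrative assumptions, not figures from the comment or any specific provider:

```python
def effective_input_cost(tokens: int, cache_hit: float,
                         price_per_mtok: float,
                         cached_discount: float = 0.1) -> float:
    """Dollar cost for `tokens` input tokens at a given cache hit rate.

    Assumes cached tokens are billed at `cached_discount` times the
    full input rate (a common pricing shape, hypothetical here).
    """
    cached = tokens * cache_hit
    fresh = tokens - cached
    return (fresh * price_per_mtok
            + cached * price_per_mtok * cached_discount) / 1e6

# 330M input tokens/day at a hypothetical $3/Mtok:
no_cache = effective_input_cost(330_000_000, 0.0, 3.0)   # 990.0
with_cache = effective_input_cost(330_000_000, 0.9, 3.0) # ~188.1
```

Agentic coding loops resend nearly the same long prefix (system prompt, file context) on every turn, which is why hit rates - and the resulting savings - are so high.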

3. It would be good if you added what quantisation you're running - for example 8.5-9bpw (Q8 equivalent, indistinguishable from fp32/bf16) for the model weights, and likewise for the KV cache (Q8, (b)f16, etc.).
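Why bpw matters for the cost estimate: weight memory (and, roughly, memory bandwidth per token) scales linearly with bits per weight. A minimal sketch, using an illustrative 70B-parameter model that is not mentioned in the comment:

```python
def weight_gb(params_b: float, bpw: float) -> float:
    """Approximate weight memory in GB for `params_b` billion
    parameters at `bpw` bits per weight (bits -> bytes -> GB)."""
    return params_b * 1e9 * bpw / 8 / 1e9

# Hypothetical 70B model: bf16 vs a ~Q8-equivalent 8.5 bpw quant
print(weight_gb(70, 16.0))  # 140.0 GB
print(weight_gb(70, 8.5))   # 74.375 GB
```

So an unstated quantisation level can swing the hardware (and therefore cost) estimate by roughly 2x on its own, before even considering KV-cache precision.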