(martinalderson.com)

507 points martinald | 2 comments | 28 Aug 25 10:15 UTC

fallmonkey [28 Aug 25 18:27 UTC] No.45055357

The estimate for output tokens is too low, since a single reasoning-enabled response can burn through thousands of output tokens. The estimate for input tokens is also too low, since in actual use a lot of context (memory, agents.md, rules, etc.) is included with every request nowadays.
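
To make this concrete, here is a rough sketch with made-up numbers (none of them come from the article or from any provider). It assumes the always-included context is sent as input on every request and that reasoning tokens count toward output.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # All counts are hypothetical, chosen only to illustrate the effect.
    user_prompt: int
    context_overhead: int  # memory, agents.md, rules, etc. sent with every request
    visible_output: int
    reasoning: int  # hidden reasoning tokens, counted as output

    @property
    def input_tokens(self) -> int:
        return self.user_prompt + self.context_overhead

    @property
    def output_tokens(self) -> int:
        return self.visible_output + self.reasoning


naive = Request(user_prompt=500, context_overhead=0, visible_output=300, reasoning=0)
realistic = Request(user_prompt=500, context_overhead=8_000, visible_output=300, reasoning=4_000)

for name, r in [("naive", naive), ("realistic", realistic)]:
    print(f"{name}: {r.input_tokens} in, {r.output_tokens} out")
# naive: 500 in, 300 out
# realistic: 8500 in, 4300 out
```

Same prompt and same visible answer, but the token counts per request differ by more than an order of magnitude once context and reasoning are included.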

1. atq2119 [28 Aug 25 22:30 UTC] No.45057769

>>45055357

When using the APIs, you pay for reasoning tokens just as you do for visible output. So the per-token estimate is not affected by reasoning.

What reasoning affects is the ratio of input to output tokens, and since input tokens are cheaper, that may well affect the economics in the end.
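
A small sketch of that ratio effect, using placeholder prices (USD 3 per million input tokens, USD 15 per million output tokens; these are assumptions for illustration, not figures from the article). Per-token prices stay fixed, yet the blended cost per token rises as reasoning shifts the mix toward output.

```python
# Hypothetical prices in USD per million tokens; output priced higher than input.
PRICE_IN_PER_M = 3.00
PRICE_OUT_PER_M = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD; reasoning tokens are assumed to be included in output_tokens."""
    return (input_tokens * PRICE_IN_PER_M + output_tokens * PRICE_OUT_PER_M) / 1_000_000


def blended_price_per_m(input_tokens: int, output_tokens: int) -> float:
    """Average USD per million tokens across input and output combined."""
    total = input_tokens + output_tokens
    return request_cost(input_tokens, output_tokens) / total * 1_000_000


# Same 10,000 total tokens, different input:output mix.
no_reasoning = blended_price_per_m(9_000, 1_000)
with_reasoning = blended_price_per_m(6_000, 4_000)
print(f"{no_reasoning:.2f} vs {with_reasoning:.2f} USD per million tokens")
# 4.20 vs 7.80 USD per million tokens
```

With these placeholder prices, moving from a 9:1 to a 3:2 input:output mix nearly doubles the average cost per token, even though neither per-token price changed.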

2. fallmonkey [29 Aug 25 04:25 UTC] No.45060181

>>45057769 (TP)

Correct, and with reasoning the ratio is totally off. As others have pointed out, actual usage is far higher (well beyond 3-5x) than the estimate in the article, which probably reflects very light users.

Are OpenAI and Anthropic losing money on inference?