Some agent startups are already feeling the squeeze: The Information reported that Cursor's gross margins hit −16% because of token costs. So even if inference is profitable for OpenAI and Anthropic, token-hungry downstream apps may not enjoy the same unit economics. That is why token-intensive agent startups like Cursor and Perplexity are taking open-source models such as Qwen or OpenAI's gpt-oss-120b and post-training them to bring down inference costs.
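To make the unit-economics point concrete, here is a toy gross-margin calculation. Every figure in it is hypothetical and chosen only for illustration: a $20 monthly seat, 4M tokens consumed per user per month, $5.80 per million tokens when buying from a frontier API, and $1.50 per million for a self-hosted, post-trained open model. These are not Cursor's actual numbers; they simply show how a token bill larger than revenue produces a negative margin near the reported −16%, and how a cheaper per-token cost flips it positive.

```python
def token_cost(tokens: float, price_per_million: float) -> float:
    """Inference spend for a token volume at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million


def gross_margin(revenue: float, cost_of_revenue: float) -> float:
    """Gross margin as a fraction of revenue: (revenue - cost) / revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cost_of_revenue) / revenue


def break_even_price_per_million(revenue: float, tokens: float) -> float:
    """Highest per-million-token price at which token spend equals revenue."""
    return revenue / (tokens / 1_000_000)


# Hypothetical per-user monthly figures (illustrative only, not real data).
SEAT_PRICE = 20.0             # USD of revenue per user per month
TOKENS_PER_USER = 4_000_000   # tokens an agent burns per user per month
API_PRICE = 5.80              # USD per 1M tokens, hypothetical frontier API price
SELF_HOSTED_PRICE = 1.50      # USD per 1M tokens, hypothetical post-trained OSS model

api_margin = gross_margin(SEAT_PRICE, token_cost(TOKENS_PER_USER, API_PRICE))
oss_margin = gross_margin(SEAT_PRICE, token_cost(TOKENS_PER_USER, SELF_HOSTED_PRICE))

print(f"API-priced margin:   {api_margin:.0%}")   # about -16%
print(f"Self-hosted margin:  {oss_margin:.0%}")   # about 70%
print(f"Break-even price:    ${break_even_price_per_million(SEAT_PRICE, TOKENS_PER_USER):.2f}/M tokens")
```

The break-even helper shows the lever: with revenue fixed by a flat seat price, margin depends only on the per-token price times token volume, so cutting the per-token price (as post-training a smaller open model aims to do) is the direct way out of negative margins when usage cannot be reduced.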