The bitter lesson is about the fact that general methods that leverage computation are ultimately the most effective. Grok 3 is not more general than DeepSeek or OpenAI models so mentioning the bitter lesson here doesn't make much sense, it's just the scaling law.