Introducing this because it just dropped (so it is timely): *NVIDIA AI Released Jet-Nemotron: 53x Faster Hybrid-Architecture Language Model Series that Translates to a 98% Cost Reduction for Inference at Scale*. It seems unlikely that either OpenAI or Anthropic uses this or a similar technique (yet, or whether they even can), but breakthroughs like this may bring dramatic savings to both closed and open-source inference at scale going forward. https://www.marktechpost.com/2025/08/26/nvidia-ai-released-j...