* The $5M DeepSeek-R1 (and now this cheap $6 R1) are both built on top of very expensive oracles, if we believe DeepSeek-R1 queried OpenAI's models. If these are improvements on existing models, why is this being reported as decimating training costs? Isn't fine-tuning already a cheap way to optimize an existing model? (Maybe not as effective, but still.)
* The R1 paper talks about improving performance on one simple game, Countdown (see the reward-function sketch after this list). But the original models are "magic" because they can solve a nearly uncountable number of problems and scenarios. How does the DeepSeek/R1 approach scale to that same breadth?
* Phrased another way: my understanding is that these techniques use existing models as black-box oracles (see the distillation sketch below). If so, how many millions, billions, or trillions of queries would be needed to replicate and improve on the original dataset?
* Is anything known about the training datasets DeepSeek used? OpenAI presumably used every scraped dataset it could get its hands on. Did DeepSeek do the same?
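
To make the Countdown point concrete, here is a minimal sketch of the kind of verifiable reward such a reinforcement-learning setup relies on. The function names and scoring rule are my own assumptions, not code from the R1 paper or any reproduction; the point is that the reward is machine-checkable, which is exactly what doesn't obviously generalize to open-ended problems:

```python
# Sketch of a verifiable reward for the Countdown game: did the model's
# expression reach the target using only the given numbers? All names here
# are assumptions for illustration, not actual training code.
import ast
import operator

# Arithmetic operators allowed in a Countdown expression.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def _eval(node):
    """Safely evaluate an arithmetic AST node (numbers and + - * / only)."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("disallowed expression")

def _leaves(node):
    """Collect the literal numbers used in the expression."""
    if isinstance(node, ast.Constant):
        return [node.value]
    if isinstance(node, ast.BinOp):
        return _leaves(node.left) + _leaves(node.right)
    raise ValueError("disallowed expression")

def countdown_reward(expr: str, numbers: list[int], target: int) -> float:
    """Binary reward: 1.0 iff `expr` hits `target` using each given number
    at most once, 0.0 otherwise (including malformed output)."""
    try:
        tree = ast.parse(expr, mode="eval").body
        used, pool = _leaves(tree), list(numbers)
        # every literal must come from the pool, with no reuse
        if any(used.count(n) > pool.count(n) for n in set(used)):
            return 0.0
        return 1.0 if abs(_eval(tree) - target) < 1e-9 else 0.0
    except (ValueError, SyntaxError, ZeroDivisionError):
        return 0.0

print(countdown_reward("(100 - 4) * 5 + 20", [100, 4, 5, 20], 500))  # 1.0
print(countdown_reward("100 * 5", [100, 4, 5, 20], 499))             # 0.0
```
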
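And for the black-box-oracle question, a rough sketch of what distillation-by-query looks like in practice. This assumes the standard openai-python client; the model name, output format, and helper name are placeholders of mine, not anything DeepSeek is confirmed to have done:

```python
# Sketch of treating a stronger model as a black-box oracle: sample prompts,
# query it, and keep its answers as supervised training data for a cheaper
# student model. Hypothetical illustration only.
import json
from openai import OpenAI  # standard openai-python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def distill(prompts: list[str], out_path: str, model: str = "gpt-4o") -> None:
    """Query the oracle once per prompt and write (prompt, answer) pairs
    as JSONL, a common format for supervised fine-tuning."""
    with open(out_path, "w") as f:
        for prompt in prompts:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            answer = resp.choices[0].message.content
            f.write(json.dumps({"prompt": prompt, "answer": answer}) + "\n")
```

Each call here costs real money, which is the crux of the question above: covering OpenAI-scale breadth this way could take an enormous number of queries, and the oracle's own training cost is hidden inside every one of them.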