The author mentioned AlphaGo and AlphaZero without mentioning OpenAI Gym and OpenAI Five.
Those products show OpenAI was innovating and leading in RL at that stage around 2017 to 2019.
replies(1):
> Those products show OpenAI was innovating and leading in RL at that stage around 2017 to 2019.
DeepSeek's GRPO is also just a minor variant of PPO.
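For anyone wondering what "minor variant" means in practice, here is a rough sketch, assuming the outcome-reward formulation described in the DeepSeekMath paper. GRPO keeps PPO's clipped surrogate objective but drops the learned value function: the baseline is the mean reward of a group of samples drawn for the same prompt. The function names are mine, the population standard deviation is an assumption (implementations differ), and the KL penalty term is left out for brevity.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward against its own group.

    `rewards` holds one scalar reward per sampled completion of the same
    prompt. No critic network is involved; the group mean is the baseline.
    Using the population std here is an assumption of this sketch.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO's clipped objective for one sample, reused as-is by GRPO.

    `ratio` is pi_new(action) / pi_old(action).
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)

# Four completions for one prompt, two of them rewarded.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
objective = [clipped_surrogate(1.5, a) for a in advantages]
```

In standard PPO the `advantage` argument would instead come from a learned critic (typically via GAE), which is the part GRPO removes.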