I founded Tufalabs about a year ago: https://tufalabs.ai/
On the side, I'm building a snake game with codex-cli and then training a PPO agent to play it. Codex-cli can mostly one-shot the PPO implementation, which I find extremely impressive. Getting Codex to tweak the PPO algorithm until the RL agent fully solves Snake, and then visualizing the result, is very satisfying.