The linked whitepaper is pretty useless, and I'm saying that as a big fan of the diffusion-transformers-for-not-just-images-or-videos approach.
Also, Gemini Diffusion [1] is way better at coding than Mercury's offering.
1. https://deepmind.google/models/gemini-diffusion/