Llama 4 Scout is a mixture-of-experts model: 16 experts at 17B active parameters, ~109B total, about 67GB on disk quantized. Llama 4 Maverick is 128 experts at 17B active, ~400B total parameters, around 245GB quantized. Llama 4 Behemoth is bigger still: ~288B active parameters and roughly 2 trillion total.
I don't have the resources to test these unfortunately, but Meta is claiming Behemoth is superior to the best SaaS options via internal benchmarking.
Comparatively, DeepSeek R1 671B is 404GB in size, with pretty similar benchmarks.
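The file sizes above line up with the parameter counts once you factor in quantization. A rough sketch (my own back-of-envelope helper, not any official tool; the ~4.8-4.9 bits-per-weight figures are assumptions approximating a mid-range 4-bit quant):

```python
# Rough on-disk size estimate for a quantized model.
# size_bytes ~= total_params * bits_per_weight / 8  (ignores metadata/overhead)
def approx_size_gb(total_params_billions: float, bits_per_weight: float) -> float:
    bytes_total = total_params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

# Bits-per-weight values are assumptions, picked to roughly match the sizes quoted above.
print(round(approx_size_gb(109, 4.9)))  # ~109B-param model -> ~67 GB
print(round(approx_size_gb(400, 4.9)))  # ~400B-param model -> ~245 GB
print(round(approx_size_gb(671, 4.8)))  # ~671B-param model -> ~403 GB
```

The point being: disk size scales with total parameters, while the compute per token in a mixture-of-experts model scales with the much smaller active parameter count.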
But compare DeepSeek R1 32B to any model from 2021 and it's going to be significantly superior.
So we have model quality increasing while the resources needed decrease. In 5-10 years, do we have an LLM that loads onto a 16-32GB video card and is simply capable of doing it all?