Yes, many of us are surprised at the negativity toward Grok.
Grok is a top contender for me.
I also use 5 LLMs in parallel every day, but my default stack is Grok, DeepSeek, Gemini 2.5 Pro, ChatGPT, and Claude. That's the same as OP's, except I most often switch out Perplexity for Gemini. (DeepSeek with search has usually become my Perplexity replacement.)
Most of my questions don't hit topics prone to triggering safety blocks. In those cases I find Gemini surprisingly strong, but for difficult things Grok often wins.
Gemini, Grok, and Claude all benefit a lot whenever they supplement their knowledge with on-demand searches rather than just quick reasoning. Ask a deep-insight question on Gemini Pro without making it research and you will discover the hallucinations, logical conclusions that contradict known facts, etc. Same with Grok. When the Claude Code CLI is going in circles, remind it to google for more information to break it out.
Grok one-shotted a replacement algorithm, several hundred lines of code, for a part of an operational transform library that had had a bug for the last 5 revisions. It passed all my tests. The base Grok 4 model wasn't even optimised for code at that time. Color me impressed!