Anyone have a take on how the coding performance (quality and speed) of the 2.0 Pro Experimental compares to o3-mini-high?
The 2 million token window sure feels exciting.
replies(2):
With Copilot Pro and DeepSeek's website, I ran "find logic bugs" on a 1200 LOC file I actually needed code review for:
- DeepSeek R1 found roughly 7 real bugs out of 10 suggested; the remaining 3 were acceptable false positives given the missing context
- Claude was about the same, with fewer false positives and no hallucinations either
- Meanwhile, Gemini had a 100% false positive rate, with many hallucinations and answers that didn't address the prompt
I understand Gemini 2.0 is not a reasoning model, but DeepClaude remains the most effective LLM combo so far.
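For anyone who wants to compare models this way themselves, the numbers above reduce to simple precision / false-positive arithmetic. Here's a minimal sketch (the helper name and the Gemini suggestion count are my own assumptions; only the 7-of-10 DeepSeek figure comes from the comment above):

```python
def review_stats(suggested: int, real: int) -> tuple[float, float]:
    """Return (precision, false_positive_rate) for one bug-finding run.

    suggested: total findings the model reported
    real:      findings that turned out to be actual bugs
    """
    false_positives = suggested - real
    return real / suggested, false_positives / suggested

# DeepSeek R1 run from the comment: 7 real bugs out of 10 suggested
print(review_stats(10, 7))  # -> (0.7, 0.3): 70% precision, 30% FP rate

# Gemini run: 0 real findings; the count of 10 suggestions is hypothetical,
# but with zero hits the FP rate is 100% regardless of the denominator
print(review_stats(10, 0))  # -> (0.0, 1.0)
```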