
467 points mraniki | 2 comments
1. dsign No.43535155
I guess it depends on the task? I have very low expectations for Gemini, but I gave it a run with an easy signal-processing problem and it did well. It took 30 seconds to reason through a problem that would have taken me 5 to 10 minutes. Gemini's reasoning was sound (though it took me a couple of minutes to decide that), and it also wrote the functions with the changes (which took me an extra minute to verify). It's not a definitive win on time, but at least there was an extra pair of "eyes"--or whatever that's called with a system like this one.

All in all, I think we humans are well on our way to becoming legal flesh[1].

[1] The part of the system to whip or throw in jail when a human+LLM pair commits a mistake.

replies(1): >>43535240 #
2. vonneumannstan No.43535240
>I guess it depends on the task? I have very low expectations for Gemini, but I gave it a run with an easy signal-processing problem and it did well. It took 30 seconds to reason through a problem that would have taken me 5 to 10 minutes. Gemini's reasoning was sound (though it took me a couple of minutes to decide that), and it also wrote the functions with the changes (which took me an extra minute to verify). It's not a definitive win on time, but at least there was an extra pair of "eyes"--or whatever that's called with a system like this one.

I wonder if you treat code from a junior engineer the same way? It seems impossible to scale a team that way. You shouldn't need to verify every line; rather, you should have test harnesses that ensure adherence to the spec.
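
The "test harness over line-by-line review" idea can be sketched roughly as follows. This is a minimal illustration, not anything from the thread: `generated_moving_average` is a hypothetical stand-in for LLM- or junior-authored code, and the harness checks it against spec properties instead of reading every line.

```python
def generated_moving_average(xs, window):
    """Hypothetical stand-in for generated code under review."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [sum(xs[i:i + window]) / window
            for i in range(len(xs) - window + 1)]


def check_spec():
    # Spec 1: output length is len(xs) - window + 1
    assert len(generated_moving_average([1, 2, 3, 4], 2)) == 3
    # Spec 2: averaging a constant signal leaves it unchanged
    assert generated_moving_average([5.0] * 4, 3) == [5.0, 5.0]
    # Spec 3: a known value computed by hand
    assert generated_moving_average([1, 2, 3], 3) == [2.0]
    # Spec 4: invalid window sizes are rejected
    try:
        generated_moving_average([1], 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for window=0")


check_spec()
```

The point is that the reviewer's effort goes into the `check_spec` assertions once, and any regenerated or rewritten implementation can be re-verified mechanically.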