Gemini 2.5 Pro vs. Claude 3.7 Sonnet: Coding Comparison

(composio.dev)

483 points mraniki | 1 comments | 31 Mar 25 12:09 UTC


phkahler ◴[31 Mar 25 13:30 UTC] No.43534852[source]▶

Here is a real coding problem that I might be willing to make a cash-prize contest for. We'd need to nail down some rules. I'd be shocked if any LLM can do this:

https://github.com/solvespace/solvespace/issues/1414

Make a GTK 4 version of Solvespace. We have a single C++ file for each platform: Windows, Mac, and Linux-GTK3. There is also a Qt version on an unmerged branch for reference. The GTK3 file is under 2 KLOC. You do not need to create a new version, just rewrite the GTK3 Linux version for GTK4. You may either ask it to port what's there or create the new one from scratch.

If you want to do this for free to prove how great the AI is, please document the entire session. Heck, make a YouTube video of it. The final test is whether I accept the PR or not - and I WANT this ticket done.

I'm not going to hold my breath.

replies(15): >>43534866 #>>43534869 #>>43535026 #>>43535180 #>>43535208 #>>43535218 #>>43535261 #>>43535424 #>>43535811 #>>43535986 #>>43536115 #>>43536743 #>>43536797 #>>43536869 #>>43542998 #

jchw ◴[31 Mar 25 16:19 UTC] No.43536743[source]▶

>>43534852 #

I suspect it probably won't work, although not necessarily because an LLM architecture could never perform this type of work, but rather because it works best when the training set contains an inordinate amount of sample data. I'm actually quite shocked at what they can do in TypeScript and JavaScript, but they're definitely a bit less "sharp" when it comes to stuff outside of that zone in my experience.

The ridiculous amount of data required to get here hints that there is something wrong in my opinion.

I'm not sure if we're totally on the same page, but I understand where you're coming from here. Everyone keeps talking about how transformational these models are, but when push comes to shove, the cynicism isn't out of fear or panic, it's disappointment over and over and over. Like, if we had an army of virtual programmers fixing serious problems for open source projects, I'd be more excited about the possibilities than worried about the fact that I just lost my job. Honest to God. But the thing is, if that really were happening, we'd see it. And it wouldn't have to be forced and exaggerated all the time, it would be plainly obvious, like the way AI art has absolutely flooded the Internet... except I don't give a damn if code is soulless as long as it's good, so it would possibly be more welcome. (The only issue is that it would most likely actually suck when that happens, and rather just be functional enough to get away with it, but I like to try to be optimistic once in a while.)

You really make me want to try this, though. Imagine if it worked!

Someone will probably beat me to it if it can be done, though.

replies(5): >>43537512 #>>43538902 #>>43539761 #>>43541786 #>>43552468 #

skydhash ◴[31 Mar 25 17:29 UTC] No.43537512[source]▶

>>43536743 #

> the cynicism isn't out of fear or panic, it's disappointment over and over and over

Very much this. When you criticize LLM marketing, people will say you're a Luddite.

I'd bet that no one actually likes to write code, as in typing into an editor. We know how to do it, and it's easy enough to enter a flow state while doing it. But everyone is trying to write less code by themselves with the proliferation of reusable code, libraries, frameworks, code generators, metaprogramming, ...

I'd be glad if I could have a DAW- or CAD-like interface with a very short feedback loop (the closest is live programming with Smalltalk), so that I don't have to keep visualizing the whole project (it's mentally taxing).

replies(3): >>43538806 #>>43539637 #>>43542982 #

1. e3bc54b2 ◴[01 Apr 25 04:55 UTC] No.43542982[source]▶

>>43537512 #

> no one actually likes to write code

between this and..

> But everyone is trying to write less code by themselves with the proliferation of reusable code, libraries, frameworks, code generators, metaprogramming

.. this, is a massive gap. Personally speaking, I hate writing boilerplate code, y'know, old school Java with design patterns, getters/setters, redundant multi-layer catch blocks, stateful for loops etc. That gets on my nerves, because it increases my work for little benefit. Cue modern coding practices and I'm almost exclusively thinking about how to design a solution to the problem at hand, and almost all the code is exclusively business logic.

This is where a lot of LLMs just fail. Handholding them all the way to a correct solution feels like writing boilerplate again, except worse because I don't know when I'll be done. It doesn't help that most code available for LLMs is JS/TS/Java, where boilerplate abounds, but somehow I doubt giving them exclusively good codebases will help.
