467 points mraniki | 3 comments
phkahler ◴[] No.43534852[source]
Here is a real coding problem that I might be willing to make a cash-prize contest for. We'd need to nail down some rules. I'd be shocked if any LLM can do this:

https://github.com/solvespace/solvespace/issues/1414

Make a GTK4 version of Solvespace. We have a single C++ file for each platform: Windows, Mac, and Linux-GTK3. There is also a Qt version on an unmerged branch for reference. The GTK3 file is under 2 KLOC. You do not need to create a new version, just rewrite the GTK3 Linux version for GTK4. You may either ask it to port what's there or have it create the new one from scratch.

If you want to do this for free to prove how great the AI is, please document the entire session. Heck, make a YouTube video of it. The final test is whether I accept the PR or not, and I WANT this ticket done.

I'm not going to hold my breath.

replies(15): >>43534866 #>>43534869 #>>43535026 #>>43535180 #>>43535208 #>>43535218 #>>43535261 #>>43535424 #>>43535811 #>>43535986 #>>43536115 #>>43536743 #>>43536797 #>>43536869 #>>43542998 #
snickell ◴[] No.43535424[source]
This is the smoothest Tom Sawyer move I've ever seen IRL. I wonder how many people are now grinding out your GTK4 port with their favorite LLM/system to see if it can. It'll be interesting to see if anyone gets something working with current-gen LLMs.

UPDATE: a naive attempt (just fed it your description verbatim) with Cline + Claude 3.7 was a total wipeout. It looked like it was making progress, then freaked out, deleted 3/4 of its port, and never recovered.

replies(2): >>43535670 #>>43535712 #
phkahler ◴[] No.43535712[source]
>> This is the smoothest Tom Sawyer move I've ever seen IRL

That made me laugh. True, but not really the motivation. I honestly don't think LLMs can code significant real-world things yet, and I'm not sure how else to prove that, since they can code some interesting things. All the talk about putting programmers out of work has me calling BS, but also thinking "show me". This task seems like a good combination: simple requirements, not much documentation, a real-world existing problem, non-trivial code size, and limited scope.

replies(4): >>43536382 #>>43536563 #>>43536785 #>>43550128 #
cluckindan ◴[] No.43536563[source]
I agree. I tried something similar: a conversion of a simple PHP library from one system to another. It was only about 500 LOC, but Gemini 2.5 completely failed around line 300, and even then its output contained straight-up hallucinations, half-baked additions, wrong namespaces for dependencies, badly indented code, and other PSR style violations. Worse, it also changed working code and broke it.
replies(2): >>43537133 #>>43540295 #
stavros ◴[] No.43537133[source]
Try asking it to generate a high-level plan of how it's going to do the conversion first, then to generate function definitions for the new functions, then have it generate tests for those functions, then have it actually write them while you feed it the output of the tests.

It's not like people one-shot a whole module of code; why would LLMs?
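The staged workflow described above (plan, then stubs, then tests, then implementation with test feedback) can be sketched as a small driver loop. Everything here is hypothetical: `llm` stands in for whatever model call you'd actually use (here it returns canned strings so the sketch runs at all), and the retry loop feeding test failures back to the model is the part carrying the idea.

```python
def llm(prompt: str) -> str:
    """Stand-in for a real model call. Keyed on the prompt's first word;
    returns canned output so this sketch is self-contained and runnable."""
    canned = {
        "plan": "1. Port the arithmetic helpers\n2. Port the entry point",
        "stubs": "def add(a, b):\n    raise NotImplementedError",
        "tests": "assert add(2, 3) == 5",
        "impl": "def add(a, b):\n    return a + b",
    }
    return canned[prompt.split(":", 1)[0]]

def staged_convert(source: str, max_rounds: int = 3) -> str:
    # Step 1: high-level plan for the conversion.
    plan = llm(f"plan: outline how to convert this code:\n{source}")
    # Step 2: function definitions (stubs) for the new code.
    stubs = llm(f"stubs: write function definitions following:\n{plan}")
    # Step 3: tests against those definitions.
    tests = llm(f"tests: write assertions for these definitions:\n{stubs}")
    # Step 4: implement, then iterate while feeding back test failures.
    impl = llm(f"impl: implement these definitions:\n{stubs}")
    for _ in range(max_rounds):
        namespace = {}
        try:
            exec(impl + "\n" + tests, namespace)  # run impl against the tests
            return impl                           # tests pass: accept this draft
        except Exception as err:
            impl = llm(f"impl: fix this failure ({err}) in:\n{impl}")
    raise RuntimeError("no passing implementation within the round limit")

code = staged_convert("function add($a, $b) { return $a + $b; }")
print("add" in code)  # → True
```

With a real model behind `llm`, the value is in the loop: the model never has to one-shot the whole module, and each round only has to fix a concrete, reported failure.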

replies(3): >>43537417 #>>43537455 #>>43538750 #
semi-extrinsic ◴[] No.43537417[source]
I know many people who can and will one-shot a rewrite of 500 LOC. In my world, 500 LOC is about the length of a single function. I don't understand why we should be talking about generating a high-level plan with multiple tests etc. for a single function.

And I don't think this is uncommon. Just a random example from GitHub: this file is 1800 LOC and 4 functions. It implements one very specific thing that's part of a broader library. (I have no affiliation with this code.)

https://github.com/elemental/Elemental/blob/master/src/optim...

replies(1): >>43537443 #
stavros ◴[] No.43537443[source]
> I don't understand why we should be talking about generating a high-level plan with multiple tests etc. for a single function.

You don't have to; you can write it by hand. I thought we were talking about how we can make computers write code instead of humans, but it seems that instead we're trying to prove that LLMs aren't useful.

replies(2): >>43538757 #>>43540714 #
SpaceNoodled ◴[] No.43538757[source]
No, it's simply being demonstrated that they're not as useful as some claim.
replies(1): >>43538883 #
stavros ◴[] No.43538883[source]
By saying "why do I have to use a specific technique, instead of naively, to get what I want"?
replies(1): >>43539319 #
SpaceNoodled ◴[] No.43539319[source]
"Why do I have to put in more work to use this tool vs. not using it?"
replies(1): >>43540773 #
stavros ◴[] No.43540773[source]
Which is exactly what I said here:

https://news.ycombinator.com/item?id=43537443