←back to thread

483 points mraniki | 1 comments | | HN request time: 0s | source
Show context
phkahler ◴[] No.43534852[source]
Here is a real coding problem that I might be willing to make a cash-prize contest for. We'd need to nail down some rules. I'd be shocked if any LLM can do this:

https://github.com/solvespace/solvespace/issues/1414

Make a GTK 4 version of Solvespace. We have a single C++ file for each platform - Windows, Mac, and Linux-GTK3. There is also a QT version on an unmerged branch for reference. The GTK3 file is under 2KLOC. You do not need to create a new version, just rewrite the GTK3 Linux version to GTK4. You may either ask it to port what's there or create the new one from scratch.

If you want to do this for free to prove how great the AI is, please document the entire session. Heck make a YouTube video of it. The final test is weather I accept the PR or not - and I WANT this ticket done.

I'm not going to hold my breath.

replies(15): >>43534866 #>>43534869 #>>43535026 #>>43535180 #>>43535208 #>>43535218 #>>43535261 #>>43535424 #>>43535811 #>>43535986 #>>43536115 #>>43536743 #>>43536797 #>>43536869 #>>43542998 #
snickell ◴[] No.43535424[source]
This is the smoothest tom sawyer move I've ever seen IRL, I wonder how many people are now grinding out your GTK4 port with our favorite LLM/system to see if it can. It'll be interesting to see if anyone gets something working with current-gen LLMs.

UPDATE: naive (just fed it your description verbatim) cline + claude 3.7 was a total wipeout. It looked like it was making progress, then freaked out, deleted 3/4 of its port, and never recovered.

replies(2): >>43535670 #>>43535712 #
phkahler ◴[] No.43535712[source]
>> This is the smoothest tom sawyer move I've ever seen IRL

That made me laugh. True, but not really the motivation. I honestly don't think LLMs can code significant real-world things yet and I'm not sure how else to prove that since they can code some interesting things. All the talk about putting programmers out of work has me calling BS but also thinking "show me". This task seems like a good combination of simple requirements, not much documentation, real world existing problem, non-trivial code size, limited scope.

replies(5): >>43536382 #>>43536563 #>>43536785 #>>43550128 #>>43567838 #
1. nico ◴[] No.43536785{3}[source]
> I honestly don't think LLMs can code significant real-world things yet and I'm not sure how else to prove that since they can code some interesting things

In my experience it seems like it depends on what they’ve been trained on

They can do some pretty amazing stuff in python, but fail even at the most basic things in arm64 assembly

These models have probably not seen a lot of GTK3/4 code and maybe not even a single example of porting between the two versions

I wonder if finetuning could help with that