467 points mraniki | 15 comments
phkahler ◴[] No.43534852[source]
Here is a real coding problem that I might be willing to make a cash-prize contest for. We'd need to nail down some rules. I'd be shocked if any LLM can do this:

https://github.com/solvespace/solvespace/issues/1414

Make a GTK 4 version of Solvespace. We have a single C++ file for each platform - Windows, Mac, and Linux-GTK3. There is also a Qt version on an unmerged branch for reference. The GTK3 file is under 2KLOC. You do not need to create a new version; just rewrite the GTK3 Linux version to GTK4. You may either ask it to port what's there or create the new one from scratch.

If you want to do this for free to prove how great the AI is, please document the entire session. Heck, make a YouTube video of it. The final test is whether I accept the PR or not - and I WANT this ticket done.

I'm not going to hold my breath.
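
For context on why this is more than a mechanical rename: GTK4 drops GtkContainer, makes widgets visible by default, and moves input handling to event controllers, so a port has to restructure code rather than swap function names. Below is a minimal sketch of that pattern using a hypothetical toy window - it is not Solvespace's platform code, just the shape of the change involved.

```cpp
// Hypothetical GTK3 -> GTK4 migration sketch; not Solvespace code.
//
// GTK3 idioms being replaced:
//   gtk_container_add(GTK_CONTAINER(window), box);
//   g_signal_connect(area, "button-press-event", G_CALLBACK(on_press), NULL);
//   gtk_widget_show_all(window);
#include <gtk/gtk.h>

static void on_pressed(GtkGestureClick *gesture, int n_press,
                       double x, double y, gpointer user_data) {
    // GTK4 delivers clicks through an event controller, not a raw GdkEvent.
    g_print("click %d at %.0f,%.0f\n", n_press, x, y);
}

static void activate(GtkApplication *app, gpointer user_data) {
    GtkWidget *window = gtk_application_window_new(app);
    GtkWidget *box    = gtk_box_new(GTK_ORIENTATION_VERTICAL, 0);
    GtkWidget *area   = gtk_drawing_area_new();
    gtk_widget_set_size_request(area, 400, 300);

    gtk_window_set_child(GTK_WINDOW(window), box); // GtkContainer is gone
    gtk_box_append(GTK_BOX(box), area);            // each parent has its own API

    GtkGesture *click = gtk_gesture_click_new();   // replaces "button-press-event"
    g_signal_connect(click, "pressed", G_CALLBACK(on_pressed), NULL);
    gtk_widget_add_controller(area, GTK_EVENT_CONTROLLER(click));

    // Widgets are visible by default; gtk_widget_show_all() no longer exists.
    gtk_window_present(GTK_WINDOW(window));
}

int main(int argc, char **argv) {
    GtkApplication *app =
        gtk_application_new("org.example.gtk4sketch", G_APPLICATION_DEFAULT_FLAGS);
    g_signal_connect(app, "activate", G_CALLBACK(activate), NULL);
    int status = g_application_run(G_APPLICATION(app), argc, argv);
    g_object_unref(app);
    return status;
}
```

Similar rewrites are needed wherever the old APIs were used - GTK4 also removed GtkMenu/GtkMenuItem in favour of GMenuModel-based menus, for example - which is why a 2 KLOC platform file ends up touching many of the breaking changes at once.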

replies(15): >>43534866 #>>43534869 #>>43535026 #>>43535180 #>>43535208 #>>43535218 #>>43535261 #>>43535424 #>>43535811 #>>43535986 #>>43536115 #>>43536743 #>>43536797 #>>43536869 #>>43542998 #
snickell ◴[] No.43535424[source]
This is the smoothest Tom Sawyer move I've ever seen IRL. I wonder how many people are now grinding out your GTK4 port with their favorite LLM/system to see if it can. It'll be interesting to see if anyone gets something working with current-gen LLMs.

UPDATE: a naive attempt (just fed it your description verbatim) with Cline + Claude 3.7 was a total wipeout. It looked like it was making progress, then freaked out, deleted 3/4 of its port, and never recovered.

replies(2): >>43535670 #>>43535712 #
phkahler ◴[] No.43535712[source]
>> This is the smoothest Tom Sawyer move I've ever seen IRL

That made me laugh. True, but not really the motivation. I honestly don't think LLMs can code significant real-world things yet, and I'm not sure how else to prove that, since they can code some interesting things. All the talk about putting programmers out of work has me calling BS but also thinking "show me". This task seems like a good combination: simple requirements, not much documentation, a real-world existing problem, non-trivial code size, limited scope.

replies(4): >>43536382 #>>43536563 #>>43536785 #>>43550128 #
cluckindan ◴[] No.43536563[source]
I agree. I tried something similar: a conversion of a simple PHP library from one system to another. It was only about 500 LOC, but Gemini 2.5 completely failed around line 300, and even then its output contained straight-up hallucinations, half-baked additions, wrong namespaces for dependencies, badly indented code, and other PSR style violations. Worse, it also changed working code and broke it.
replies(2): >>43537133 #>>43540295 #
1. stavros ◴[] No.43537133[source]
Try asking it to first generate a high-level plan of how it will do the conversion, then the definitions for the new functions, then tests for those functions, and only then the implementations, feeding it the test output as it goes.

It's not like people just one-shot a whole module of code, so why would LLMs?
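
To make that concrete, here is a tiny, entirely hypothetical illustration of what those stages might produce for a single function (in C++ rather than the PHP at issue, with a made-up function name and behaviour): declaration first, then a test that pins down the expected behaviour, and only then the implementation, with test results fed back until it passes.

```cpp
// Hypothetical staged artifacts for one function; names and behaviour are
// invented. The point is the ordering: plan -> declaration -> tests ->
// implementation driven by the test results.
#include <cassert>
#include <string>

// Stage 2: the model first commits to a signature, not a body.
std::string normalize_namespace(const std::string &legacy);

// Stage 3: then tests that define "correct" before any implementation exists.
static void test_normalize_namespace() {
    assert(normalize_namespace("Old_Lib_Foo") == "Old\\Lib\\Foo");
    assert(normalize_namespace("Plain") == "Plain");
}

// Stage 4: only now write the body, re-running the tests and feeding any
// failures back to the model until they pass.
std::string normalize_namespace(const std::string &legacy) {
    std::string out = legacy;
    for (char &c : out) {
        if (c == '_') c = '\\';   // underscore-separated legacy names -> namespaces
    }
    return out;
}

int main() {
    test_normalize_namespace();
    return 0;
}
```

The loop then repeats for the next function, which also keeps each individual prompt small.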

replies(3): >>43537417 #>>43537455 #>>43538750 #
2. semi-extrinsic ◴[] No.43537417[source]
I know many people who can and will one-shot a rewrite of 500 LOC. In my world, 500 LOC is about the length of a single function. I don't understand why we should be talking about generating a high-level plan with multiple tests, etc., for a single function.

And I don't think this is uncommon. Just as a random example from GitHub, this file is 1800 LOC and 4 functions. It implements one very specific thing that's part of a broader library. (I have no affiliation with this code.)

https://github.com/elemental/Elemental/blob/master/src/optim...

replies(1): >>43537443 #
3. stavros ◴[] No.43537443[source]
> I don't understand why we should be talking about generating a high-level plan with multiple tests, etc., for a single function.

You don't have to; you can write it by hand. I thought we were talking about how we can make computers write code instead of humans, but it seems we're trying to prove that LLMs aren't useful instead.

replies(2): >>43538757 #>>43540714 #
4. chrismorgan ◴[] No.43537455[source]
> It's not like people just one-shot a whole module of code, why would LLMs?

For conversions between languages or libraries, you often do just one-shot it, writing or modifying code from start to end in order.

I remember 15 years ago taking a 10,000-line Java code base and porting it to JavaScript mostly like this, with only a few areas requiring more involved, non-sequential editing.

replies(2): >>43543222 #>>43555350 #
5. SpaceNoodled ◴[] No.43538750[source]
Only 500 lines? That's minuscule.
6. SpaceNoodled ◴[] No.43538757{3}[source]
No, it's simply being demonstrated that they're not as useful as some claim.
replies(1): >>43538883 #
7. stavros ◴[] No.43538883{4}[source]
By saying "why do I have to use a specific technique, instead of the naive approach, to get what I want"?
replies(1): >>43539319 #
8. SpaceNoodled ◴[] No.43539319{5}[source]
"Why do I have to put in more work to use this tool vs. not using it?"
replies(1): >>43540773 #
9. semi-extrinsic ◴[] No.43540714{3}[source]
If we have to break the problem into tiny pieces that can be individually tested in order for LLMs to be useful, I think it clearly limits LLM usability to a particular niche of programming.
replies(2): >>43540778 #>>43544707 #
10. stavros ◴[] No.43540773{6}[source]
Which is exactly what I said here:

https://news.ycombinator.com/item?id=43537443

11. stavros ◴[] No.43540778{4}[source]
You don't have to; the LLM will.
12. copperx ◴[] No.43543222[source]
So you didn't test it until the end? Or did you have to build it in such a way that it was partially testable?
replies(1): >>43553245 #
13. KronisLV ◴[] No.43544707{4}[source]
> If we have to break the problem into tiny pieces that can be individually tested

Isn't this something we should have been doing for decades of our own volition?

Separation of concerns, the single responsibility principle, all the talk of (and trend toward) TDD or at the very least good test coverage, and writing code that can be debugged without going insane: no Heisenbugs, maybe some intermediate variables to stop on in a debugger instead of endless chained streams (though opinions are split on that), and at minimum code that is readable and not three pages per function.

Because when I see long stretches of code that I have to change without breaking anything around them, I don't feel confident doing that even in a codebase I'm familiar with, much less trusting an AI with it (at that point it might be a "Hail Mary", a last-ditch hope that the AI can find method in the madness before I have to get my own hands dirty and turn more of my hair gray).

14. chrismorgan ◴[] No.43553245{3}[source]
One of the nifty things about the target being JavaScript was that I didn't have to finish it before I could run it; it was the sort of big library where typical code wouldn't use most of the functionality. It was audio stuff, so there were a couple of core files that needed more careful porting (from whatever Java used to Mozilla's Audio Data API, which was a fairly good match), and the rest was fairly routine and could be done gradually, as I needed it or just when I didn't have anything better to focus on. Honestly, one of the biggest problems was forgetting to prefix instance properties with `this.`
15. dietr1ch ◴[] No.43555350[source]
I think this shows how the approach LLMs take is wrong. For us it's easy because we essentially iterate over every function with the simple goal of translating it, while staying careful enough to take note of anything that may call for a higher-level change.

Maybe the mistake is treating LLMs as capable people instead of a simple but optimised neuron soup tuned for text.