
549 points thecr0w | 2 comments
999900000999 ◴[] No.46183645[source]
Space Jam website design as an LLM benchmark.

This article is a bit negative. Claude gets close; it just can't get the order right, which is something OP can manually fix.

I prefer GitHub Copilot because it's cheaper and integrates with GitHub directly. I'll have times where it'll get it right, and times when I have to try 3 or 4 times.

replies(4): >>46183660 #>>46183768 #>>46184119 #>>46184297 #
1. smallnix ◴[] No.46183768[source]
That's not the point of the article. It's about Claude/LLMs being overconfident about producing a pixel-perfect recreation.
replies(1): >>46185413 #
2. jacquesm ◴[] No.46185413[source]
All AIs are overconfident. It's impressive what they can do, but at the same time extremely unimpressive what they can't do while passing it off as the best thing since sliced bread. 'Perfect! Now I see the problem.' 'Thank you for correcting that, here is a perfect recreation of problem 'x' that will work with your hardware.' (Never mind the 10 glaring mistakes.)

I've tried these tools a number of times and spent a good bit of effort on learning to maximize the return. By the time you know what prompt to write you've solved the problem yourself.