
549 points thecr0w | 1 comment
999900000999 ◴[] No.46183645[source]
Space Jam website design as an LLM benchmark.

This article is a bit negative. Claude gets close; it just can't get the order right, which is something OP can manually fix.
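
To make "can't get the order right" concrete: the 1996 Space Jam homepage arranges its nav-link GIFs in an orbit around a central logo, and my assumption (not stated in the article) is that the failure is about reproducing their clockwise sequence. Here is a minimal TypeScript sketch of how a page generator might lay that orbit out; the link names and coordinates are illustrative, recalled from memory rather than taken from the article:

    // Hypothetical sketch, not the article's code: place nav links
    // clockwise around a central logo, the way the 1996 Space Jam
    // homepage orbits its planet GIFs around the "Jam" logo.
    // Link names are recalled from memory and may be off.
    const NAV_LINKS = [
      "Press Box Shuttle",
      "Jam Central",
      "Planet B-Ball",
      "Lunar Tunes",
      "Jump Station",
      "Warner Studio Store",
    ]; // the sequence here is the "order" an LLM could plausibly scramble

    interface Placed {
      label: string;
      x: number;
      y: number;
    }

    function orbitPositions(
      labels: string[],
      centerX: number,
      centerY: number,
      radius: number,
      startAngleDeg = -90, // put the first link at 12 o'clock
    ): Placed[] {
      return labels.map((label, i) => {
        // Screen coordinates (y grows downward), so increasing the
        // angle walks the links clockwise from the top.
        const deg = startAngleDeg + (360 / labels.length) * i;
        const rad = (deg * Math.PI) / 180;
        return {
          label,
          x: Math.round(centerX + radius * Math.cos(rad)),
          y: Math.round(centerY + radius * Math.sin(rad)),
        };
      });
    }

    // Print the absolute offsets a generator would feed into CSS.
    for (const p of orbitPositions(NAV_LINKS, 400, 300, 220)) {
      console.log(`${p.label}: left=${p.x}px top=${p.y}px`);
    }

Shuffling NAV_LINKS still renders a plausible-looking orbit, which is exactly why an ordering mistake is easy to glance past.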

I prefer GitHub Copilot because it's cheaper and integrates directly with GitHub. Sometimes it gets it right on the first try, and sometimes I have to try 3 or 4 times.

replies(4): >>46183660 #>>46183768 #>>46184119 #>>46184297 #
GeoAtreides ◴[] No.46184119[source]
>which is something OP can manually fix

what if the LLM gets something wrong that the operator (a junior dev, perhaps) doesn't even know is wrong? that's the main issue: if it fails here, it will fail on other things, in far less obvious ways.

replies(2): >>46185803 #>>46187184 #
alickz ◴[] No.46187184[source]
>what if the LLM gets something wrong that the operator (a junior dev, perhaps) doesn't even know is wrong?

the same thing that always happens when any dev gets something wrong without knowing it's wrong: either code review/QA catches it, or the user does, and a ticket is created

>if it fails here, it will fail on other things, in far less obvious ways.

is infallibility a realistic expectation of a software tool or its operator?

replies(1): >>46189504 #
GeoAtreides ◴[] No.46189504[source]
By sheer chance, there's now an HN submission that answers both questions (but mostly the second) PERFECTLY:

https://news.ycombinator.com/item?id=46185957