
549 points thecr0w | 1 comment
999900000999 ◴[] No.46183645[source]
Space Jam website design as an LLM benchmark.

This article is a bit negative. Claude gets close; it just can't get the order right, which is something OP can manually fix.
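
To make "can't get the order right" concrete: the 1996 Space Jam homepage arranges its nav-link GIFs in an orbit around a central logo, and my assumption (not stated in the article) is that the failure is about reproducing their clockwise sequence. Here is a minimal TypeScript sketch of how a page generator might lay that orbit out; the link names and coordinates are illustrative, recalled from memory rather than taken from the article:

    // Hypothetical sketch, not the article's code: place nav links
    // clockwise around a central logo, the way the 1996 Space Jam
    // homepage orbits its planet GIFs around the "Jam" logo.
    // Link names are recalled from memory and may be off.
    const NAV_LINKS = [
      "Press Box Shuttle",
      "Jam Central",
      "Planet B-Ball",
      "Lunar Tunes",
      "Jump Station",
      "Warner Studio Store",
    ]; // the sequence here is the "order" an LLM could plausibly scramble

    interface Placed {
      label: string;
      x: number;
      y: number;
    }

    function orbitPositions(
      labels: string[],
      centerX: number,
      centerY: number,
      radius: number,
      startAngleDeg = -90, // put the first link at 12 o'clock
    ): Placed[] {
      return labels.map((label, i) => {
        // Screen coordinates (y grows downward), so increasing the
        // angle walks the links clockwise from the top.
        const deg = startAngleDeg + (360 / labels.length) * i;
        const rad = (deg * Math.PI) / 180;
        return {
          label,
          x: Math.round(centerX + radius * Math.cos(rad)),
          y: Math.round(centerY + radius * Math.sin(rad)),
        };
      });
    }

    // Print the absolute offsets a generator would feed into CSS.
    for (const p of orbitPositions(NAV_LINKS, 400, 300, 220)) {
      console.log(`${p.label}: left=${p.x}px top=${p.y}px`);
    }

Shuffling NAV_LINKS still renders a plausible-looking orbit, which is exactly why an ordering mistake is easy to glance past.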

I prefer GitHub Copilot because it's cheaper and integrates directly with GitHub. Sometimes it gets it right on the first try, and sometimes I have to try 3 or 4 times.

replies(4): >>46183660 #>>46183768 #>>46184119 #>>46184297 #
GeoAtreides ◴[] No.46184119[source]
>which is something OP can manually fix

what if the LLM gets something wrong that the operator (a junior dev, perhaps) doesn't even know is wrong? that's the main issue: if it fails here, it will fail on other things, in far less obvious ways.

replies(2): >>46185803 #>>46187184 #
alickz ◴[] No.46187184[source]
>what if the LLM gets something wrong that the operator (a junior dev, perhaps) doesn't even know is wrong?

the same thing that always happens when any dev gets something wrong without knowing it's wrong: either code review/QA catches it, or the user does, and a ticket is created

>if it fails here, it will fail on other things, in far less obvious ways.

is infallibility a realistic expectation of a software tool or its operator?

replies(1): >>46189504 #
GeoAtreides ◴[] No.46189504[source]
By sheer chance, there's now an HN submission that answers both questions (but mostly the second) PERFECTLY:

https://news.ycombinator.com/item?id=46185957