
549 points by thecr0w | 1 comment
thuttinger No.46184466
Claude/LLMs in general are still pretty bad at the intricate details of layouts and visual things. There are a lot of problems that are easy for a junior web dev to get right but impossible for an LLM. On the other hand, within a few minutes I was able to write a C program that added gamma color profile support to Linux compositors that don't support it (in my case Hyprland)! A seemingly hard task, for me at least, that would have taken me a day or more if I hadn't let Claude write the code. With one prompt Claude generated C code that compiled on the first try and that:

- Read an .icc file from disk

- parsed the file and extracted the VCGT (video card gamma table)

- wrote the VCGT to the video card for a specified display via amdgpu driver APIs

The only thing I had to fix was the ICC parsing, where it read header fields in the wrong byte order (ICC files are big-endian); see the sketch below for where that bites.
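
For reference, a minimal sketch of that tag lookup, assuming the standard ICC layout (128-byte header, big-endian tag count, 12-byte tag-table entries); the helper names and error handling are illustrative, not the code Claude produced:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* ICC files store every multi-byte field big-endian, which is
     * exactly the byte-order bug mentioned above. */
    static uint32_t be32(const uint8_t *p) {
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    }

    /* Return the file offset of the 'vcgt' (video card gamma table)
     * tag data, or 0 if the profile has no such tag. */
    static uint32_t find_vcgt(const uint8_t *icc, size_t len) {
        if (len < 132) return 0;
        uint32_t count = be32(icc + 128);      /* tag count after header */
        for (uint32_t i = 0; i < count; i++) {
            size_t off = 132 + (size_t)i * 12; /* 12 bytes per entry */
            if (off + 12 > len) return 0;      /* truncated tag table */
            if (memcmp(icc + off, "vcgt", 4) == 0)
                return be32(icc + off + 4);    /* entry: sig, offset, size */
        }
        return 0;
    }

Writing the extracted ramps to the display would then presumably go through something like libdrm's drmModeCrtcSetGamma(), though that part depends on the driver setup.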

jacquesm No.46185379
Claude didn't write that code. Someone else did; Claude took that code without credit to the original author(s), adapted it to your use case, and then presented it as its own creation, and you accepted this. If a human did this, we would probably have a word for them.
FanaHOVA No.46185473
Are you saying that every piece of code you have ever written contains a full source list of every piece of code you previously read to learn specific languages, patterns, etc?

Or are you saying that every piece of code you ever wrote was 100% original and not adapted from any previous codebase you ever worked in or any book / reference you ever read?

jacquesm No.46185606
What's with the bad takes in this thread? That's two strawmen in one comment; it's getting a bit crowded.
DangitBobby No.46185838
Or the original point doesn't actually hold up to basic scrutiny and is indistinguishable from straw itself.
tovej No.46189150
The original point, that LLMs plagiarize their inputs, is a very common and commonsense opinion.

There are court cases currently addressing this, and if you think about how LLMs operate, a reasonable person typically sees that it looks an awful lot like plagiarism.

If you want to claim it is not plagiarism, that requires a good argument, because it is unclear that LLMs can produce novelty, since they're literally trying to recreate the input data as faithfully as possible.

DangitBobby No.46192004
I need you to prove to me that it's not plagiarism when you write code that uses a library after reading documentation, I guess.

> since they're literally trying to recreate the input data as faithfully as possible.

Is that how they are able to produce unique code based on libraries that didn't exist in their training set? Or that they themselves wrote? Is that how you can give them the documentation for an API and they write code that uses it? Your desire to make LLMs "not special" has made you completely blind to reality. Come back to us.

tovej No.46192252
What?

The LLM is trained on a corpus of text, and when it is given a sequence of tokens, it finds the set of tokens that, when one of them is appended, makes the resulting sequence most like the text in that corpus.

If it is given a sequence of tokens that is unlike anything in its corpus, all bets are off and it produces garbage, just like machine learning models in general: if the input is outside the learned distribution, quality goes downhill fast.

The fact that they've added a Monte Carlo step to the sequence generation, which sometimes selects a token slightly less likely than the closest match in the corpus, does not change this.

LLMs are fuzzy lookup tables for existing text that hallucinate text for out-of-distribution queries.

This is LLM 101.
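
For concreteness, that "Monte Carlo step" is temperature sampling over the model's next-token scores. A minimal sketch, with a made-up four-token vocabulary (real models sample over tens of thousands of tokens):

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Draw one token index from softmax(logits / temperature).
     * Low temperature concentrates probability on the top token;
     * higher temperature spreads it onto less likely ones. */
    static int sample_token(const double *logits, int n, double temperature) {
        double weights[64], sum = 0.0;  /* assume n <= 64 for the sketch */
        for (int i = 0; i < n; i++) {
            weights[i] = exp(logits[i] / temperature);
            sum += weights[i];
        }
        double r = (double)rand() / ((double)RAND_MAX + 1.0) * sum;
        for (int i = 0; i < n; i++) {   /* walk the cumulative weights */
            r -= weights[i];
            if (r < 0.0) return i;
        }
        return n - 1;  /* numerical fallback */
    }

    int main(void) {
        double logits[4] = {2.0, 1.0, 0.5, 0.1};  /* made-up model scores */
        srand(42);
        for (int i = 0; i < 5; i++)
            printf("picked token %d\n", sample_token(logits, 4, 0.8));
        return 0;
    }

At temperature near zero this collapses to argmax, always the closest match; raising it is what occasionally picks a lower-ranked token.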

If the LLM were trained only on documentation, there would be no problem: it would generate a design, look at the documentation, understand the semantics of both, and translate the design into code using the documentation as a guide.

But that's not how it works. It has open source repositories in its corpus, which it then recreates by chaining together examples in the stochastic-parrot fashion I described above.

DangitBobby No.46197240
K