577 points simonw | 1 comments
AlexeyBrin ◴[] No.44723521[source]
Most likely its training data included countless Space Invaders implementations in various programming languages.
replies(6): >>44723664 #>>44723707 #>>44723945 #>>44724116 #>>44724439 #>>44724690 #
NitpickLawyer ◴[] No.44723707[source]
This comment is ~3 years late. Every model since GPT-3 has had the entirety of available code in its training data. That's not a gotcha anymore.

We went from ChatGPT's "oh, look, it looks like Python code but everything is wrong" to "here's a full-stack boilerplate app that does what you asked and works zero-shot" inside 2 years. That's the kicker. And the sauce isn't just in the training set: models now get post-training and RL and a bunch of other stuff to get to where we are. Not to mention the insane abilities with extended context (the first models were 2k/4k max), agentic stuff, and so on.

These kinds of comments are really missing the point.

replies(7): >>44723808 #>>44723897 #>>44724175 #>>44724204 #>>44724397 #>>44724433 #>>44729201 #
1. stolencode ◴[] No.44729201[source]
It's amazing that none of you even try to falsify your claims anymore. You can literally just put some of the code in a search engine and find the prior art example:

https://www.web-leb.com/en/code/2108

Your "AI tools" are just "copyright whitewashing machines."

These kinds of comments are really ignoring reality.