
Building a Personal AI Factory

(www.john-rush.com)
260 points by derek | 3 comments
simonw No.44439075
My hunch is that this article is going to be almost completely impenetrable to people who haven't yet had the "aha" moment with Claude Code.

That's the moment when you let "claude --dangerously-skip-permissions" go to work on a difficult problem and watch it crunch away by itself for a couple of minutes running a bewildering array of tools until the problem is fixed.

I had it compile, run and debug a Mandelbrot fractal generator in 486 assembly today, executing in Docker on my Mac, just to see how well it could do. It did great! https://gist.github.com/simonw/ba1e9fa26fc8af08934d7bc0805b9...
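
(If you want to try the same pattern without handing the agent your whole machine, here is a minimal sketch of launching it in a throwaway container. "claude-sandbox" is my placeholder for any image with the claude CLI installed, not an official image; the -p flag runs Claude Code non-interactively on a single prompt.)

    # Minimal sketch: run Claude Code inside a disposable container so
    # --dangerously-skip-permissions can only touch the mounted workdir.
    # "claude-sandbox" is a placeholder image assumed to have the
    # claude CLI installed.
    import os, subprocess

    subprocess.run([
        "docker", "run", "--rm",
        "-v", f"{os.getcwd()}:/workspace", "-w", "/workspace",
        "claude-sandbox",
        "claude", "--dangerously-skip-permissions",
        "-p", "Build and debug a Mandelbrot generator in 486 assembly",
    ], check=True)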

low_common No.44439544
That's a pretty trivial example for one of these IDEs to knock out. Assembly is certainly in their training sets, and Docker obviously is too. I've watched Cursor absolutely run amok when I let it play around in parts of my codebase.

I'm bullish it'll get there sooner rather than later, but we're not there yet.

simonw No.44439886
I think the hardest problem in computer science right now may be coming up with an LLM demo that doesn't get called "pretty trivial".
pydry No.44443084
Really? This paper cut through the same kind of bullshit with puzzles: https://ml-site.cdn-apple.com/papers/the-illusion-of-thinkin...

What do you think is so difficult about doing the same thing with coding problems?

simonw No.44443442
I don't understand the connection between that paper and my comment.
pydry No.44443831
They created an environment that exposes LLMs to problems and tests their performance, using puzzles that are immune to benchmark hacking.
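
(Concretely, the recipe in the paper is: generate puzzle instances procedurally, scale a single complexity knob, and verify answers mechanically, so there is nothing to memorize from a benchmark. A minimal sketch of that idea in Python for Tower of Hanoi, one of the paper's puzzles; this is my illustration, not the paper's code:)

    # Sketch: procedurally scaled Tower of Hanoi with a mechanical
    # verifier. Complexity is controlled by n; a model's answer is
    # just a move list, checked move by move.
    def verify_hanoi(n, moves):
        pegs = [list(range(n, 0, -1)), [], []]   # disks n..1 on peg 0
        for src, dst in moves:
            if not pegs[src]:
                return False                     # moving from an empty peg
            disk = pegs[src].pop()
            if pegs[dst] and pegs[dst][-1] < disk:
                return False                     # larger disk onto smaller
            pegs[dst].append(disk)
        return pegs[2] == list(range(n, 0, -1))  # all disks on peg 2

    # e.g. the optimal n=2 solution:
    assert verify_hanoi(2, [(0, 1), (0, 2), (1, 2)])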

Your comment was about how doing the same thing for coding challenges was unreasonably hard.

Anecdotally, I've seen LLMs do all sorts of amazing shit that was obviously drawn from their training set, and fall flat on their faces on simple coding tasks novel enough not to appear in the training set.

simonw No.44444237
That Apple paper mainly demonstrated that "reasoning" LLMs - with no access to additional tools - can't solve problems that deliberately exceed their token context length.
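
(To put rough numbers on that, with my own assumptions rather than the paper's: an optimal Tower of Hanoi solution has 2^n - 1 moves, so the written-out move list outgrows any plausible output budget fast.)

    # Back-of-envelope sketch, my assumptions: 2**n - 1 moves,
    # roughly 5 tokens per written-out move.
    for n in (10, 15, 20):
        moves = 2 ** n - 1
        print(f"n={n}: {moves} moves, ~{5 * moves:,} tokens")
    # n=20 already needs ~5.2M tokens of output.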

I don't think it has much relevance at all to a conversation about how good LLMs are at solving programming problems by running tools in a loop.
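
(For anyone who hasn't seen it spelled out, "running tools in a loop" is just this pattern, sketched below in Python. llm_step is a hypothetical stand-in for a real model call, not any vendor's API.)

    # Hedged sketch of "tools in a loop". llm_step(transcript) is a
    # hypothetical model call returning either
    # {"type": "run", "command": "..."} or {"type": "done"}.
    import subprocess

    def agent_loop(llm_step, task):
        transcript = [task]
        while True:
            action = llm_step(transcript)
            if action["type"] == "done":
                return transcript
            out = subprocess.run(action["command"], shell=True,
                                 capture_output=True, text=True)
            # Feed the tool output back so the model can react to it.
            transcript.append(out.stdout + out.stderr)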

I keep seeing this idea that LLMs can't handle problems that aren't in their training data and it's frustrating because anyone who has spent significant time working with these systems knows that it obviously isn't true.

pydry No.44452688
It demonstrated that there was a hard limit on the complexity of puzzle that LLMs could solve, no matter how many tokens they threw at it (using a form of puzzle construction that ensured the LLM couldn't just refer to its training data to solve it).