←back to thread

435 points crawshaw | 10 comments | | HN request time: 1.172s | source | bottom
Show context
libraryofbabel ◴[] No.43999072[source]
Strongly recommend this blog post too which is a much more detailed and persuasive version of the same point. The author actually goes and builds a coding agent from zero: https://ampcode.com/how-to-build-an-agent

It is indeed astonishing how well a loop with an LLM that can call tools works for all kinds of tasks now. Yes, sometimes they go off the rails, there is the problem of getting that last 10% of reliability, etc. etc., but if you're not at least a little bit amazed then I urge you go to and hack together something like this yourself, which will take you about 30 minutes. It's possible to have a sense of wonder about these things without giving up your healthy skepticism of whether AI is actually going to be effective for this or that use case.

This "unreasonable effectiveness" of putting the LLM in a loop also accounts for the enormous proliferation of coding agents out there now: Claude Code, Windsurf, Cursor, Cline, Copilot, Aider, Codex... and a ton of also-rans; as one HN poster put it the other day, it seems like everyone and their mother is writing one. The reason is that there is no secret sauce and 95% of the magic is in the LLM itself and how it's been fine-tuned to do tool calls. One of the lead developers of Claude Code candidly admits this in a recent interview.[0] Of course, a ton of work goes into making these tools work well, but ultimately they all have the same simple core.

[0] https://www.youtube.com/watch?v=zDmW5hJPsvQ

replies(12): >>43999361 #>>43999593 #>>44000028 #>>44000133 #>>44000238 #>>44000739 #>>44002234 #>>44003725 #>>44003808 #>>44004127 #>>44005134 #>>44010227 #
datpuz ◴[] No.44000739[source]
Can't think of anything an LLM is good enough at to let them do on their own in a loop for more than a few iterations before I need to reign it back in.
replies(8): >>44000859 #>>44000866 #>>44001035 #>>44001519 #>>44002014 #>>44002521 #>>44003823 #>>44005529 #
Groxx ◴[] No.44001035[source]
They're extremely good at burning through budgets, and get even better when unattended
replies(2): >>44001444 #>>44001496 #
1. mycall ◴[] No.44001496[source]
Is that really true? I though there free models and $200 all you can eat models.
replies(2): >>44001556 #>>44002444 #
2. nsomaru ◴[] No.44001556[source]
These tools require API calls which usually aren’t priced like the consumer plans
replies(3): >>44001662 #>>44001744 #>>44008268 #
3. adastra22 ◴[] No.44001662[source]
Yeah they’re cheaper. I’ve written whole apps for $0.20 in API calls.
replies(2): >>44005170 #>>44007683 #
4. jfim ◴[] No.44001744[source]
Claude code is now part of the consumer $100/mo max plan.
replies(1): >>44002322 #
5. Aeolun ◴[] No.44002322{3}[source]
If they give me API access too I’m sold xD
6. piuantiderp ◴[] No.44002444[source]
Read that you can very quickly blow the budget on the 200/mo ones too
7. monsieurbanana ◴[] No.44005170{3}[source]
With which agent? What kind of apps?

Without more information I'm very skeptical that you had e.g. Claude Code create a whole app (so more than a simple script) with 20 cents. Unless it was able to one-shot it, but at that point you don't need an agent anyway.

replies(1): >>44007230 #
8. adastra22 ◴[] No.44007230{4}[source]
Aider, Claude 3.7.
9. datpuz ◴[] No.44007683{3}[source]
I've "written" whole apps by going to GitHub, cloning a repo, right clicking, and renaming it to "MyApp." Impressed?
10. never_inline ◴[] No.44008268[source]
Well technically Aider let's you use a web chat UI by generating some context and letting you paste back and forth.