swframe2 ◴[] No.45108930[source]
Preventing garbage mostly comes down to taking the agent's cognitive limits into account. For example:

1) Don't ask for a large, complex change in one shot. Ask for a plan, then have the model implement the plan in small steps and test each step before starting the next.

2) For really complex steps, ask the model to write code to visualize the problem and solution.

3) If the model fails on a given step, ask it to add logging to the code, save the logs, run the tests, then review the logs to determine what went wrong. Repeat until the step works well. (A sketch of what that instrumentation can look like follows this list.)

4) Ask the model to look at your existing code and work out how it was designed before implementing a task. Sometimes the model will dump all of the changes into one file even though your codebase has a cleaner design it isn't taking into account.
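
As a concrete illustration of (3), here's a minimal sketch, assuming a Python codebase, of the kind of instrumentation the model might add. Every name here (score_batch, compute_score, debug_step.log) is hypothetical:

    # Hypothetical sketch of tip 3: the agent instruments the suspect step
    # with logging, the logs go to a file, and after each test run the model
    # (or you) re-reads the file to see where things diverge.
    import logging

    logging.basicConfig(
        filename="debug_step.log",   # saved so the logs can be reviewed later
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    log = logging.getLogger(__name__)

    def compute_score(item: str) -> int:
        return len(item)  # stand-in for the step that keeps failing

    def score_batch(items: list[str]) -> list[tuple[str, int]]:
        log.debug("score_batch received %d items", len(items))
        results = []
        for item in items:
            score = compute_score(item)
            log.debug("item=%r -> score=%d", item, score)
            results.append((item, score))
        log.debug("score_batch produced %d results", len(results))
        return results

    if __name__ == "__main__":
        score_batch(["foo", "barbaz"])  # run, then inspect debug_step.log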

I've seen other people blog about their tips and tricks. I still see garbage results, but nowhere near 95% of the time.

nostrademons ◴[] No.45109822[source]
I've found that an effective tactic for larger, more complex tasks is to tell it "Don't write any code now. I'm going to describe each of the steps of the problem in more detail. The rough outline is going to be 1) Read this input 2) Generate these candidates 3) apply heuristics to score candidates 4) prioritize and rank candidates 5) come up with this data structure reflecting the output 6) write the output back to the DB in this schema". Claude will then go and write a TODO list in the code (and possibly claude.md if you've run /init), and prompt you for the details of each stage. I've even done this for an hour, told Claude "I have to stop now. Generate code for the finished stages and write out comments so you can pick up where you left off next time" and then been able to pick up next time with minimal fuss.
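
A minimal sketch of what that staged outline might turn into, assuming a Python codebase; every name here (Candidate, load_input, the length heuristic) is illustrative, not from the comment:

    from dataclasses import dataclass

    @dataclass
    class Candidate:
        payload: str
        score: float = 0.0

    def load_input(path: str) -> list[str]:
        # Step 1: read this input.
        with open(path) as f:
            return [line.strip() for line in f if line.strip()]

    def generate_candidates(rows: list[str]) -> list[Candidate]:
        # Step 2: generate these candidates.
        return [Candidate(payload=row) for row in rows]

    def apply_heuristics(cands: list[Candidate]) -> list[Candidate]:
        # Step 3: score candidates (placeholder heuristic: longer is better).
        for c in cands:
            c.score = float(len(c.payload))
        return cands

    def rank(cands: list[Candidate]) -> list[Candidate]:
        # Step 4: prioritize and rank.
        return sorted(cands, key=lambda c: c.score, reverse=True)

    def to_output(cands: list[Candidate]) -> list[dict]:
        # Step 5: the data structure reflecting the output.
        return [{"payload": c.payload, "score": c.score} for c in cands]

    def write_output(rows: list[dict]) -> None:
        # Step 6: write back to the DB (stubbed; the schema is project-specific).
        for row in rows:
            print(row)

The point of dictating the stages up front is that each one becomes a small, testable unit the model can fill in and verify one at a time.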
yahoozoo ◴[] No.45110652[source]
How does a token predictor “apply heuristics to score candidates”? Is it running a tool, such as a Python script it writes for scoring candidates? If not, isn’t it just pulling some statistically-likely “score” out of its weights rather than actually calculating one?
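
For what it's worth, when the model does use a tool, the script it writes can be as mundane as this sketch (hypothetical; the checks and weights are made up). A number produced this way is actually computed; a number the model simply states in its reply is a sampled token:

    import ast

    def score_candidate(source: str) -> float:
        # Objective checks, actually executed -- not sampled from weights.
        score = 0.0
        try:
            ast.parse(source)            # does the candidate even parse?
            score += 1.0
        except SyntaxError:
            return 0.0
        if "TODO" not in source:         # penalize obviously unfinished code
            score += 0.5
        score += max(0.0, 1.0 - len(source) / 2000)  # mild brevity bonus
        return score

    candidates = {
        "a": "def add(a, b):\n    return a + b\n",
        "b": "def add(a, b) return a + b",  # syntax error
    }
    print(sorted(candidates, key=lambda k: score_candidate(candidates[k]),
                 reverse=True))          # ['a', 'b']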
imtringued ◴[] No.45115619[source]
You can think of the K (=key) matrix in attention as a neural network in which each token becomes a tiny classifier with multiple inputs and a single output.

The softmax then weights those classifier outputs, softly emphasizing the most promising activations for a given output token rather than hard-selecting one.

The V (=value) matrix forms another network in which each token becomes a tiny regressor: it takes that activation as input and produces multiple outputs, which are summed into an intermediate token that is then fed into the MLP layer.

From this perspective the transformer architecture is building neural networks at runtime.

But there are some pretty obvious limitations here: The LLM operates on tokens, which means it can only operate on what is in the KV-cache/context window. If the candidates are not in the context window, it can't score them.
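
A toy NumPy version of the mechanism described above; a sketch for intuition, not an exact transformer layer (toy sizes, single head, standard 1/sqrt(d) scaling):

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        # scores[i, j]: how strongly token i's query matches token j's key --
        # the per-token "classifier" output described above.
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        # softmax softly weights the matches rather than hard-picking one
        weights = softmax(scores, axis=-1)
        # weighted sum of value vectors: the "regressor" outputs, summed into
        # an intermediate token that would then go to the MLP layer
        return weights @ V

    rng = np.random.default_rng(0)
    seq_len, d = 4, 8                      # toy sizes
    Q, K, V = (rng.standard_normal((seq_len, d)) for _ in range(3))
    out = attention(Q, K, V)               # (4, 8): one mixed vector per token

Note the computation only ever sees the seq_len tokens supplied, which is exactly the limitation above: candidates outside the window have no keys or values to match against.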

yahoozoo ◴[] No.45120674[source]
I’m not sure if I’m just misunderstanding or we are talking about two different things. I know at a high level how transformers/LLMs decide the next token in the response they’re generating.

My question to the post I replied to was basically: given a coding problem and a list of possible solutions (candidates), how can an LLM generate a meaningful numerical score for each candidate and then say this one is a better solution than that one?