The unreasonable effectiveness of an LLM agent loop with tool use

(sketch.dev)

447 points crawshaw | 1 comments | 15 May 25 19:33 UTC

_bin_ ◴[15 May 25 20:01 UTC] No.43998743[source]▶

I've found sonnet-3.7 to be incredibly inconsistent. It can do very well but has a strong tendency to get off-track and run off and do weird things.

3.5 is better for this, ime. I hooked claude desktop up to an MCP server to fake claude-code less the extortionate pricing and it works decently. I've been trying to apply it for rust work; it's not great yet (still doesn't really seem to "understand" rust's concepts) but can do some stuff if you make it `cargo check` after each change and stop it if it doesn't.
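The `cargo check` gating described above could be sketched roughly as follows. This is a minimal illustration and not the commenter's actual setup: it reads "stop it if it doesn't" as "stop when the check fails", and the names `run_check` and `apply_with_gate` are hypothetical. Each change is modeled as a plain callable that edits files; in practice that step would be the agent's tool call.

```python
import subprocess
from typing import Callable, Iterable, Sequence, Tuple


def run_check(cmd: Sequence[str] = ("cargo", "check"), cwd: str = ".") -> Tuple[bool, str]:
    """Run the check command and return (passed, combined stdout and stderr).

    Raises FileNotFoundError if the command (e.g. cargo) is not installed.
    """
    proc = subprocess.run(list(cmd), cwd=cwd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def apply_with_gate(
    changes: Iterable[Callable[[], None]],
    check: Callable[[], Tuple[bool, str]] = run_check,
) -> Tuple[int, str]:
    """Apply changes one at a time, stopping at the first failed check.

    Returns (number of changes that passed the check, output of the
    failing check, or "" if every change passed).
    """
    passed = 0
    for change in changes:
        change()               # hypothetical: the agent edits the code here
        ok, output = check()   # gate: by default, run `cargo check`
        if not ok:
            return passed, output  # stop the agent and surface the errors
        passed += 1
    return passed, ""
```

Returning the failing check's output, rather than just a flag, means the compiler errors can be passed back to the model or to a human before anything else is changed.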

I expect something like o3-high is the best out there (aider leaderboards support this) either alone or in combination with 4.1, but tbh that's out of my price range. And frankly, I can't mentally get past paying a very high price for an LLM response that may or may not be useful; it leaves me incredibly resentful as a customer that your model can fail the task, requiring multiple "re-rolls", and you're passing that marginal cost to me.

replies(3): >>43998797 #>>43999022 #>>43999599 #

layoric ◴[15 May 25 20:31 UTC] No.43999022[source]▶

>>43998743 #

I've been using Mistral Medium 3 for the last couple of days, and I'm honestly surprised at how good it is. Highly recommend giving it a try if you haven't, especially if you are trying to reduce costs. I've basically switched from Claude to Mistral and honestly prefer it even if costs were equal.

replies(1): >>43999216 #

nico ◴[15 May 25 20:50 UTC] No.43999216[source]▶

>>43999022 #

How are you running the model? Mistral's API or some local version through Ollama, or something else?

replies(2): >>43999701 #>>44000490 #

kyleee ◴[15 May 25 21:52 UTC] No.43999701[source]▶

>>43999216 #

Is Mistral on OpenRouter?

replies(1): >>44000020 #

1. nico ◴[15 May 25 22:29 UTC] No.44000020[source]▶

>>43999701 #

Yup https://openrouter.ai/provider/mistral

I guess it can't really be run locally https://www.reddit.com/r/LocalLLaMA/comments/1kgyfif/introdu...
