For my money it's by far the best Claude Code complement.
Let me stop you right there. Are you seriously talking about predictability when the subject is a non-deterministic black box over which you have no control?
Predictability and determinism are related but different concepts.
A system can be predictable in a probabilistic sense, rather than an exact, deterministic one. This means that while you may not be able to predict the precise outcome of a single event, you can accurately forecast the overall behavior of the system and the likelihood of different outcomes.
https://philosophy.stackexchange.com/questions/96145/determi...
Similarly, a system can be deterministic yet unpredictable due to practical limitations like sensitivity to initial conditions (chaos theory), lack of information, or the inability to compute predictions in time.
This style of context engineering has definitely been the way to go for me, although I’ve just implemented it myself, using Claude to help generate commands and agents and tweaking them to my liking. Lately I’ve been using JSON as well as Markdown to share context between steps.
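For illustration, one of those JSON handoff files might look something like this (the field names are just an example of my own, nothing the tooling requires):

    {
      "task": "add-rate-limiting",
      "relevant_files": ["lib/my_app/api/throttle.ex"],
      "decisions": ["use a token bucket per API key"],
      "open_questions": ["where should limits be configured?"],
      "acceptance_criteria": ["return 429 with a Retry-After header"]
    }

Nothing in it is special; the point is just that the next step can read a small, stable shape instead of the whole chat history.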
Quantum mechanics is non-deterministic, yet you can predict the motion of objects with exquisite precision.
All these "non-deterministic boxes" will give the same answer to the question "What is the capital of France?"
Maybe someone can elaborate better, but there seems to be no such luck mapping probability onto problems the way "AI" is being used today. It's not just a matter of feeding it more data, but of finding what data you haven't fed it, or in some cases knowing you can't feed it certain data because we have no known way to represent what is obvious to humans.
Having used nearly all of the methods in the original article, I can predict that the output of the model is nearly indistinguishable from a coin toss for many, many, many rather obvious reasons.
The details of how penicillin kills bacteria were discovered in the 2000s, only about half a century after its commercial production began. And I'm quite sure we'll still see some more missing puzzle pieces in the future.
I see one mention of brownfield development. Has anyone with experience using these frameworks fired up Claude Code on enterprise software and had confident results? I have unchecked access to Claude Code at work, and based on personal agentic coding I’m sure they do aid it. I have decent but not consistent results with my own “system” in our code base, at least until front-end UI components are involved, even with Playwright. But I’m curious: how much litter is left behind? How is your coworker tolerance? How large are your pull requests? What is your inference cost? How do these manage parallel work?
The README documentation for many of them is a mix of fevered infomercial, system-specific jargon, emoji splatter, and someone’s dad’s very specific toolbox organization scheme that only he understands. Some feel like they’re setting the stage to sell something…trademarked!? Won’t Anthropic and others just incorporate the best of the bunch into their CLI tools in time?
Outside of work I’ve regularly used a reasoning model to produce a ten-page spec, wired my project with the strictest lint, type checks, formatter, and hooks, and instructed it to check items off as it goes and do red/green TDD. I can tell gpt-5 in Cursor to “go”, occasionally nudge it to stay on task with “ok next”, and in time I’ll end up with what I wanted, plus gold plating. The last one was a CLI tool for my agents to invoke and track their own work. Anyone with the same tools can just roll their own.
When I'm in the terminal I can call on Agents who can create standardised documents so there is a memory of the product management side of things that extends beyond the context window of Claude.
It guides you through the specification process so that you have extremely tight tasks for Claude to churn through, each with the relevant context, documentation and acceptance criteria.
Perhaps there are others similar, but I have found it completely transformative.
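To give a flavour, a task spec might end up looking roughly like this (the headings and content here are purely illustrative, not the tool's exact template):

    Task: Add CSV export to the reports page
    Context: reports are rendered by ReportsController; reuse the existing download helper
    Documentation: link to the reporting spec and the download helper docs
    Acceptance criteria:
      - an "Export CSV" button appears on /reports
      - the downloaded file contains the currently filtered rows
      - existing report tests still pass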
Frankly, I don’t understand how software engineers (not coders, mind you) can have issues with non-deterministic tools while browsing the web on a network that can stop working at any time for any reason.
For certain, the results are better when I use it to build new features into our platform - as opposed to making complicated refactors or other deep changes to existing parts of the system. But even in the latter case, if we have good technical documentation capturing the design and how parts of the system work (which we don't in many places), Claude Code can make good progress.
At first I was seeing a fair amount of what I would consider "bad code": implementations that either didn't follow accepted coding style and patterns or simply weren't structured for reusability and maintainability. But after strengthening the CLAUDE.md file and adding an "elixir-code-reviewer" subagent that the "developer" persona had to use, the quality of the code improved significantly.
Our platform is open source, you can see our current Claude commands and subagents here: https://github.com/Simon-Initiative/oli-torus/tree/master/.c...
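If you haven't set one up before, a subagent is just a Markdown file with YAML frontmatter under .claude/agents/. A simplified sketch (the real elixir-code-reviewer in the repo is far more detailed):

    ---
    name: elixir-code-reviewer
    description: Reviews Elixir changes for style, OTP conventions, and maintainability.
    tools: Read, Grep, Glob
    ---
    You are a senior Elixir code reviewer. Check new code against the rules
    in CLAUDE.md, flag non-idiomatic constructs, and report findings back to
    the caller instead of rewriting the code yourself.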
In my own experience, this type of stuff is just wishful thinking right now: for anything non-trivial, you still need to monitor Claude Code closely and interrupt when you see it going down the wrong train of thought.
Additionally, for security reasons, you don’t want to give it too many permissions, and you do want to actually see which commands it’s executing.
The “frameworks” OP talks about are still far away. Right now the best way to think about it is as an intern who is usually wrong but can crank out code at lightning speed.
"Elixir lists do not support index based access via the access syntax"
"Never use else if or elseif in Elixir, always use cond or case for multiple conditionals."
An AI tool finding issues in a set of YAML and Markdown files generated by an AI tool, and humans puzzled by all of it.
> We should really have some code reviewer...
Gemini to the rescue!
Here is the relevant change; it didn't have any sort of hidden complexity: https://github.com/Prunt3D/prunt/commit/b4d7f5e35be6017846b8...
First you'd have to prove that LLMs can be equated to a "top tier human developer".
> I would, in the sense that it will be well designed and implemented code that meets the requirements.
Indeed. Something LLMs can or cannot do with all the predictability of a coin toss.
Would you still call that predictable? Of course you would, as long as they meet your requirements. Put another way, anything is unpredictable depending on your level of scrutiny. AI is likely less predictable than a human, but that doesn’t mean it isn’t helpful. You are free to dismiss it, of course.
I'll put it concisely:
Trying to build predictable results upon unpredictable, not fully understood mechanisms is an extremely common practice in every single field.
But anyway, you think an LLM is just a coin toss, so I won't engage with this sub-thread anymore.
Nothing in the current AI world is as predictable as, say, the medicine you can buy or get prescribed. None of the shamanic "just one more prompt bro" rituals have the predictive power of physical laws. Etc.
You could reflect on that.
> But anyway, you think an LLM is just a coin toss
A person telling me to "try to read comments" couldn't read and understand my comment.
I tend to lean towards them being snake oil. A lot of process and ritual around using them, but for what?
I don't think the models themselves are a good fit for the way these frameworks are being used. It probably goes against their training.
Now we try to poison the context with lots of (for my actual task at hand) useless information so that the model can conform to my superficial song-and-dance process? This seems backwards.
I would argue that we need less context poisoning with useless information. Give the model the most precise information for the actual work to be done and iterate upon that. The song and dance process should happen outside of the context constrained agent.
I do agree that context poisoning is a real thing to watch out for. Coincidentally, I’d noticed MCP endpoint definitions had started taking a substantial block of context for me (~20k tokens), and that’s now something I consider when adopting any MCP.
The new /context command in Claude Code is great for visualizing what uses how much of the context.
On the other hand, I'm curious about dagger's container-use MCP. https://container-use.com/agent-integrations
---
link: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
One difference is that we have less control over the context, to add or remove things as each task requires.
Why recycle the full history into every future turn until you run out of context window?
Perhaps if the agent manages its own context, knowing what makes context effective, the harm of overrunning it, and how to make that tradeoff smartly, it can navigate tasks better?
Key word: "as long as they meet your requirements".
I've yet to meet an LLM that can predictably do that. Even on the same code with the same tools/prompt/rituals a few hours apart.
> AI is likely less predictable than human, doesn’t mean it isn’t helpful.
I'm struggling to see where I said they weren't helpful or that I dismissed them.
Also, that study was from early 2025, before Claude 4, which to me was a big breakthrough in productivity; I did not really find these tools too useful before using Sonnet 4.
Maybe the future is fine-tuned models on specific coding styles?
Do you know there are approved drugs that were put on the market to treat one ailment and have since proven to have an effect on another, or have been shown to have unwanted side effects, and have therefore been shifted to other uses? The whole drug _market_ is full of them, and all that is needed is enough trials to prove the desired effect...
The LLM output is yours to decide if it is relevant to your work or not, but it seems that your experience is consistently subpar compared with what others have reported.
Yes, I know. Doesn't really disprove my point.
> all that is needed is enough trials to prove the desired effect
"All that is needed", lol. You mean multi-stage trials with baselines, control groups, testing against placebos, etc.?
Compared to "yolo just believe me" of LLMs.
> The LLM output is yours to decide if it is relevant to your work or not, but it seems that your experience is consistently subpar compared with what others have reported.
Indeed, because all we can do with those reports is have blind, unquestioning faith. "Just one more prompt, and I swear it will be 100% more efficient", with literally nothing to judge efficiency by, no baselines, nothing.
Huh? Can you elaborate? I thought the claim was that predictable output is the gold standard and variance in LLM output means they can never rival humans.
Please restate if I missed why deterministic output is so important.