whereas my experience describing my problem and actually asking the AI is much, much smoother.
I'm not convinced the "LLM+scaffolding" paradigm will work all that well. sanity degrades with context length, and even the models with huge context windows don't seem to use it all that effectively. RAG searches often give lackluster results. the models fundamentally seem to do poorly with using commands to accomplish tasks.
I think fundamental model advances are needed to make most things more than superficially automatable: better planning/goal-directed behavior, a more organic connection to RAG context, automatic gym synthesis, and RL-based fine tuning (that holds up to distribution shift.)
I think that will come, but I think if LLMs plateau here they won't have much more impact than Google Search did in the '90s.