For example, an agent working on the dashboard for the Documents portion of my project comes up with a completely different approach from the agent working on the dashboard for the Design portion. The consistency is not there, not just visually but architecturally: the database schemas and API designs don't match. Even given the same input, the results are wildly different. It seems that if it can be different, it will be different.
You start updating instruction files to get things consistent, but on a large project these end up running to thousands of lines just to get the foundations right, and all of that eats into the context window.
I think ultimately we might need smaller language models trained only on a specific set of rules and schemas, instead of on the whole universe of ideas that a prompt could lead to. Small language models are likely the correct path.