I'd challenge whether this is specific to coding. If you want a result that largely resembles the repertoire of examples in a training set, chat is probably workable? This is true for music, visual art, buildings. Anything, really.
But if you want to start making "domain specific" edits to the artifacts that get produced, you are almost certainly going to want something like the app builders idea. Down thread, I mention how this is a lot like procedural generation techniques for game levels and such. So I think I am in agreement with your first bullet?
Similarly, if you want to make music with an instrument, it will be hard to ignore playing with said instrument more directly. I suspect some people can create things using chat as an interface. I just also suspect directly touching the artifacts at play is going to be more powerful.
I think I agree with the point on formal requirements. I'm not sure how that really applies to chat as an interface, though. I think it is hoping for a "laws of robotics" style set of rules that can be confirmed with a test? Reality could surprise me, but I have always viewed that as largely a fiction item.