And telling me "just do both" is imposing your world view, which is precisely what we're talking about _not_ doing.
Consider a situation where you are teaching a child. She tries her best and makes a mistake on her math homework. Saying that her attempt was terrible because an adult could do better may be the "fullest truth" in the most eye-rollingly banal way possible, but it discourages her from trying in the future, which is ultimately unproductive.
This "fullest truth" argument fails to take into account desire and motivation, and thus is a bad model of the truth.
An LLM has no goals - it's just a machine optimized to minimize training errors, although I suppose you could view this as an innate hard-coded goal of minimizing next-word error (relative to the training set), in the same way we might say a machine-like insect has some "goals".
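To put "minimizing next-word error" in concrete terms, here's a toy PyTorch sketch of the pretraining objective; the random tensors are just stand-ins for real model outputs and real training text:

```python
import torch
import torch.nn.functional as F

# Toy sketch of the pretraining objective: at each position the model emits a
# distribution over the vocabulary, and the loss is the cross-entropy against
# the token that actually came next in the training text.
vocab_size, seq_len = 50_000, 8
logits = torch.randn(seq_len, vocab_size, requires_grad=True)  # stand-in for model outputs
next_tokens = torch.randint(0, vocab_size, (seq_len,))         # ground-truth "next words"

loss = F.cross_entropy(logits, next_tokens)  # average next-word error
loss.backward()                              # gradients nudge weights toward the training set
```

There is no "goal" anywhere in that loop beyond matching the training distribution, token by token.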
Of course RLHF provides a longer-horizon error to minimize (the entire response rather than the next word), but I doubt the training volume is enough for the model to internalize a goal of manipulating the listener, as opposed to just favoring certain surface forms of response.
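The difference in horizon is easy to see in a REINFORCE-style sketch; this is *not* a real RLHF pipeline (those use reward models, PPO, KL penalties, etc.), and the numbers are made up, but it shows that the feedback is one scalar judged over the whole response:

```python
import torch

# One scalar reward for the entire response, not a per-token target.
# A policy-gradient-style update reinforces every token in replies that scored well.
response_log_probs = torch.randn(20, requires_grad=True)  # log-probs of a 20-token reply (stand-in)
reward = torch.tensor(0.8)                                # single score for the whole response

loss = -(reward * response_log_probs.sum())  # push up whole responses that scored well
loss.backward()
```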
But simply by approximating human communication, which often models goal-oriented behavior, an LLM can have implicit goals, which likely vary widely according to conversation context.
Implicit goals can be very effective. Nowhere in DNA is there any explicit goal to survive. However, combinations of genes and markers selected for survivability create creatures with implicit goals to survive that are as tenacious as any explicit goal would be.
Perhaps at some point LLMs will start to evolve from the prompt->response model into something more asynchronous, with some activity happening in the background too.