Over 130 years ago, Dewey [1] criticized the model of psychology that treats human behavior as stimulus -> internal processing -> response. Stimuli don't just come to us; we seek them out and modify the world around us to make them occur. Dewey and other pragmatists proposed reframing stimulus/response in terms of "acts" or "habits," that is, changes to the unified agent+environment system. Popper was getting at the same entanglement of agent and environment in "Three Worlds," as was Simon in "The Sciences of the Artificial."
I see RL as an elaboration of the stimulus/response paradigm: the agent is treated as discrete from the environment. Does RL work well in an environment like Minecraft, where the real game is modifying the relationship between actions and future states? What about contexts like Twitter, where you're also modifying the value function (e.g. by cultivating an audience, or by participating in a thread in a way that conditions the value function of future responses)?
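
To make the Minecraft point a bit more concrete, here is a minimal Python sketch of a made-up toy world. BuildableWorld, its cells, and its "right"/"build" actions are all hypothetical names for illustration, not from any RL library. The same action taken from the same position leads to a different next state once the agent has changed the world:

```python
class BuildableWorld:
    """Hypothetical toy world: cells 0..3 on a line, with a gap between 1 and 2."""

    def __init__(self):
        self.pos = 0
        self.bridge_built = False  # part of the world the agent can change

    def step(self, action):
        if action == "build" and self.pos == 1:
            # Building does not move the agent; it changes what "right" will do.
            self.bridge_built = True
        elif action == "right":
            blocked = self.pos == 1 and not self.bridge_built
            if not blocked and self.pos < 3:
                self.pos += 1
        return self.pos


world = BuildableWorld()
world.step("right")           # moves from cell 0 to cell 1
before = world.step("right")  # blocked by the gap, stays at cell 1
world.step("build")           # agent modifies the environment
after = world.step("right")   # same position, same action, now reaches cell 2
print(before, after)
```

If the agent only observes its position, the transition for (cell 1, "right") appears to change over time. Folding bridge_built into the state makes the dynamics fixed again, and the open question is how well that trick holds up when almost everything in the world is buildable.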
[1] https://plato.stanford.edu/entries/dewey/#ReflArcDeweRecoPsy...