In memory research there is this idea that there is a dual representation of every event...a more verbatim representation, and more context weighted representation. As we develop over early childhood, our verbatim memory representations increase in fidelity and strength against interference, but peaks around 6 to 10 years, depending on the specifics. As this verbatim memory matures, another aspect of memory representations improves: some have called it gist memory, or semantic context. Increases in memory performance continue into adolescence primarily due to increases in the ability to use context and gist (broad representations that capture the details by inference or an event) to increase accuracy overall, but also greater likelihood of committing false alarms to lures primed by semantically related material during learning...expressly because there becomes greater reliance on context to support recall accuracy.
So I could imagine such a system in a LLM where attention is directed to exact representations in one head, and another that keeps its attention on a coarser grain of information that anchors information. However, I am not that familiar with LLMs to know if that is just silly analogizing.