Been testing edge cases - is the 1M context actually flat, or does token position, structure, or semantic grouping change how attention gets distributed?
When I feed in 20 files, mid-position content sometimes gets pulled on harder than the stuff at the end. Feels like it's not just order but something deeper - I guess the model is building a memory map with its own internal weighting.
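Something like a needle position sweep would at least make that measurable. A minimal sketch, assuming nothing about the actual API - `query_model` is a hypothetical placeholder for whatever completion call is in use, and the filler/needle text is made up:

```python
def query_model(prompt: str) -> str:
    # hypothetical placeholder: wire this to whatever completion endpoint you're testing
    raise NotImplementedError

NEEDLE = "The deploy token for service X is 7f3a9."
QUESTION = "\n\nWhat is the deploy token for service X?"
FILLER = [f"Segment {i}: unrelated notes about logging, retries, and config.\n"
          for i in range(20)]

def build_prompt(needle_index: int) -> str:
    # plant the needle at a chosen position among the filler segments
    segments = list(FILLER)
    segments.insert(needle_index, NEEDLE + "\n")
    return "".join(segments) + QUESTION

results = {}
for idx in (0, len(FILLER) // 2, len(FILLER)):  # start, middle, end
    answer = query_model(build_prompt(idx))
    results[idx] = "7f3a9" in answer            # did the needle survive that position?

print(results)  # a mid-position miss with hits at the ends would match the pattern above
```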
If there's any semantic chunking or attention-aware preprocessing happening before inference, then layout starts to matter more than raw size, and prompt design becomes spatial.
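One way to poke at that without internal access: same segments, same question, two layouts (flat order vs. grouped under explicit headers), and see whether the answers move. Again just a sketch with placeholder content and the same hypothetical `query_model`:

```python
from itertools import chain

def query_model(prompt: str) -> str:
    # hypothetical placeholder: wire this to whatever completion endpoint you're testing
    raise NotImplementedError

# placeholder segments, grouped by topic
topics = {
    "auth":    ["auth segment 1 ...", "auth segment 2 ..."],
    "billing": ["billing segment 1 ...", "billing segment 2 ..."],
}
question = "\n\nWhich segments discuss token expiry?"

# layout A: segments interleaved, no visible structure
flat = "\n".join(chain.from_iterable(zip(*topics.values()))) + question

# layout B: identical segments grouped under explicit topic headers
grouped = "\n".join(
    f"## {name}\n" + "\n".join(segs) for name, segs in topics.items()
) + question

print(query_model(flat))
print(query_model(grouped))
```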
Is there any internal tooling to trace which segments are influencing the output?
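Short of that, the closest black-box approximation I can think of is leave-one-out ablation: drop each segment, re-query, and score how far the output drifts from the baseline. A rough sketch (hypothetical `query_model` again, with a crude token-overlap ratio standing in for a real similarity metric):

```python
def query_model(prompt: str) -> str:
    # hypothetical placeholder: wire this to whatever completion endpoint you're testing
    raise NotImplementedError

def overlap(a: str, b: str) -> float:
    # crude token-overlap similarity, just to keep the sketch dependency-free
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def segment_influence(segments: list[str], question: str) -> list[float]:
    baseline = query_model("\n".join(segments) + question)
    scores = []
    for i in range(len(segments)):
        ablated = segments[:i] + segments[i + 1:]       # drop segment i
        answer = query_model("\n".join(ablated) + question)
        scores.append(1.0 - overlap(baseline, answer))  # bigger drift = more influence
    return scores
```

For 20 files that's 21 calls per question, so it doesn't scale, but it at least gives a per-segment influence ranking to compare against whatever internal attribution exists.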