A single 1920x1080 framebuffer (which is a low-resolution monitor in 2025, IMO) is about 2 MB at 8 bits per pixel, and roughly 8 MB at 32-bit color. Add any compositing into the mix for multi-window displays and it literally doesn’t fit in memory.
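For a rough sense of the numbers, here is the arithmetic as a small Python sketch. The 2 MB figure matches 1 byte per pixel; the 16 and 32 bpp rows are my own additions for comparison, and `framebuffer_bytes` is just an illustrative helper that assumes no row padding.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Bytes needed to hold one full frame, assuming tightly packed rows."""
    return width * height * bits_per_pixel // 8


# Compare a few common colour depths for a 1920x1080 frame.
for bpp in (8, 16, 32):
    size = framebuffer_bytes(1920, 1080, bpp)
    print(f"{bpp:>2} bpp: {size:,} bytes ({size / 2**20:.2f} MiB)")
```

Double buffering or per-window surfaces for compositing would multiply these figures by the number of buffers kept around.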
If you use a tile-based hardware renderer, such as the one on the original Nintendo chip, then no full framebuffer has to be stored: the hardware renders pixels to the screen on the fly, automatically pulling each one from the tile data that the tile map points to.
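A minimal software sketch of that lookup, to show why the memory cost is so much lower. This is not an accurate model of any real chip: the 8x8 tile size, the 32x30 map, the two patterns, and the names `patterns`, `tile_map`, `pixel_at`, and `scanline` are all assumptions chosen for illustration.

```python
TILE = 8               # tile edge in pixels (assumed)
MAP_W, MAP_H = 32, 30  # tile map size in tiles (assumed; gives 256x240 pixels)

# Two hypothetical tile patterns, each an 8x8 grid of palette indices.
BLANK = [[0] * TILE for _ in range(TILE)]
CHECKER = [[(x + y) % 2 for x in range(TILE)] for y in range(TILE)]
patterns = [BLANK, CHECKER]

# The tile map stores one pattern index per cell, not one value per pixel.
tile_map = [[(tx + ty) % 2 for tx in range(MAP_W)] for ty in range(MAP_H)]


def pixel_at(x, y):
    """Look up a screen pixel through the tile map; nothing is cached."""
    tile_index = tile_map[y // TILE][x // TILE]
    return patterns[tile_index][y % TILE][x % TILE]


def scanline(y):
    """Produce one row of pixels on demand, as the display would consume it."""
    return [pixel_at(x, y) for x in range(MAP_W * TILE)]


map_cells = MAP_W * MAP_H                      # entries held in the tile map
screen_pixels = (MAP_W * TILE) * (MAP_H * TILE)  # pixels a framebuffer would hold
print(map_cells, "map cells vs", screen_pixels, "pixels")
```

The only state kept is the tile map plus the pattern table, so the cost scales with the number of tiles rather than the number of pixels.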