Can someone explain the benefit of having essentially 4 cores (2 ARM + 2 RISC-V) on the chip but only having 2 able to run simultaneously? Does this take significantly less die space than having all 4 available at all times?
cores are high bandwidth bus masters. Making a crossbar that supports 5 high bandwidth masters (4x core + dma) is likely harder, larger, and higher power than one that supports 3.
It's actually 10 masters (I+D for 4 cores + DMA read + DMA write) versus 6 masters. Or you could pre-arbitrate each pair of I and each pair of D ports. But even there the timing impact is unpalatable.