Interesting idea. It's like putting a VLIW compilation pass into the scheduler, but without the intermediate microcode cache that NVIDIA's Denver had. Without handling memory dependencies / cache hazards, I'm not sure how well it will do in general-purpose code. They don't have the code-locality / second-layer-icache problem that Denver had, but data loads are still going to be a mess.
I guess the notion is that data cache misses basically lead to what could be called "instruction amplification": an instruction misses its scheduled time slot and has to be replayed, possibly repeatedly, until its dependencies are available. The article asserts this is roughly equivalent to leaving execution ports unoccupied in a "traditional" OoO architecture, but I'm not so sure. I'm curious how well it works in practice; I'd worry that cache misses rapidly multiply into a cascading failure where the whole pipeline stalls and the architecture falls back to in-order-level performance, just like most general-purpose VLIW architectures.
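To make the "amplification" concrete, here's a toy back-of-the-envelope sim. Every number in it (miss rate, miss penalty, load fraction) is invented for illustration; this isn't a model of the actual design, just of the replay-until-ready mechanism as I understand it:

    # Toy model of replay-driven "instruction amplification".
    # All parameters are assumptions, not figures from the article.
    import random

    random.seed(42)

    MISS_RATE = 0.05      # assumed fraction of loads that miss the D-cache
    MISS_PENALTY = 30     # assumed cycles until the missing line arrives
    LOAD_FRACTION = 0.3   # assumed fraction of instrs consuming a load
    N_INSTRS = 100_000    # retired instructions in the trace

    issued = 0
    for _ in range(N_INSTRS):
        issued += 1  # the originally scheduled issue slot
        if random.random() < LOAD_FRACTION * MISS_RATE:
            # The dependent instruction re-issues every cycle until its
            # data arrives, burning an issue slot each time.
            issued += MISS_PENALTY

    print(f"issues per retired instruction: {issued / N_INSTRS:.2f}")

Even with these mild numbers that comes out to ~1.45 dynamic issues per retired instruction, and those replay slots consume fetch/issue bandwidth that an OoO scheduler would instead hand to independent instructions. That's why the "it's just like an idle port" framing doesn't fully convince me.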