Condor's Cuzco RISC-V Core at Hot Chips 2025

(chipsandcheese.com)

154 points rbanffy | 2 comments | 30 Aug 25 14:18 UTC | HN request time: 0.001s | source

Show context

1024bees ◴[30 Aug 25 16:31 UTC] No.45075917[source]▶

It's nice to see a microarchitecture take a risk, and getting perspective on how this design performs with respect to performance, power and area would be interesting.

Very unlikely to me that this design would have comparable "raw" performance to a design that implements something closer to tomasulo's algorithm. The assumption that the latency of a load will be a l1 hit is a load bearing abstraction; I can imagine scenarios where this acts as a "double jeopardy" causing scheduling to lock up because the latency was mispredicted, but one could also speculate that isn't important because the workload is already memory bound.

There's an intuition in computer architecture that designs that lean on "static" instruction scheduling mechanisms are less performant than more dynamic mechanisms for general purpose compute, but we've had decades of compiler development since itanium "proved" this. Efficient computer (or whatever their name is) is doing something cool too, it's exciting to see where this will go

replies(4): >>45076340 #>>45078044 #>>45079822 #>>45081255 #

jasonwatkinspdx ◴[31 Aug 25 02:31 UTC] No.45079822[source]▶

>>45075917 #

This is still using a Tomasulo like algorithm, it's just been shifted from the backend to the front end. And instructions don't lock up on an L1 miss. Instead the results of that instruction are marked as poisoned, and the front end replays the their microps forward in the execution stream once the L1 miss is resolved. As the article points out, this replay is likely to fill out otherwise unused execution slots on general purpose code, as OoO cpus rarely sustain their full execution width.

It's a smart idea, and has some parallels to the Mill CPU design. The backend is conceptually similar to a statically scheduled VLIW core, and the front end races ahead using it's matrix scorecard trying to queue up as much as it can for it vs the presence of unpredictable latencies.

replies(1): >>45080091 #

1. quantummagic ◴[31 Aug 25 03:22 UTC] No.45080091[source]▶

>>45079822 #

> Mill CPU design

There were some fascinating concepts being explored in that project. It's a shame nothing came of it.

replies(1): >>45082621 #

2. Findecanor ◴[31 Aug 25 12:20 UTC] No.45082621[source]▶

>>45080091 (TP) #

Last post on their forum a month ago, they claimed that they were live and having progress, but I dunno ...

What I'm afraid of is that perhaps they have been shifting what their goal is a little too often, which of course would delay their time to market.

For example, I think they have shifted from straightforward fixed-SIMD to scalable vectors of some sort, and last I heard they were talking about AI .. which usually means that there's some kind of support for matrix multiplication.

↑