
154 points rbanffy | 2 comments
1024bees ◴[] No.45075917[source]
It's nice to see a microarchitecture take a risk, and it would be interesting to see how this design fares on performance, power and area.

It seems very unlikely to me that this design would have comparable "raw" performance to a design that implements something closer to Tomasulo's algorithm. The assumption that a load's latency will be that of an L1 hit is a load-bearing one; I can imagine scenarios where this acts as a "double jeopardy", causing scheduling to lock up because the latency was mispredicted, but one could also argue that this doesn't matter because the workload is already memory bound at that point.

There's an intuition in computer architecture that designs that lean on "static" instruction scheduling mechanisms are less performant than more dynamic mechanisms for general-purpose compute, but we've had decades of compiler development since Itanium "proved" this. Efficient computer (or whatever their name is) is doing something cool too; it's exciting to see where this will go.

replies(4): >>45076340 #>>45078044 #>>45079822 #>>45081255 #
jasonwatkinspdx ◴[] No.45079822[source]
This is still using a Tomasulo-like algorithm; it's just been shifted from the backend to the front end. And instructions don't lock up on an L1 miss. Instead, the results of that instruction are marked as poisoned, and the front end replays their micro-ops further along in the execution stream once the L1 miss is resolved. As the article points out, this replay is likely to fill otherwise unused execution slots on general-purpose code, as OoO CPUs rarely sustain their full execution width.
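To make the poison-and-replay idea concrete, here is a toy sketch in Python. It is not the actual design: the `MicroOp` type, the `run` function, and the single-assignment register assumption are all invented for illustration, and timing is ignored entirely. It only shows how poison propagates from a missing load to its dependents, and which micro-ops would need to be replayed once the miss resolves.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MicroOp:
    name: str
    dest: str                 # destination register (assumed written only once)
    srcs: tuple = ()          # source registers
    misses_l1: bool = False   # hypothetical flag: this op is a load that misses L1

def run(program):
    """Return (events, replayed_names) for a straight-line program.

    Toy model: a load that misses L1 poisons its destination, any op that
    reads a poisoned register is poisoned too, and every poisoned op is
    replayed in original program order once the miss has resolved.
    """
    poisoned = set()   # registers that do not yet hold a valid value
    replay = []        # micro-ops to reissue later
    events = []
    for op in program:
        if op.misses_l1 or any(s in poisoned for s in op.srcs):
            poisoned.add(op.dest)
            replay.append(op)
            events.append((op.name, "poisoned"))
        else:
            events.append((op.name, "ok"))
    # The miss has resolved: reissue the poisoned micro-ops in order.
    for op in replay:
        events.append((op.name, "replayed"))
    return events, [op.name for op in replay]

program = [
    MicroOp("ld",  "r1", (), misses_l1=True),
    MicroOp("add", "r2", ("r1", "r0")),   # depends on the missing load
    MicroOp("mul", "r3", ("r0", "r0")),   # independent, completes normally
    MicroOp("sub", "r4", ("r2", "r3")),   # transitively depends on the load
]
events, replayed = run(program)
```

In this model the independent `mul` completes on the first pass, while `ld`, `add` and `sub` are the ones reissued, which is the sense in which nothing "locks up" on the miss.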

It's a smart idea, and has some parallels to the Mill CPU design. The backend is conceptually similar to a statically scheduled VLIW core, and the front end races ahead, using its matrix scorecard to queue up as much work as it can for the backend despite unpredictable latencies.
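A rough way to picture a front end booking work into future slots is the sketch below. Everything here is an assumption made for illustration (the `schedule` function, the op tuple layout, identical pipelined units, and the fixed latencies, including treating every load as an L1 hit); it is not taken from the article or any real core. Each op is given the earliest cycle at which its sources are predicted to be ready and an execution unit is still free.

```python
def schedule(ops, units, latency):
    """Assign each op an issue cycle in a (cycle, unit) reservation table.

    ops:     list of (name, dest, srcs, kind) tuples in program order
    units:   number of identical, fully pipelined execution units (assumed)
    latency: dict mapping kind -> predicted latency in cycles (assumed fixed)
    """
    ready_at = {}    # register -> cycle at which its value is predicted ready
    reserved = set() # (cycle, unit) slots that are already booked
    issue = {}
    for name, dest, srcs, kind in ops:
        # Earliest cycle at which all source operands are predicted ready.
        cycle = max((ready_at.get(s, 0) for s in srcs), default=0)
        # Walk forward in time until some unit has a free slot.
        while True:
            unit = next((u for u in range(units) if (cycle, u) not in reserved), None)
            if unit is not None:
                break
            cycle += 1
        reserved.add((cycle, unit))
        ready_at[dest] = cycle + latency[kind]
        issue[name] = cycle
    return issue

ops = [
    ("ld",  "r1", (),           "load"),
    ("add", "r2", ("r1",),      "alu"),
    ("mul", "r3", (),           "alu"),
    ("sub", "r4", ("r2", "r3"), "alu"),
]
issue = schedule(ops, units=1, latency={"load": 3, "alu": 1})
```

With one unit and a predicted load latency of 3, the independent `mul` lands in cycle 1, ahead of the `add` that has to wait for the load until cycle 3. If the load actually misses, the predicted `ready_at` value is wrong, which is where the poison and replay described above would come in.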

replies(1): >>45080091 #
1. quantummagic ◴[] No.45080091[source]
> Mill CPU design

There were some fascinating concepts being explored in that project. It's a shame nothing came of it.

replies(1): >>45082621 #
2. Findecanor ◴[] No.45082621[source]
In the last post on their forum, a month ago, they claimed that they were still alive and making progress, but I dunno ...

What I'm afraid of is that perhaps they have been shifting what their goal is a little too often, which of course would delay their time to market.

For example, I think they have shifted from straightforward fixed-width SIMD to scalable vectors of some sort, and last I heard they were talking about AI, which usually means that there's some kind of support for matrix multiplication.