700 points elipsitz | 1 comments
synergy20 ◴[] No.41192422[source]
You can pick either ARM cores or RISC-V cores on the same die? I've never seen a design like this before. Will this affect price and power consumption?

"The Hazard3 cores are optional: Users can at boot time select a pair of included Arm Cortex-M33 cores to run, or the pair of Hazard3 cores. Both options run at 150 MHz. The more bold could try running one RV and one Arm core together rather than two RV or two Arm.

Hazard3 is an open source design, and all the materials for it are here. It's a lightweight three-stage in-order RV32IMACZb* machine, which means it supports the base 32-bit RISC-V ISA with support for multiplication and division in hardware, atomic instructions, bit manipulation, and more."

replies(4): >>41193048 #>>41193435 #>>41193928 #>>41197487 #
jononor ◴[] No.41193928[source]
This seems like a great way to test the waters before a potential full-on transition to RISC-V. It lets them validate both the technical side and the market reception, at a much lower cost than taping out an additional chip.
replies(4): >>41194211 #>>41195157 #>>41196084 #>>41197755 #
MBCook ◴[] No.41194211[source]
Fun for benchmarking too.

You’re limited to those two exact kinds of cores, but you know every other thing on the entire computer is 100% identical.

It’s not SBC 1 vs SBC 2, where they have different RAM chips, and this one has a better cooler but that one has better WiFi.

replies(1): >>41197359 #
phire ◴[] No.41197359[source]
I really hope people don't do this. Or at least not try to sell it as ARM vs RISC-V tests.

Because what you are really testing is the Cortex-M33 vs the Hazard3, and they aren't equivalent.

They might both be 3-stage in-order RISC pipelines, but the Cortex-M33 is technically superscalar, as it can dual-issue two 16-bit instructions in certain situations. Also, the Cortex-M33 has a faster divider: 11 cycles with early termination vs 18 or 19 cycles on the Hazard3.
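
To put the divider gap in wall-clock terms, here is a rough sketch that converts the cycle counts above into nanoseconds at the 150 MHz clock quoted earlier in the thread. Treating 11 cycles as the Cortex-M33's upper figure (with early termination only shortening it) is my assumption, and `cycles_to_ns` is just an illustrative helper.

```python
# Cycle counts are the ones quoted in this comment; 150 MHz is the clock
# quoted for both core options earlier in the thread.
CLOCK_HZ = 150_000_000


def cycles_to_ns(cycles: int, clock_hz: int = CLOCK_HZ) -> float:
    """Convert a cycle count into nanoseconds at the given clock frequency."""
    return cycles * 1e9 / clock_hz


# Assumption: 11 cycles is the Cortex-M33 upper figure, and early
# termination can only make a divide shorter than this.
m33_upper_ns = cycles_to_ns(11)
hazard3_low_ns = cycles_to_ns(18)
hazard3_high_ns = cycles_to_ns(19)

print(f"Cortex-M33 divide: up to {m33_upper_ns:.1f} ns")
print(f"Hazard3 divide:    {hazard3_low_ns:.1f} to {hazard3_high_ns:.1f} ns")
```

This only compares one instruction in isolation; how much it matters depends on how divide-heavy the benchmark is.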

replies(1): >>41197705 #
snvzz ◴[] No.41197705[source]
It'd help to know how much area each core takes within the die.

I would expect the ARM cores to be much larger, as well as use much more power.

replies(2): >>41198161 #>>41200260 #
phire ◴[] No.41198161[source]
Hard to tell.

If you ignore the FPU (I think it can be power gated off), the two cores should be roughly the same size and power consumption.

Dual issue sounds like it would add a bunch of complexity, but ARM describe it as "limited" (and that's about all I can say, I couldn't find any documentation). The impression I get is that it's really simple.

Something along the lines of "if two 16-bit instructions are 32-bit aligned, and they go down different pipelines, and they aren't dependent on each other" then execute both. There might be limitations such that the second instruction can't access registers at all (for example, a branch instruction) or that it must only access registers from a separate register file bank, meaning you don't even have to add extra read/write ports to the register file.
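
To make that guess concrete, here is a toy model of the pairing check. It is only a sketch of the rule as I described it, not ARM's documented logic: the `Insn` fields, the pipeline labels, and `can_dual_issue` are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """Hypothetical pre-decoded instruction, for illustration only."""
    addr: int                        # byte address of the instruction
    size: int                        # 2 for a 16-bit encoding, 4 for 32-bit
    pipe: str                        # made-up pipeline label, e.g. "alu", "mem"
    reads: frozenset = frozenset()   # registers read
    writes: frozenset = frozenset()  # registers written


def can_dual_issue(first: Insn, second: Insn) -> bool:
    """Return True if the guessed pairing rule would allow issuing both."""
    both_16bit = first.size == 2 and second.size == 2
    # The pair must fill one 32-bit aligned word: first half, then second half.
    same_word = first.addr % 4 == 0 and second.addr == first.addr + 2
    different_pipes = first.pipe != second.pipe
    # The second instruction must not read or overwrite a result of the first.
    independent = not (first.writes & (second.reads | second.writes))
    return both_16bit and same_word and different_pipes and independent


load = Insn(0x100, 2, "mem", frozenset({"r1"}), frozenset({"r0"}))
add = Insn(0x102, 2, "alu", frozenset({"r2", "r3"}), frozenset({"r2"}))
add_dependent = Insn(0x102, 2, "alu", frozenset({"r0", "r3"}), frozenset({"r3"}))

print(can_dual_issue(load, add))            # independent pair
print(can_dual_issue(load, add_dependent))  # second reads r0 from the load
```

A check this small is the kind of thing that could plausibly fit in a few hundred gates, since it is just a handful of comparisons on already-decoded fields.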

If the feature is limited enough, you could get it down to just a few hundred gates in the instruction decode stage, taking advantage of resources in later stages that would have otherwise been idle.

According to ARM's specs, the Cortex-M33 takes the exact same area as the Cortex-M4 (the rough older equivalent without dual-issue, and arguably equal to the Hazard3), uses 2.5% less power and gets 17% more performance in the CoreMark benchmark.
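
Combining those two figures gives a rough performance-per-power ratio. This is back-of-the-envelope arithmetic only, and it assumes both percentages are measured relative to the Cortex-M4 under comparable conditions, which I haven't verified.

```python
# Figures as quoted above, both relative to the Cortex-M4.
power_ratio = 1 - 0.025   # 2.5% less power
perf_ratio = 1 + 0.17     # 17% more CoreMark performance

# Assumption: both ratios come from comparable measurement conditions.
perf_per_power = perf_ratio / power_ratio

print(f"Cortex-M33 performance per unit power vs Cortex-M4: {perf_per_power:.2f}x")
```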

replies(1): >>41198461 #
1. pclmulqdq ◴[] No.41198461[source]
That is exactly what the "limited dual issue" is: two non-conflicting pre-decoded instructions (either 16b+16b or if a stall has occurred) can be sent down the execution pipe at the same time. I believe the pair must be a memory op and an ALU op.