
adrianmonk No.45406878
I've always wondered if any CPUs have tried to reduce the branch penalty by speculatively executing both ways at once in parallel. You'd have two of everything (two pipelines, two ALUs, two sets of registers, etc.) and when you hit a conditional branch, instead of guessing which way to go, you'd essentially fork.

Obviously that requires a lot of extra transistors and you are doing computation that will be thrown away, so it's not free in terms of space or power/heat/energy. But perhaps it could handle cases that other approaches can't.

Even more of a wild idea is to pair up two cores and have them work together this way. When you have a core that would have been idle anyway, it can shadow an active core and be its doppelganger that takes the other branch. You'd need to have very fast communication between cores so the shadow core can spring into action instantly when you hit a branch.

My gut instinct is it's not worth it overall, but I'm curious whether it's been tried in the real world.

1. Someone No.45418414
> You'd have two of everything (two pipelines, two ALUs, two sets of registers, etc.)

As others said: yes, it has been tried and it works, but it costs a lot in hardware and power. One problem is that a lot of code has a branch every 10 or so instructions, and fast high-end CPUs (the only realistic target for this feature) can dispatch multiple instructions per cycle. Combined, that means you hit a branch every two or three cycles. A new branch will typically show up before the previous one has resolved, and every unresolved branch doubles the number of paths in flight, so you do not end up with two of everything but with way more.

So, you’re throwing away not 50% of your work but easily 80%.
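
To make that arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The dispatch width of 4 and the branch-resolution latencies are made-up illustrative numbers, not measurements of any real CPU. The model is also deliberately crude: it assumes execution resources are split evenly across every path still in flight, and that exactly one of those paths turns out to be the right one.

```python
import math


def cycles_between_branches(instrs_per_branch: float, dispatch_width: float) -> float:
    """Average number of cycles between two conditional branches."""
    return instrs_per_branch / dispatch_width


def unresolved_branches(resolve_latency_cycles: float, cycles_per_branch: float) -> int:
    """How many branches are still unresolved at any moment (rounded up)."""
    return math.ceil(resolve_latency_cycles / cycles_per_branch)


def wasted_fraction(unresolved: int) -> float:
    """Share of work thrown away when all 2**unresolved paths run side by side.

    Assumption: resources are split evenly and only one path is correct.
    """
    return 1.0 - 1.0 / (2 ** unresolved)


if __name__ == "__main__":
    # One branch per ~10 instructions (from the comment), 4-wide dispatch (assumed).
    cpb = cycles_between_branches(10, 4)
    print(f"cycles between branches: {cpb}")
    # Hypothetical branch-resolution latencies, in cycles.
    for latency in (2, 5, 7):
        k = unresolved_branches(latency, cpb)
        print(f"latency={latency}: {k} unresolved, "
              f"{2 ** k} paths, {wasted_fraction(k):.1%} wasted")
```

Under these assumptions, one unresolved branch gives the 50% figure, two give 75% and three give 87.5%, which is roughly where "easily 80%" lands.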

Some code has fewer branches, but that kind of code can often be parallelized or vectorized easily.