The Weird Concept of Branchless Programming

I've always wondered if any CPUs have tried to reduce the branch penalty by speculatively executing both ways at once in parallel. You'd have two of everything (two pipelines, two ALUs, two sets of registers, etc.) and when you hit a conditional branch, instead of guessing which way to go, you'd essentially fork.

Obviously that requires a lot of extra transistors and you are doing computation that will be thrown away, so it's not free in terms of space or power/heat/energy. But perhaps it could handle cases that other approaches can't.

Even more of a wild idea is to pair up two cores and have them work together this way. When you have a core that would have been idle anyway, it can shadow an active core and be its doppelganger that takes the other branch. You'd need to have very fast communication between cores so the shadow core can spring into action instantly when you hit a branch.

My gut instinct is it's not worth it overall, but I'm curious whether it's been tried in the real world.