The Weird Concept of Branchless Programming

1. adrianmonk ◴[28 Sep 25 18:55 UTC] No.45406878[source]▶

I've always wondered if any CPUs have tried to reduce the branch penalty by speculatively executing both ways at once in parallel. You'd have two of everything (two pipelines, two ALUs, two sets of registers, etc.) and when you hit a conditional branch, instead of guessing which way to go, you'd essentially fork.

Obviously that requires a lot of extra transistors and you are doing computation that will be thrown away, so it's not free in terms of space or power/heat/energy. But perhaps it could handle cases that other approaches can't.

Even more of a wild idea is to pair up two cores and have them work together this way. When you have a core that would have been idle anyway, it can shadow an active core and be its doppelganger that takes the other branch. You'd need to have very fast communication between cores so the shadow core can spring into action instantly when you hit a branch.

My gut instinct is it's not worth it overall, but I'm curious whether it's been tried in the real world.

replies(8): >>45406919 #>>45406924 #>>45406951 #>>45407369 #>>45407535 #>>45409791 #>>45410325 #>>45418414 #

2. hawk_ ◴[28 Sep 25 19:01 UTC] No.45406919[source]▶

>>45406878 (TP) #

What has worked out very well in practice is hyper-threading. So you take instructions from two threads and if one of them is waiting on a branch the units of the CPU don't go unused.

3. terryf ◴[28 Sep 25 19:02 UTC] No.45406924[source]▶

>>45406878 (TP) #

Yes, this has been done for a while now; speculative execution plus register renaming is how it happens. https://en.wikipedia.org/wiki/Register_renaming

replies(2): >>45408125 #>>45408170 #

4. thegreatwhale8 ◴[28 Sep 25 19:06 UTC] No.45406951[source]▶

>>45406878 (TP) #

It's what happens and it gave us a really big issue a few years ago https://en.wikipedia.org/wiki/Spectre_(security_vulnerabilit...

replies(1): >>45408166 #

5. mshockwave ◴[28 Sep 25 19:52 UTC] No.45407369[source]▶

>>45406878 (TP) #

yes, it has been done for at least a decade if not more

> Even more of a wild idea is to pair up two cores and have them work together this way

I don't think that'll be profitable, because...

> When you have a core that would have been idle anyway

...you'll just schedule in another process. A modern OS rarely runs short of tasks to run.

6. jasonwatkinspdx ◴[28 Sep 25 20:09 UTC] No.45407535[source]▶

>>45406878 (TP) #

Yes, it's been looked at. If you wanna skim the research, use "Eager Execution" and "Disjoint Eager Execution" as jumping-off points.

It doesn't require duplicating everything. You just need to add some additional bookkeeping of dependencies and what to retire vs kill at the end of the pipeline.

In practice branch predictors are so good that speculating off the "spine" of the most likely path just isn't worth it.

In fact there were a lot of interesting microarchitectural ideas from the late 90s to early 00s that just ended up being moot because the combination of OoO speculation, branch predictors, and big caches proved so effective.
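
For a rough feel of why prediction does so well on ordinary code, here is a minimal sketch of a textbook two-bit saturating counter. It is an illustration only: real predictors are far more elaborate, and the function name and the loop pattern below are made up for the example.

```python
def simulate_two_bit_predictor(outcomes, initial_state=2):
    """Return the fraction of branch outcomes predicted correctly.

    States 0-1 predict not-taken, states 2-3 predict taken.
    This is a toy model, not how any specific CPU works.
    """
    if not outcomes:
        return 0.0
    state = initial_state
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Saturating update: move toward 3 on taken, toward 0 on not-taken.
        if taken:
            state = min(state + 1, 3)
        else:
            state = max(state - 1, 0)
    return correct / len(outcomes)


# Hypothetical loop branch: taken 9 times, then falls through once, repeated.
loop_branch = ([True] * 9 + [False]) * 100
accuracy = simulate_two_bit_predictor(loop_branch)
```

Even this tiny scheme only misses the loop exit, so forking down the other side of the branch would be wasted work on the other nine iterations out of ten.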

replies(1): >>45410526 #

7. o11c ◴[28 Sep 25 21:19 UTC] No.45408125[source]▶

>>45406924 #

Doesn't that only work on one side of the branch though?

8. umanwizard ◴[28 Sep 25 21:24 UTC] No.45408166[source]▶

>>45406951 #

No, that is because of speculatively executing one path, not both paths in parallel.

9. umanwizard ◴[28 Sep 25 21:25 UTC] No.45408170[source]▶

>>45406924 #

No, what’s been done for a while is speculatively executing one predicted path, not both paths in parallel.

10. anileated ◴[29 Sep 25 02:09 UTC] No.45409791[source]▶

>>45406878 (TP) #

> I've always wondered if any CPUs have tried to reduce the branch penalty by speculatively executing both ways at once in parallel

They already do it (edit: they don’t). It is difficult to get security right, however (see https://en.wikipedia.org/wiki/Spectre_(security_vulnerabilit...).

replies(1): >>45409808 #

11. umanwizard ◴[29 Sep 25 02:14 UTC] No.45409808[source]▶

>>45409791 #

That is not true, and several people have already made the same mistake in this thread. What is done now is speculatively executing one path, not two or more paths in parallel.

replies(1): >>45409847 #

12. anileated ◴[29 Sep 25 02:25 UTC] No.45409847{3}[source]▶

>>45409808 #

True, it was incorrect for me to say they already do parallel execution. However, since executing both paths in parallel is a special case of speculative execution, the security concern I meant to highlight still applies, doesn’t it?

13. recursivecaveat ◴[29 Sep 25 04:27 UTC] No.45410325[source]▶

>>45406878 (TP) #

They do this on FPGA a lot. Since you know statically the content of the branches, and you need to have resources there to run either of them, it is pretty low overhead to set them up to run in parallel and select the appropriate result afterwards.
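
The "run both sides, then select" pattern can be sketched in software too. This is only a model of the dataflow, assuming two made-up branch bodies; Python itself gains nothing from it, and the function name is hypothetical.

```python
def select_after_both(cond, x):
    """Evaluate both branch bodies, then pick one result with a mask.

    Mirrors the hardware idea of computing both sides and feeding them
    into a multiplexer; the two bodies here are arbitrary examples.
    """
    then_result = x * 3 + 1    # body of the "taken" side
    else_result = x >> 1       # body of the "not taken" side
    mask = -int(bool(cond))    # -1 (all ones) if cond is true, else 0
    # Exactly one of the two terms survives the masking.
    return (then_result & mask) | (else_result & ~mask)
```

Calling `select_after_both(True, 7)` gives the result of the first body and `select_after_both(False, 7)` the result of the second, with no `if` on the data path.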

14. adastra22 ◴[29 Sep 25 05:17 UTC] No.45410526[source]▶

>>45407535 #

I think you’re missing the context: that good branch prediction is what causes these security holes. “Wasteful” multi path execution is a security feature.

replies(1): >>45433489 #

15. Someone ◴[29 Sep 25 20:31 UTC] No.45418414[source]▶

>>45406878 (TP) #

> You'd have two of everything (two pipelines, two ALUs, two sets of registers, etc.)

As others said: yes, it has been tried and it works, but it costs a lot in hardware and power usage. A problem is that lots of code has a branch every 10 or so instructions. Fast high-end CPUs (the only realistic target for this feature) can dispatch multiple instructions per cycle. Combined, that means you will hit a branch every two or three cycles. Because of that, you do not end up with two of everything but with way more.

So, you’re throwing away not 50% of your work but easily 80%.

Some code has fewer branches, but such code can often be parallelized or vectorized easily.
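
A back-of-the-envelope sketch of that arithmetic, under loudly stated assumptions: 4 instructions dispatched per cycle and a 6-cycle branch resolution latency are placeholders rather than figures for any real CPU, and the model simply forks at every unresolved branch.

```python
import math


def cycles_between_branches(instructions_per_branch, instructions_per_cycle):
    """Average number of cycles between two conditional branches."""
    return instructions_per_branch / instructions_per_cycle


def wasted_fraction(unresolved_branches):
    """Fraction of live paths that will be thrown away.

    Forking at every unresolved branch gives 2**n live paths,
    of which only one is the path the program really takes.
    """
    live_paths = 2 ** unresolved_branches
    return 1 - 1 / live_paths


# Assumed numbers, for illustration only.
branch_gap = cycles_between_branches(10, 4)           # one branch every 2.5 cycles
resolve_latency = 6                                   # hypothetical cycles to resolve a branch
in_flight = math.ceil(resolve_latency / branch_gap)   # unresolved branches at once
waste = wasted_fraction(in_flight)
```

With one unresolved branch the waste is the 50% from the original question; with the three in flight assumed here it is already 87.5%.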

16. jasonwatkinspdx ◴[01 Oct 25 01:48 UTC] No.45433489{3}[source]▶

>>45410526 #

No, security vulnerabilities are orthogonal. There's nothing about branch prediction that necessitates leaking information, as demonstrated by the fixes shipped in current processors.