
1. rpearl No.41863932
Wow, that dominator tree algorithm I wrote as an intern (https://bugzilla.mozilla.org/show_bug.cgi?id=666426) seems to have lasted 13 years! At the time we assumed that graphs would stay small enough that a simpler algorithm's lower constant factor would win out over Lengauer-Tarjan. ...And of course, browser-based apps have gotten much bigger since then.

Semi-NCA hadn't even been published yet and seems like the clear choice nowadays, so I'm glad to see it in place! Great work!

replies(4): >>41863991 >>41865343 >>41865494 >>41866865
2. No.41863991
3. mananaysiempre No.41865343
> Semi-NCA hadn't even been published yet and seems like the clear choice nowadays[.]

For those who are awkwardly lingering and casting longing glances at the entrance door of compiler engineering like I am, and who were just as dismayed by this sentence: it wasn’t “properly” published, but it looks to have been described in a thesis from 2005[1] and in an extended abstract (ugh) before that[2].

But also, the reduction of RMQ to NCA, really?.. Ouch. I’m having flashbacks to my (very brief) competitive programming days, and not the good kind. (A sketch of the reduction follows below.)

[1] https://www.cs.princeton.edu/research/techreps/TR-737-05

[2] https://www.cse.uoi.gr/~loukas/index.files/dominators_soda04...

replies(1): >>41865498
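
For anyone who hasn't seen that reduction: build a Cartesian tree over the array (each node holds a smaller value than everything in its subtree, and an in-order walk reproduces the array), and a range-minimum query becomes a nearest-common-ancestor query on that tree. A minimal sketch, with invented names and a deliberately naive NCA; a real implementation would pair this with a constant-time NCA scheme:

    // Sketch of the RMQ -> NCA reduction via a Cartesian tree
    // (illustrative only; names are invented).
    #include <cstdio>
    #include <vector>

    // Build the Cartesian tree in O(n) with a stack; returns parent links.
    std::vector<int> cartesianTreeParents(const std::vector<int>& a) {
        std::vector<int> parent(a.size(), -1), stack;
        for (int i = 0; i < (int)a.size(); i++) {
            int last = -1;
            // Pop everything larger than a[i]; the last popped node
            // becomes i's left child.
            while (!stack.empty() && a[stack.back()] > a[i]) {
                last = stack.back();
                stack.pop_back();
            }
            if (last != -1) parent[last] = i;
            if (!stack.empty()) parent[i] = stack.back();
            stack.push_back(i);
        }
        return parent;
    }

    // Naive NCA by marking ancestors; fine for a demo, not for real use.
    int nca(const std::vector<int>& parent, int u, int v) {
        std::vector<bool> seen(parent.size(), false);
        for (; u != -1; u = parent[u]) seen[u] = true;
        for (; v != -1; v = parent[v]) if (seen[v]) return v;
        return -1;
    }

    int main() {
        std::vector<int> a = {3, 1, 4, 1, 5, 9, 2, 6};
        std::vector<int> parent = cartesianTreeParents(a);
        // RMQ(2, 6): the minimum of {4, 1, 5, 9, 2} sits at index 3.
        std::printf("argmin of a[2..6] = %d\n", nca(parent, 2, 6));
    }

The NCA node is the position of the range minimum, so its value is the answer to the RMQ.
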
4. DannyBee No.41865494
Amusingly, the LLVM implementation was also written by an intern at one point :)

Semi-NCA was actually published back then, just not in the cited paper.

See "Finding dominators in practice", published in 2004, section 2.3. So two decades ago.

The main advantage of Semi-NCA (and why LLVM uses it) is that it makes for a good incremental update algorithm. See https://reviews.llvm.org/D34258
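
For the curious, here is roughly how Semi-NCA fits together; this is an illustrative sketch based on the papers above, not LLVM's code, and all the names are mine. Phase 1 computes semidominators exactly as Lengauer-Tarjan does; phase 2 then derives each idom as a nearest common ancestor on the partially built dominator tree, which is what keeps the algorithm simple:

    // Illustrative sketch of Semi-NCA (not LLVM's implementation; all
    // names are invented). Phase 1 is Lengauer-Tarjan's semidominator
    // computation; phase 2 replaces LT's bucket machinery with NCA
    // queries on the dominator tree built so far.
    #include <vector>
    using std::vector;

    struct SemiNCA {
        int n, cnt = 0;
        vector<vector<int>> succ, pred;
        vector<int> parent, preorder, rev, sdom, idom, label, ancestor;

        explicit SemiNCA(int n)
            : n(n), succ(n), pred(n), parent(n, -1), preorder(n, -1),
              rev(n), sdom(n), idom(n, -1), label(n), ancestor(n, -1) {}

        void addEdge(int u, int v) { succ[u].push_back(v); pred[v].push_back(u); }

        void dfs(int v) {
            preorder[v] = cnt; rev[cnt++] = v;
            for (int w : succ[v])
                if (preorder[w] == -1) { parent[w] = v; dfs(w); }
        }

        // eval/compress: path compression over the linked forest, keeping
        // the label with the smallest semidominator (classic LT).
        void compress(int v) {
            if (ancestor[ancestor[v]] == -1) return;
            compress(ancestor[v]);
            if (sdom[label[ancestor[v]]] < sdom[label[v]])
                label[v] = label[ancestor[v]];
            ancestor[v] = ancestor[ancestor[v]];
        }
        int eval(int v) {
            if (ancestor[v] == -1) return v;
            compress(v);
            return label[v];
        }

        void run(int root) {
            dfs(root);
            for (int v = 0; v < n; v++) { sdom[v] = preorder[v]; label[v] = v; }
            // Phase 1: semidominators, in reverse preorder (pure Lengauer-Tarjan).
            for (int i = cnt - 1; i > 0; i--) {
                int w = rev[i];
                for (int v : pred[w]) {
                    if (preorder[v] == -1) continue;  // unreachable predecessor
                    int u = eval(v);
                    if (sdom[u] < sdom[w]) sdom[w] = sdom[u];
                }
                ancestor[w] = parent[w];  // link w under its DFS parent
            }
            // Phase 2, the NCA part: idom(w) = NCA(parent(w), sdom(w)) on the
            // tree built so far; just climb idom links until dfnum <= sdom(w).
            idom[root] = root;
            for (int i = 1; i < cnt; i++) {
                int w = rev[i], u = parent[w];
                while (preorder[u] > sdom[w]) u = idom[u];
                idom[w] = u;
            }
        }
    };

Because phase 2 only ever consults the tree above the vertex being placed, recomputation after a CFG edit can be confined to the affected region, which is roughly the idea behind the incremental updater in the patch above.
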

Truthfully, as much as I love Cooper and friends (they were responsible for a lot of very thoughtful, well-engineered algorithms at a time when lots of stuff was just "here's some research I think is better"), the "simple" dataflow-based algorithm was never worth it.
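
For contrast, the "simple" algorithm in question is essentially the following (a paraphrase of Cooper, Harvey, and Kennedy's intersection trick; the naming and details are mine): number the blocks in reverse postorder, then iterate to a fixpoint, intersecting the idoms of each block's already-processed predecessors.

    // Sketch of the iterative algorithm from Cooper, Harvey & Kennedy's
    // "A Simple, Fast Dominance Algorithm" (naming is mine, not theirs).
    #include <vector>
    using std::vector;

    // preds[v] lists the predecessors of block v; blocks 0..n-1 are
    // assumed to be numbered in reverse postorder, with 0 the entry.
    vector<int> iterativeDoms(const vector<vector<int>>& preds) {
        int n = (int)preds.size();
        vector<int> idom(n, -1);
        idom[0] = 0;                            // entry dominates itself
        // Two-finger walk: climb idom links until the fingers meet.
        // A smaller number means closer to the entry.
        auto intersect = [&](int a, int b) {
            while (a != b) {
                while (a > b) a = idom[a];
                while (b > a) b = idom[b];
            }
            return a;
        };
        for (bool changed = true; changed; ) {  // iterate to a fixpoint
            changed = false;
            for (int v = 1; v < n; v++) {
                int newIdom = -1;
                for (int p : preds[v]) {
                    if (idom[p] == -1) continue;  // p not yet reached
                    newIdom = newIdom == -1 ? p : intersect(p, newIdom);
                }
                if (newIdom != idom[v]) { idom[v] = newIdom; changed = true; }
            }
        }
        return idom;
    }

On well-behaved, reducible CFGs this settles in very few passes; pathological graphs can force many more iterations.
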

Part of what I was good at back then was keeping track of all the research in compiler optimizations and which of it was any good; most of their stuff is very good.

I used to go through every paper that came around and keep an up-to-date library of the ones worth looking at (both then and for the future) that a bunch of folks used. This was harder back in the days when nobody really published code; I had to write a ton of prototype implementations to see which numbers were real and which were BS because they compared against crappy implementations or whatever.

Semi-NCA was an obvious win: it was simple enough to implement and test, just as fast as what already existed, and could easily be extended to incremental updates.

If you want to see what it takes to do incremental updates with LT, take a look at GCC's dominator update code from around that time (I think it's unchanged since then, actually, but I haven't looked in a few years). Only a fairly small number of people could understand the data structures and algorithms involved.

5. DannyBee No.41865498
It was published before that, in a paper named "Finding dominators in practice" from ESA 2004[1].

For once, the title is not an oversell: it covers the topic quite well for a conference paper.

[1] https://link.springer.com/chapter/10.1007/978-3-540-30140-0_...

6. jandem No.41866865
Nice to hear from you! Yes, it was a very reasonable choice back then, before (enormous) asm.js and Wasm functions.

The compiler wasn't really designed for these huge graphs, but fortunately a lot of the issues can be fixed incrementally and it's still holding up well.