Is it really the case that browsers have default-enabled all sorts of extensions that are not yet widely supported by the rest of the ecosystem?
I just ran it on my Mac (M2 Max) and got:
(F)irefox 131.0.3
(E)dge 129.0, V8 12.9.17.8
(S)afari 18.0 (20619.1.26.31.6)
Speedometer 3.0
(F) 25.9
(E) 22.3
(S) 30.9
JetStream2
(F) 251.787
(E) 350.74
(S) 360.568
Safari seems slightly faster in all benchmarks. I did not run MotionMark because it takes forever :-/. The page says JetStream2 is what you want if you want to benchmark wasm. How this relates to TFA, no idea ... it is not really easy to tell which version of SpiderMonkey is running in the installed Firefox.
--
I don't know the answer, but it would be hard to blame them for following normal browser development practices on the standard they created for the purpose of being in browsers.
This doesn't really make sense, given that the wasm used by websites is going to import a bunch of JS functions as dependencies. You're not going to have those available in any native environment.
> Is it really the case that browsers have default-enabled all sorts of extensions that are not yet widely supported by the rest of the ecosystem?
Yes
Photoshop in particular is a good example of a bleeding edge wasm app - browsers had to relax restrictions on things like function pointer table size in order for it to work. So I wouldn't expect it to build and run anywhere outside of v8 or spidermonkey.
However, the OP has one scenario where the opposite was true - they were using a dense bitset that needed to be obscenely huge because of degenerate code in the wasm module, so swapping to a sparse container was a win.
In the end you just have to profile and understand your data.
*High five*
I fixed several of these during my time as a compiler engineer too.
It's true what they say: quadratic is the worst possible time complexity. Fast enough to work on all your test cases, slow enough to explode in prod.
Semi-NCA hadn't even been published yet and seems like the clear choice nowadays, so I'm glad to see it in place! Great work!
Browsers do an on-demand 'first tier' compilation for fast startup, and in background threads they do a high-quality AOT compilation of the whole module. That high-quality AOT compilation is cached and used for future page loads.
It is possible to use a JIT model for this, but AFAIK it is typically not done. In some configurations (e.g. lockdown mode) WASM is interpreted instead of AOT compiled.
Isn't this what the last line of the article is hinting at?
> our WebAssembly team is making great progress rearchitecting the Wasm compiler pipeline. This work will make it possible to Ion-compile individual Wasm functions as they warm up instead of compiling everything immediately.
A sorted array of bit locations would represent a sparse bit set well enough to start, with O(N) storage and O(log N) access. Once the sets become large and/or dense, another data structure could be considered.
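A minimal sketch of that idea (the class name and API are my own, not from any particular codebase): a sorted vector of set-bit positions, with membership tested by binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sparse bit set backed by a sorted vector of set-bit positions.
// Storage is O(N) in the number of set bits; test() is O(log N).
// Insertion as written is O(N) worst case, which is fine when sets
// are built once and queried many times.
class SparseBitSet {
public:
    bool test(uint32_t bit) const {
        return std::binary_search(bits_.begin(), bits_.end(), bit);
    }
    void set(uint32_t bit) {
        auto it = std::lower_bound(bits_.begin(), bits_.end(), bit);
        if (it == bits_.end() || *it != bit)
            bits_.insert(it, bit);  // keeps the vector sorted
    }
    std::size_t count() const { return bits_.size(); }

private:
    std::vector<uint32_t> bits_;
};
```

Once a set gets dense, switching to a plain bitmap (or a hybrid like Roaring) would make more sense, as the parent says: profile first.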
For those who are awkwardly lingering and casting longing glances at the entrance door of compiler engineering like I am, and who were just as dismayed by this sentence, it wasn’t “properly” published but looks to have been described in a 2005 thesis[1] and in an extended abstract (ugh) before that[2].
But also, the reduction of RMQ to NCA, really?.. Ouch. I’m having flashbacks to my (very brief) competitive programming days, and not the good kind.
[1] https://www.cs.princeton.edu/research/techreps/TR-737-05
[2] https://www.cse.uoi.gr/~loukas/index.files/dominators_soda04...
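For anyone who hasn't had the pleasure, here is a sketch of the RMQ-to-NCA reduction mentioned above (all names are mine, purely for illustration): build a Cartesian tree of the array (minimum at the root, in-order traversal gives back the array), and the range minimum over [i, j] is the value at the nearest common ancestor of nodes i and j.

```cpp
#include <cassert>
#include <vector>

// Cartesian tree over an array, stored as parent pointers.
// Built in O(n) with a stack holding the tree's right spine.
struct CartesianTree {
    std::vector<int> parent;

    explicit CartesianTree(const std::vector<int>& a)
        : parent(a.size(), -1) {
        std::vector<int> spine;  // indices with increasing values
        for (int i = 0; i < (int)a.size(); ++i) {
            int last = -1;
            while (!spine.empty() && a[spine.back()] > a[i]) {
                last = spine.back();
                spine.pop_back();
            }
            if (last != -1) parent[last] = i;          // popped subtree hangs under i
            if (!spine.empty()) parent[i] = spine.back();
            spine.push_back(i);
        }
    }

    // Naive NCA by marking ancestors of u, then walking up from v.
    // Fine for a demo; real implementations use an O(1)-query scheme.
    int nca(int u, int v) const {
        std::vector<bool> seen(parent.size(), false);
        for (int x = u; x != -1; x = parent[x]) seen[x] = true;
        for (int x = v; x != -1; x = parent[x])
            if (seen[x]) return x;
        return -1;
    }
};
```

So `a[t.nca(i, j)]` is the minimum of `a[i..j]`, which is the trick competitive programmers learn to dread.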
Semi-NCA was actually published back then, just not in the cited paper.
See "Finding dominators in practice", published in 2004, section 2.3. So two decades ago.
The main advantage of Semi-NCA (and why LLVM uses it) is that it makes for a good incremental update algorithm. See https://reviews.llvm.org/D34258
Truthfully, as much as I love Cooper and friends (they were responsible for a lot of very thoughtful, well-engineered algorithms at a time when lots of stuff was just "here's some research I think is better"), the "simple" dataflow-based algorithm was never worth it.
Part of what I was good at way back then was keeping track of all of the research in compiler opts and which of it was any good - most of their stuff is very good.
I used to go through every paper that came around and keep an up-to-date library of ones worth looking at (both then and in the future) that a bunch of folks used. This was harder back in the days when nobody really published code; I used to have to write a ton of prototype implementations to see which numbers were real and which were BS because they compared against crappy implementations or whatever.
SEMI-NCA was an obvious win - it was simple enough to implement and test, just as fast as what existed at the time, and could easily be extended to incremental updates.
If you want to see what it takes to do incremental updates with LT, take a look at GCC's dominator update code from around that time period (I think it's unchanged since then, actually, but I haven't looked in a few years). There were a fairly small number of people who could understand the data structures and algorithms involved.
For once, the title is not an oversell; it actually covers the topic quite well for a conference paper.
[1] https://link.springer.com/chapter/10.1007/978-3-540-30140-0_...
The compiler wasn't really designed for these huge graphs, but fortunately a lot of the issues can be fixed incrementally and it's still holding up well.
That is a stunningly large function. I've looked around the onnxruntime sources and can't find anything like it. The largest C file is under 6000 lines. Does anyone know what function it's referring to?
Now we get a couple of startups trying to make WebAssembly work outside of the browser as if it were a novel idea, never done before.
Also, the CPU emulator tick function, which is a single huge switch statement with several thousand case branches, was actually running into a slow path in Clang a couple of years ago (it took like 30 seconds to build that one source file). This was fixed at some point though.
I also seem to remember that Emscripten had a build setting to break up large WASM functions into smaller snippets to work around such quadratic outliers in browser WASM engines.
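For context, the tick function described above has roughly this shape (a toy sketch with made-up opcodes; real emulators have thousands of cases, which is exactly the scale where compilers have historically hit slow paths):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy CPU state: one accumulator, a program counter, and flat memory.
struct Cpu {
    uint32_t a = 0, pc = 0;
    std::vector<uint8_t> mem;
};

// One "tick": fetch an opcode and dispatch via a single big switch.
void tick(Cpu& cpu) {
    uint8_t op = cpu.mem[cpu.pc++];
    switch (op) {
        case 0x00: /* NOP */                    break;
        case 0x01: cpu.a += cpu.mem[cpu.pc++];  break;  // ADD imm
        case 0x02: cpu.a -= cpu.mem[cpu.pc++];  break;  // SUB imm
        // ...a real emulator continues for several thousand cases,
        // one per opcode/addressing-mode combination...
        default: /* illegal opcode */           break;
    }
}
```

A function like this is one giant basic-block-heavy CFG, which is how the quadratic corners in compiler passes get tickled.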
I would love to see how Firefox compares now with the other big competitors.
Very niche but effective. When people ask why FAANGs focus on data structures and algorithms during interviews, it's for cases like this. With that said, I understand candidates' frustration, since most end up working on higher-level network services.