
462 points jakevoytko | 1 comment
aetimmes ◴[] No.43493994[source]
(disclaimer: I know OP IRL.)

I'm seeing a lot of comments saying "only 2 days? must not have been that bad of a bug". Some thoughts here:

At my current day job, our postmortem template asks "Where did we get lucky?" In this instance, the author definitely got lucky that they were working at Google, where 1) there were enough users to generate this Heisenbug consistently and 2) they had direct access to Chrome devs.

Additionally - the author (and his team) triaged, root caused and remediated a JS compiler bug in 2 days. The sheer amount of complexity involved in trying to narrow down where in the browser code this could all be going wrong is staggering. Consider that the reason it took him "only" two days is because he is very, _very_ good at what he does.

1. marginalia_nu ◴[] No.43496185[source]
Days-taken-to-fix is kind of a weird measure for how difficult a bug is. It's clearly a function of a large number of things that aren't the bug itself, including experience and whether you have to go it alone or can talk to the right people.

The bug ticks most of the boxes for a tricky bug:

* Non-deterministic

* Enormous haystack

* Unexpected "1+1=3"-type error with a cause outside of the code itself

Like sure, it would have been slower to debug if it took 30 hours to reproduce, and harder if he had to be going down Niagara Falls in a barrel while debugging it, but I'm not sure those things quite count.

I had a similar category of bug I was struggling with the other year[1] that was related to a faulty optimization in the GraalVM JVM leading to bizarre behavior in very rare circumstances. If I'd been sitting next to the right JVM engineers over at Oracle, I'm sure we'd have figured it out in days and not the weeks it took me.

[1] https://www.marginalia.nu/log/a_104_dep_bug/