I agree with your assessment of how stupid this is, but I'm not surprised.
To be clear, there are good reasons for this different mode. The fuck-up is not testing it properly.
These kinds of modes can be tested properly in various ways, e.g. with an override switch that forces the chosen mode to be used unconditionally instead of relying on the default heuristics for switching between modes. You then run your test suite with that override set, in addition to the default configuration.
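A minimal sketch of that override idea, with every name here hypothetical: the mode chooser consults an environment variable that a CI job can set to pin one mode for an entire test run.

```python
import os

def choose_mode(input_size: int) -> str:
    """Pick an execution mode, honoring a test-only override."""
    # FORCE_MODE is the hypothetical override switch: when set, it wins
    # over the normal heuristic, so CI can run the whole suite pinned
    # to a single mode.
    forced = os.environ.get("FORCE_MODE")
    if forced is not None:
        return forced
    # Default heuristic (also made up for illustration): small inputs
    # take the simple path, large inputs take the optimized path.
    return "simple" if input_size < 1024 else "optimized"
```

A CI matrix would then run the suite once with `FORCE_MODE` unset, once with `FORCE_MODE=simple`, and once with `FORCE_MODE=optimized`.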
The challenge is that this at least doubles the time it takes to run all your tests. And in this kind of project (like a compiler) there are usually many switches of this kind, so you very quickly hit a combinatorial explosion where even a company like Google falls far short of the resources required to run all the tests. (Consider how many -f flags GCC has; there aren't enough physical resources to run any test suite against all combinations.)
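The arithmetic behind that explosion: with n independent binary switches the configuration space has 2**n points, so even a small fraction of GCC's -f flags is far beyond exhaustive coverage. A back-of-the-envelope check with assumed numbers (a 10,000-test suite and just 40 flags treated as on/off):

```python
def full_matrix_runs(n_tests: int, n_binary_flags: int) -> int:
    # Every test crossed with every combination of n binary flags.
    return n_tests * 2 ** n_binary_flags

# Illustrative numbers only, not measurements of any real project.
runs = full_matrix_runs(10_000, 40)
print(f"{runs:.3e}")  # on the order of 10**16 test executions
```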
The solution I'd love to see is stochastic testing. Instead of (or, more realistically, in addition to) a single fixed test suite that runs on every check-in and/or daily, you have an ongoing process that continuously tests your main branch against (test, config) pairs sampled at random from { test suite } x { configuration space }. Ideally you combine it with an automatic bisector that, whenever a failure is found, retries older versions to see whether the failure is a recent regression and, if so, identifies the commit that introduced it.
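A sketch of both pieces, with everything here assumed rather than taken from any real system: a sampler that draws random (test, config) pairs from the cross product, and a bisector that finds the first failing revision once a sampled pair fails, assuming the failure is monotone (once introduced, it persists).

```python
import random

def sample_pairs(tests, flags, n, seed=None):
    """Draw n random (test, config) pairs from tests x 2**len(flags) configs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        test = rng.choice(tests)
        # Each binary flag is independently on or off in the config.
        config = tuple(f for f in flags if rng.random() < 0.5)
        pairs.append((test, config))
    return pairs

def bisect_regression(revisions, fails_at):
    """Return the first revision (oldest-to-newest order) where fails_at is True.

    Caller guarantees the newest revision fails (that's how the failure
    was found). Returns None if even the oldest revision fails, i.e.
    the failure is not a recent regression.
    """
    if fails_at(revisions[0]):
        return None
    lo, hi = 0, len(revisions) - 1  # revisions[lo] passes, revisions[hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails_at(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]
```

A continuous tester would loop forever over sample_pairs, run each pair against the current main branch, and hand any failure to bisect_regression with a fails_at callback that checks out and runs that pair at a given revision.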