The problem though, is the older systems had obvious problems, while the newer systems have hidden bugs and conceptual issues which often don't show up in the metrics, and which compound over time as more complexity is layered on. For example: I found an off by 1 error deep in a formula from an old launch that has been reordering top results for 15% of queries since 2015. I handed it off when I left but have no idea whether anyone actually fixed it or not.
I wrote up all of the search bugs I was aware of in an internal document called "second page navboost", so if anyone working on search at Google reads this and needs a launch go check it out.