Honestly its a little surprising the first optimization he found was something fairly obvious just by using perf. I thought they had discussed the zeroing buffers issue in the first post? The second optimization was definitely more involved/interesting but was still pointed at by perf. Don't underestimate that tool!
replies(2):