I expect a reviewer using AI tools to query papers to do a half-decent job even if they don’t check the results… assuming the AI hasn’t been prompt-injected. The models are actually pretty good at this.
Which is to say, if four selections were to be made from ten submissions, I expect humans and AI reviewers to pick the same winning four quite frequently. I share the outrage at reviewers deferring their expertise to AI, on grounds of dishonesty among other reasons. But I concur with the people who do it that it would work most of the time at selecting the best papers of the bunch.
I do not expect any positive correlation between papers that are important enough to publish and papers that embed prompt injections to pass review. If anything, I would expect a negative correlation: cheating papers are probably trash.