First of all, none of the SOTA models we're currently using were released in May or early June. Gemini 2.5 came out on June 17, and GPT-5 and Claude Opus 4.1 at the beginning of August.
On top of that, using free models for anything like this is absolutely wild. I use the absolute best models, and the research versions of them, whenever I do research. Anything less is inviting disaster.
You have to use the right tool for the job, and right now any report that is more than a month old is useless in the AI world, beyond being a snapshot of how things 'used to be'.