Trying it now in Vscode Insiders with Github Copilot (codex crashes with HTTP 400 server errors), and it eventually started using sed and grep in shells instead of using the better tools it has access to. I guess this is not an issue to perform well in benchmarks.
replies(3):