504 points | Terretta | 1 comment
esafak No.45064606
"On the full subset of SWE-Bench-Verified, grok-code-fast-1 scored 70.8% using our own internal harness."

Let's see this harness, then, because third-party reports rate it at 57.6%:

https://www.vals.ai/models/grok_grok-code-fast-1

1. hrdwdmrbl No.45067265
It does still compare well against the others: https://www.vals.ai/benchmarks/swebench-2025-08-27