"On the full subset of SWE-Bench-Verified, grok-code-fast-1 scored 70.8% using our own internal harness."
Let's see this harness, then, because third-party reports rate it at 57.6%.
This doesn't just cause confusion, it's also hard to sort. To confirm my suspicion of sloppy coding, I tried to sort the date column and to my surprise I got this madness:
1/31/2025
2/29/2024
2/29/2024
4/28/2024
3/27/2024
9/27/2023
Which is sorting by the day field -- the bit in the middle -- instead of the year! That's just... special.
[1] I hear some incredibly backwards places like Liberia that also haven't adopted metric insist on using it into the present day, but the rest of the civilised world has moved on.
I'm not sure why you're particularly picking on MM/DD/YYYY, saying things like "backwards places". DD/MM/YYYY doesn't sort any better. YYYY-MM-DD is the only one that sorts well. (Some people promote YYYYY-MM-DD though, which I guess is more future proof.)
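To make the sorting point concrete, here is a rough Python sketch that reuses the dates from the parent comment. The helper name `to_iso` is made up for illustration, and this assumes the strings are M/D/YYYY as shown; it is not meant to reflect how the site in question actually sorts.

```python
from datetime import datetime

# Dates from the parent comment, in M/D/YYYY form.
us_dates = ["1/31/2025", "2/29/2024", "2/29/2024",
            "4/28/2024", "3/27/2024", "9/27/2023"]


def to_iso(us_date: str) -> str:
    """Hypothetical helper: convert M/D/YYYY to zero-padded YYYY-MM-DD."""
    return datetime.strptime(us_date, "%m/%d/%Y").strftime("%Y-%m-%d")


# Reference ordering: parse into real dates, sort, then format as ISO.
parsed = sorted(datetime.strptime(d, "%m/%d/%Y") for d in us_dates)
chronological = [d.strftime("%Y-%m-%d") for d in parsed]

# A plain string sort of the ISO form, with no date parsing at sort time.
iso_sorted = sorted(to_iso(d) for d in us_dates)

# A plain string sort of the original M/D/YYYY strings.
us_sorted = sorted(us_dates)

print(iso_sorted == chronological)  # string order matches date order
print(us_sorted)                    # ordered by month digits, years jumbled
```

The zero padding matters as much as the field order: without it, "10" would sort before "9" as a string.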