504 points by Terretta | 1 comment
esafak No.45064606
"On the full subset of SWE-Bench-Verified, grok-code-fast-1 scored 70.8% using our own internal harness."

Let's see this harness, then, because third-party reports rate it at 57.6%.

https://www.vals.ai/models/grok_grok-code-fast-1

jiggawatts No.45069650
I know this sounds like a nitpick, but the first thing I noticed when opening the site was the gibberish date order, with the day, month, and year parts out of order.[1]

This doesn't just cause confusion; it's also hard to sort. To confirm my suspicion of sloppy coding, I tried sorting the date column and, to my surprise, got this madness:

    1/31/2025
    2/29/2024
    2/29/2024
    4/28/2024
    3/27/2024
    9/27/2023

It's sorting by the day column (the bit in the middle) instead of the year!

That's just... special.
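For illustration, here's a minimal Python sketch of what a correct sort could look like, assuming the column holds US-style M/D/YYYY strings (the site's actual code isn't known, and the names `displayed` and `parse_mdy` are hypothetical). The idea is to parse each string into a real date before comparing, or to convert to ISO 8601 (YYYY-MM-DD), which sorts correctly even as plain text.

```python
from datetime import datetime

# Dates as displayed in the table (US month/day/year order).
displayed = ["1/31/2025", "2/29/2024", "2/29/2024", "4/28/2024", "3/27/2024", "9/27/2023"]

def parse_mdy(s: str) -> datetime:
    """Parse a US-style M/D/YYYY string into a datetime (hypothetical helper)."""
    return datetime.strptime(s, "%m/%d/%Y")

# Chronological order: compare parsed dates, not the raw text.
chronological = sorted(displayed, key=parse_mdy)

# ISO 8601 strings put the year first, so a plain string sort is also chronological.
iso = sorted(parse_mdy(s).date().isoformat() for s in displayed)

print(chronological)
print(iso)
```

Storing ISO 8601 in the first place avoids the problem entirely, since lexicographic and chronological order coincide when the fields run from largest to smallest unit.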

[1] I hear some incredibly backwards places, like Liberia, that also haven't adopted metric still insist on using it to this day, but the rest of the civilised world has moved on.

whimsicalism No.45069807
Not sure if the comment about Liberia is tongue-in-cheek, but this is by far the most common way of writing dates in the US.
jiggawatts No.45069887
Yes, of course this is tongue-in-cheek, but it’s the “ha-ha… but serious” type of humour.

Just look at this map: https://en.m.wikipedia.org/wiki/List_of_date_formats_by_coun...

You’re almost entirely alone in these backwards practices!

Well, not entirely alone: you also have Liberia following your “standards”! There are two of you! Must be nice.

PS: If Trump actually wanted to make US exports competitive on the world market, step one would be to adopt world standards like metric.

whimsicalism No.45069964
At least we're not one of those poor countries that use both MDY and DMY.