
689 points dheerajvs | 1 comments
narush ◴[] No.44523346[source]
Hey HN, study author here. I'm a long-time HN user -- and I'll be in the comments today to answer questions/comments when possible!

If you're short on time, I'd recommend just reading the linked blogpost or the announcement thread here [1], rather than the full paper.

[1] https://x.com/METR_Evals/status/1943360399220388093

replies(7): >>44523757 #>>44523844 #>>44523891 #>>44524187 #>>44524724 #>>44524983 #>>44528188 #
igorkraw ◴[] No.44523891[source]
Could you either release the dataset (raw but anonymized) for independent statistical evaluation or at least add the absolute times of each dev per task to the paper? I'm curious what the absolute times of each dev with/without AI were, and whether the one guy with lots of Cursor experience was actually faster than the rest or just a slow typer getting a big boost out of LLMs.

Also, cool work, very happy to see actually good evaluations instead of just vibes or observational studies that don't account for the Hawthorne effect.

replies(1): >>44524072 #
narush ◴[] No.44524072[source]
Yep, sorry, meant to post this somewhere but forgot in final-paper-polishing-sprint yesterday!

We'll be releasing anonymized data and some basic analysis code to replicate core results within the next few weeks (probably next, depending).

Our GitHub is here (http://github.com/METR/) -- or you can follow us (https://x.com/metr_evals) and we'll probably tweet about it.

replies(1): >>44525098 #
igorkraw ◴[] No.44525098[source]
Cool, thanks a lot. Btw, I have a very tiny tiny (50 to 100 audience) podcast where we try to give context to what we call the "muck" of AI discourse (trying to ground claims in what we would call objectively observable facts/evidence, and then _separately_ giving our own biased takes). If you would be interested to come on it and chat => contact email in my profile.
replies(1): >>44526550 #
1. ryanar ◴[] No.44526550{3}[source]
podcast link?