(metr.org)

688 points dheerajvs | 2 comments | 10 Jul 25 16:29 UTC

narush ◴[10 Jul 25 17:28 UTC] No.44523346[source]▶

Hey HN, study author here. I'm a long-time HN user -- and I'll be in the comments today to answer questions/comments when possible!

If you're short on time, I'd recommend just reading the linked blogpost or the announcement thread here [1], rather than the full paper.

[1] https://x.com/METR_Evals/status/1943360399220388093

replies(7): >>44523757 #>>44523844 #>>44523891 #>>44524187 #>>44524724 #>>44524983 #>>44528188 #

1. yawnxyz ◴[11 Jul 25 03:39 UTC] No.44528188[source]▶

>>44523346 #

Does this reproduce for early/mid-career engineers who aren't at the top of their game?

replies(1): >>44528549 #

2. narush ◴[11 Jul 25 05:00 UTC] No.44528549[source]▶

>>44528188 (TP) #

How these results transfer to other settings is an excellent question. Previous literature would suggest speedup -- but I'd be excited to run a very similar methodology in those settings. It's already challenging as models + tools have changed!

Measuring the impact of AI on experienced open-source developer productivity