
688 points dheerajvs | 2 comments
narush ◴[] No.44523346[source]
Hey HN, study author here. I'm a long-time HN user -- and I'll be in the comments today to answer questions/comments when possible!

If you're short on time, I'd recommend just reading the linked blogpost or the announcement thread here [1], rather than the full paper.

[1] https://x.com/METR_Evals/status/1943360399220388093

replies(7): >>44523757 #>>44523844 #>>44523891 #>>44524187 #>>44524724 #>>44524983 #>>44528188 #
1. yawnxyz ◴[] No.44528188[source]
Does this reproduce for early/mid-career engineers who aren't at the top of their game?
replies(1): >>44528549 #
2. narush ◴[] No.44528549[source]
How these results transfer to other settings is an excellent question. Previous literature would suggest a speedup -- but I'd be excited to run a very similar methodology in those settings. That's already challenging, as models + tools have changed!