(metr.org)

688 points dheerajvs | 1 comments | 10 Jul 25 16:29 UTC

AvAn12 ◴[10 Jul 25 18:50 UTC] No.44524235[source]▶

N = 16 developers. Is this enough to draw any meaningful conclusions?

sarchertech ◴[10 Jul 25 19:04 UTC] No.44524365[source]▶

That depends on the size of the effect you’re trying to measure. If Cursor provides a 5x, 10x, or 100x productivity boost, as many people are claiming, you’d expect to see that in a sample size of 16 unless there’s something seriously wrong with your sample selection.

If you are looking for a 0.1% increase in productivity, then 16 is too small.

replies(2): >>44524778 #>>44525319 #

AvAn12 ◴[10 Jul 25 20:39 UTC] No.44525319[source]▶

>>44524365 #

“A quarter of the participants saw increased performance, 3/4 saw reduced performance.” So I think any conclusions drawn from these 16 people don’t signify much one way or the other. Cool paper, but how is this anything other than a null finding?

replies(1): >>44528972 #

1. tripletao ◴[11 Jul 25 06:33 UTC] No.44528972[source]▶

>>44525319 #

They show a 95% CI excluding zero in Figure 1. By the usual standards of social science, that's not a null finding. They give their methodology in Appendix D.

For intuition on why it's insufficient to consider N alone: I assume, for example, that you'd greatly increase your belief that a coin was unfair long before 16 consecutive heads. As already noted, the size of the effect also matters. That relationship isn't intuitive in general, and attempts to replace the math with feelings tend to fail.
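
A rough sketch of the coin intuition, for anyone who wants numbers. This is not the paper's Appendix D method, just an exact binomial tail; the function name `tail_prob` and the example biases (0.9 and 0.51) are illustrative assumptions.

```python
from math import comb


def tail_prob(n: int, k: int, p: float = 0.5) -> float:
    """Probability of at least k heads in n flips of a coin with heads probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# How surprising is a run of n straight heads if the coin is fair?
for n in (4, 8, 16):
    print(f"{n} heads in a row under a fair coin: {tail_prob(n, n):.6f}")

# Same N = 16, different effect sizes (assumed biases, for illustration only):
# chance of seeing at least 12 heads out of 16.
for p in (0.5, 0.51, 0.9):
    print(f"p = {p}: P(at least 12 heads of 16) = {tail_prob(16, 12, p):.4f}")
```

With N fixed at 16, a strongly biased coin produces lopsided results almost every time, while a coin with a 0.51 bias looks nearly identical to a fair one. That is the sense in which N alone doesn't tell you whether a sample is large enough.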

Measuring the impact of AI on experienced open-source developer productivity