Take away everything else: there's a product that is really good at small tasks. That doesn't mean chaining those small tasks together to make a big task should work.
I'm not sure there's much to learn here, besides that it's kinda fun, since no real human was forced to suffer through this exercise on the implementor side.
Yes.
How useful is the comparison with the worst human results, which are often due to process rather than the people involved?
You can improve processes and teach the humans; the junior will become a senior, in time. If the processes and the company are bad, what's the point of using such a context to compare human and AI outputs? The context is too random and unpredictable. Even if you find out that AI or some humans do better in such a bad context, what of it? The priority would be to improve the process first for the best gains.
I don't mean the code producers; I mean that the enterprise itself is not intelligent, yet it (the enterprise) is described as developing the software. And it behaves exactly like that, right down to deeply enjoying inflicting bad development/software metrics (aka BD/SM) on itself, inevitably resulting in:
https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...
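For anyone who hasn't seen the linked repo: the entire problem it solves is the three-branch loop below (a minimal sketch of plain FizzBuzz, not code taken from the repo), which it then spreads across layers of factories, strategies, and dependency injection.

```java
// Plain FizzBuzz: the whole problem the linked repo wraps in
// factories, strategies, and injection frameworks.
public class FizzBuzz {
    public static void main(String[] args) {
        for (int i = 1; i <= 100; i++) {
            if (i % 15 == 0) System.out.println("FizzBuzz");
            else if (i % 3 == 0) System.out.println("Fizz");
            else if (i % 5 == 0) System.out.println("Buzz");
            else System.out.println(i);
        }
    }
}
```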
It's not hard, just different.