I'm very pro LLM and AI. But I completely agree with the comment about how many pieces praising LLMs are doing so with trivial examples. Trivial might not be the right word, but I can't think of a better one that doesn't have a negative connotation, but this shouldn't be negative. Your examples are good and useful, and capture a bunch of tasks a software engineer would do.
I'd say your mandelbrot debug and the LLVM patch are both "trivial" in the same sense: they're discrete, well defined, clear-success-criteria-tasks that could be assigned to any mid/senior software engineer in a relevant domain and they could chip through it in a few weeks.
Don't get me wrong, that's an insane power and capability of LLMs, I agree. But ultimately it's just doing a day job that millions of people can do sleep deprived and hungover.
Non-trivial examples are things that would take a team of different specialist skillsets months to create. One obvious potential reason why there's few non-trivial AI examples is because non-trivial AI examples require non-trivial amount of time to be able to generate and verify.
A non-trivial example isn't an example you can look at the output and say "yup, AI's done well here". It requires someone spends time going into what's been produced, assessing it, essentially redesigning it as a human to figure out all the complexity of a modern non-trivial system to confirm the AI actually did all that stuff correctly.
An in depth audit of a complex software system can take months or even years and is a thorough and tedious task for a human, and the Venn diagrams of humans who are thinking "I want to spend more time doing thorough, tedious code tasks" and "I want to mess around with AI coding" is 2 separate circles.