In my experience this underrates them. They can do pretty complex tasks that go well beyond your examples if prompted correctly.
The real limiting factor is not so much task complexity as the level of abstraction and indirection. If you have code that requires following a long chain of references to understand, LLMs will struggle to work with it.
For similar reasons, they also struggle with:
- generic types
- inheritance hierarchies
- long function call chains
- dependency injection
- deeply nested structures
They're also bad at counting, which can be an issue when dealing with concurrency—i.e. you started 5 operations concurrently at different points in your program and now need to block while waiting for 5 corresponding success or failure messages. Unless your code explicitly uses the number 5 somewhere, an LLM is often going to fail at counting the operations.
All in all, the main question I think in determining how well an LLM can do a task is whether the limiting factor for your task is knowledge or abstraction. If it's knowledge (the intricacies of some arcane OS API, for example), an LLM can do very well with good prompting even on quite large and complex tasks. If it's abstraction, it's likely to fail in all kinds of seemingly obvious ways.