If o3 can design it, that means it’s using open source schedulers as reference. Did you think about opening up a few open source projects to see how they were doing things in those two weeks you were designing?
AI research has a thing called "the bitter lesson" - which is that the only things that work are search and learning. Domain-specific knowledge inserted by the researcher tends to look good in benchmarks but compromises the performance of the system in the long run[0].
The bitter-er lesson is that this also applies to humans. Humans still outperform AI on lots of intelligence tasks because humans are doing lots and lots of search and learning, repeatedly, across billions of people, and have been doing so for thousands of years. The only uses of AI that benefit humans are ones that allow you to do more search or more learning.
The human equivalent of "inserting domain-specific knowledge into an AI system" is cultural knowledge, cliches, cargo-cult science, and cheating. Copying other people's work only helps you, long-term, if you're able to build off of that into something new; and lots of discoveries have come about from someone just taking a second look at what had been considered to be generally "known". If you are just "taking shortcuts", then you learn nothing.
[0] I would also argue that the current LLM training regime is still domain-specific knowledge; we've just widened the domain to "the entire Internet".
So I find your assessment pretty accurate, if depressing.