Treating the LLM like a compiler is a much more scalable, extensible and composable mental model than treating it like a junior dev.
Treating the LLM like a compiler is a much more scalable, extensible and composable mental model than treating it like a junior dev.
A version that DID work like a compiler would be super interesting - it could replace the function body with generated Python code on your first call and then reuse that in the future, maybe even caching state on disk rather than in-memory.
I would like to try something like this in Rust: - you use a macro to stub out the body of functions (so you just write the signature) - the build step fills in the code and caches it - on failures the, the build step is allowed to change the function bodies generated by LLMs until it satisfies the test / compile steps - you can then convert the satisfying LLM-generated function bodies into a hard code (or leave it within the domain of "changeable by the llm")
It sandboxes what the LLM can actually alter, and makes the generation happen in an environment where you can check right away if it was done correctly. Being Rust, you get a lot of more verifications. And, crucially, keeps you in the driver's seat.