> Software Engineering is unusual in that it works with deterministic machines. Maybe LLMs mark the point where we join our engineering peers in a world of non-determinism.
Recently, some people have compared LLMs to compilers, and the resulting source code to object code. This is a false analogy: compilation is (almost always) a semantics-preserving transformation, while an LLM is given a natural-language spec (the prompt) that is by definition underspecified. It cannot be semantics preserving, because the semantics of its input are ambiguous.
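A minimal sketch of what "ambiguous input semantics" means in practice. The prompt is invented for illustration: "remove duplicates from a list" admits at least two implementations that are both consistent with the words of the spec but observably differ in behaviour.

```python
# Two readings of the (hypothetical) prompt "remove duplicates from a list".

def dedup_sorted(items):
    # Reading 1: return the unique elements; original order not preserved.
    return sorted(set(items))

def dedup_ordered(items):
    # Reading 2: return the unique elements in first-occurrence order.
    seen = set()
    out = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

data = [3, 1, 3, 2]
print(dedup_sorted(data))   # [1, 2, 3]
print(dedup_ordered(data))  # [3, 1, 2]
```

A compiler given either function must preserve its meaning; an LLM given the prompt is free to pick either one, so "semantics preservation" is not even well defined.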
The programmer is left with two options: (1) understand the resulting code, repairing and rewriting it as needed, or (2) ignore the code and validate it by testing.
Both of these approaches are assistive: at least in its current form, AI can only accelerate a programmer, not replace them. Lovable and similar tools rely on very informal testing, which is why non-programmers can use them, but they have little chance of producing robust software of any complexity. I’ve seen people create working web apps, but I am confident I could find plenty of strange bugs just by testing edge cases or stressing non-functional qualities. The bigger issue is the bugs I can’t find, because they’re not bugs a human programmer would create.
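A hypothetical sketch of the kind of edge-case testing I mean. `average` is an invented helper, not output from any real tool; it stands in for plausible generated code that survives happy-path checking but fails off the happy path.

```python
import math

def average(values):
    # A plausible generated helper: correct on typical inputs.
    return sum(values) / len(values)

# Happy-path check, the kind of informal testing a no-code user might do:
assert average([2, 4, 6]) == 4

# Edge cases that informal testing rarely covers:
try:
    average([])                # empty input: raises ZeroDivisionError
    crashed = False
except ZeroDivisionError:
    crashed = True
assert crashed

# NaN silently propagates instead of being rejected:
assert math.isnan(average([float("nan"), 1.0]))
```

Clicking through the UI exercises only the first case; the other two are exactly the paths a non-programmer never thinks to try.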
Option (1) is problematic because LLMs tend not to produce clean code designed to be human-readable. Much of the effort coders put in goes into breaking down tasks and guiding the LLM to produce good code. I have yet to see this work for anything novel and complex. For non-trivial systems, reasoning and architecture are required. The hope is that a programmer can write specs well enough that LLMs can “fill in the gaps”, but whether this is a net positive once the work involved is accounted for remains an open question. I’ve yet to see any first-hand evidence of a productivity gain here. It’s early days.
Option (2) is also difficult because a crucial factor is missing in AI coding: the “generality of intent” of a human author. This matters because the non-trivial bugs an AI produces are unlikely to resemble those a human would make. Human bugs are usually failures of reasoning, but LLMs don’t reason in the sense that humans do, so testing in the same way may not be possible: your intuitions for where bugs lie no longer apply. The likely result is worse code produced more quickly, and that trade-off needs exploring.
At the moment I think AI is useful for (a) discussions around design, libraries, and debugging, (b) autocomplete, and (c) agent analysis of existing code where partial answers are OK and false positives are acceptable (e.g. finding some but not all bugs). Agent coding doesn’t seem production-ready to me, and won’t until we have much better tooling to prevent some of these problems, or AI becomes capable of proper reasoning.