Can any program be broken down into functions and functions of functions that have inputs and outputs so that they can be verified if they are working?
The hard bit is knowing which functions to write, and what "valid" means for inputs and outputs. Sometimes you'll get a specification that tells you this, but the moment you try to implement it you'll find that whoever wrote that spec didn't really think it through to its conclusion. There will be a host of edge cases that probably don't matter and will probably never be hit in the real world, but someone needs to make that call and decide what to do when (not if) they get hit anyway.
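For a concrete (and entirely made-up) example of the kind of call someone has to make: what should an averaging function do with an empty list? The spec rarely says, but the code has to decide something, and the decision is exactly what the tests end up pinning down.

```rust
/// Hypothetical example: the spec says "return the average of the readings"
/// but never defines a valid input. Someone still has to decide what an
/// empty slice means: an error? zero? NaN? Here the type forces the decision.
fn average(readings: &[f64]) -> Option<f64> {
    if readings.is_empty() {
        None // the edge case the spec never mentioned
    } else {
        Some(readings.iter().sum::<f64>() / readings.len() as f64)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn empty_input_is_rejected() {
        assert_eq!(average(&[]), None);
    }

    #[test]
    fn averages_normal_input() {
        assert_eq!(average(&[2.0, 4.0]), Some(3.0));
    }
}
```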
If a program is built with strong software architecture, then a lot of it will fit that definition. As an analogy, electricity in your home is delivered by electrical outlets that are standardized -- you can have high confidence that when you buy a new electrical appliance, it can plug into those outlets and work. But someone had to design that standard and apply it universally to the outlets and the appliances. Software architecture within a program is about creating those standards on how things work and applying them universally. If you do this well, then yes, you can have a lot of code that is testable and verifiable.
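A rough sketch of what that "standard outlet" can look like in code -- a shared trait that every component plugs into, so each piece can be tested against the same contract. The names here (StorageBackend, MemoryBackend) are invented for illustration, not from any particular project:

```rust
use std::collections::HashMap;

/// Hypothetical "standard outlet": every storage backend implements the same
/// trait, so callers and tests depend only on the contract, not the details.
trait StorageBackend {
    fn put(&mut self, key: &str, value: &[u8]);
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

/// A trivial in-memory implementation -- handy as a stand-in during tests.
struct MemoryBackend {
    map: HashMap<String, Vec<u8>>,
}

impl StorageBackend for MemoryBackend {
    fn put(&mut self, key: &str, value: &[u8]) {
        self.map.insert(key.to_string(), value.to_vec());
    }
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }
}

/// Code written against the trait works with any backend that honors the
/// contract -- the appliance plugs into the outlet.
fn roundtrip(storage: &mut impl StorageBackend) -> bool {
    storage.put("greeting", b"hello");
    storage.get("greeting").as_deref() == Some(b"hello".as_slice())
}
```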
But you'll always have side-effects. Programs do things -- they create files, they open network connections, they communicate with other programs, they display things on the screen. Some of those side-effects create "state" -- once a file is created, it's still present. These things are much harder to test because they're not just a function with an input and an output -- their behavior changes between the first run and the second run.
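A minimal sketch of that kind of function, assuming a simple file-creation example: the result of the second call depends on what the first call left behind.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// A side-effecting function: after the first call the file exists, so a
/// second call observes different state than the first one did.
fn ensure_log(path: &Path) -> io::Result<bool> {
    if path.exists() {
        Ok(false) // nothing created on this run -- the state was already there
    } else {
        fs::write(path, b"log started\n")?;
        Ok(true) // created on this run
    }
}
```

Testing something like this usually means controlling the environment it runs in -- a temporary directory, cleanup between runs -- precisely because the state outlives the call.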
Even if you can formally verify individual methods, what you're actually asking is whether we can verify systems. Because systems, even ones made of pieces that are individually understood, have interactions and emergent behaviors that nobody expected.
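A contrived sketch of what that looks like, with hypothetical functions: two pieces that are each trivially correct in isolation, composed into something wrong because they assume different units.

```rust
/// Each function below is trivially correct on its own and easy to unit test.
/// The bug only exists in the composition, because the two sides make
/// different assumptions about units. (Hypothetical example.)
fn steps_per_mm(steps_per_rev: f64, mm_per_rev: f64) -> f64 {
    steps_per_rev / mm_per_rev
}

/// Documented (and correct): returns the requested travel in inches.
fn requested_travel_inches(request: f64) -> f64 {
    request
}

fn steps_for_move(request: f64) -> f64 {
    // Emergent bug: inches multiplied by steps-per-millimeter. Every unit
    // test of the individual parts still passes.
    requested_travel_inches(request) * steps_per_mm(200.0, 8.0)
}
```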
Long story short: yes, but it'd take centuries to verify all possible inputs, at least for any non-trivial program.
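Some back-of-the-envelope arithmetic (a throwaway Rust snippet, just to make the scale concrete) on why exhaustively checking inputs is hopeless even for a single 64-bit argument:

```rust
fn main() {
    // A function taking a single 64-bit argument already has 2^64 inputs.
    let inputs = 2f64.powi(64);
    // Even checking a billion inputs per second, exhausting that one
    // argument takes roughly 585 years; add a second argument and the
    // heat death of the universe gets involved.
    let seconds = inputs / 1e9;
    let years = seconds / (60.0 * 60.0 * 24.0 * 365.25);
    println!("~{years:.0} years to exhaust one 64-bit input");
}
```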
Can a function be "verified"? That can mean "tested", "reviewed", or "proved to be correct". And what does "correct" even mean?
Unlike mathematical functions, functions in code are often more than a mapping from inputs to outputs. Very often they have side effects, like sending packets on the network or modifying things in their environment. This makes it difficult to understand what they do in isolation.
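One common way to get that isolation back is to pass the side effect in as a parameter, so the decision logic can be tested without touching the network. A hedged sketch with invented names (notify_if_hot, the alert message) follows:

```rust
/// Hypothetical sketch: instead of calling the network directly, the function
/// takes the "send" effect as a parameter, so the decision logic can be
/// tested in isolation without opening any sockets.
fn notify_if_hot<F>(temperature_c: f64, mut send: F) -> bool
where
    F: FnMut(&str),
{
    if temperature_c > 80.0 {
        send("overtemp alert");
        true
    } else {
        false
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn sends_exactly_one_alert_when_hot() {
        let mut sent = Vec::new();
        assert!(notify_if_hot(95.0, |msg| sent.push(msg.to_string())));
        assert_eq!(sent, vec!["overtemp alert"]);
    }

    #[test]
    fn stays_quiet_when_cool() {
        let mut sent = Vec::new();
        assert!(!notify_if_hot(20.0, |msg| sent.push(msg.to_string())));
        assert!(sent.is_empty());
    }
}
```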
Any non-trivial piece of software is almost impossible to fully understand or test. These things work empirically and require constant maintenance and tweaking.
Outside of functional code, there's a lot out there that requires mutable state. This is much harder to test, which is why user interface testing on native apps is always more painful and most people still run manual QA or use an entirely different testing approach.
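One partial mitigation -- sketched here with hypothetical names, not any particular framework -- is to pull the state transitions out into pure functions so at least the logic is unit-testable, leaving only rendering and event plumbing for manual QA or a separate UI testing approach:

```rust
/// Hypothetical sketch: keep UI state transitions as pure data-in/data-out
/// functions so they can be unit tested.
#[derive(Debug, Clone, PartialEq)]
struct EditorState {
    text: String,
    dirty: bool,
}

enum Event {
    Typed(char),
    Saved,
}

fn update(state: EditorState, event: Event) -> EditorState {
    match event {
        Event::Typed(c) => EditorState {
            text: format!("{}{}", state.text, c),
            dirty: true,
        },
        Event::Saved => EditorState { dirty: false, ..state },
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn typing_marks_the_buffer_dirty() {
        let start = EditorState { text: String::new(), dirty: false };
        let next = update(start, Event::Typed('a'));
        assert_eq!(next, EditorState { text: "a".into(), dirty: true });
    }
}
```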
...then a bit flips because of a stray high energy particle or someone trips over the metaphorical power cord and it all crashes anyway.
When adding machines and calculators appeared in offices, detractors claimed they would weaken the mind. In the mid-20th century, some educators derided calculator users as “button pushers” rather than “real mathematicians.”
In the 1980s, early adopters of personal computers and word processors were sometimes called “typists with toys.” Secretaries who mastered word processors were sometimes derided as “not real secretaries” because they lacked shorthand or dictation skills.
Architects and engineers who switched from drafting tables to CAD in the 1970s–80s faced accusations that CAD work was “cookie-cutter” and lacked craftsmanship. Traditional draftsmen argued that “real” design required hand drawing, while CAD users were seen as letting the machine think for them.
Across history, the insults usually follow the same structure:
- Suggesting the new tool makes the work too easy, therefore less valuable.
- Positioning users as “operators” rather than “thinkers.”
- Romanticizing the older skill as more “authentic” or “serious.”
That variable is undefined in multiple constructors. Also your code cannot compile in various scenarios. Have a nice day.
So it’s really not that far-fetched.
So this doesn't seem relevant to the conversation about LLMs, Rust, and software quality improvement methods, from strict typing to formal verification. It's like a "gotcha!" that didn't land. Sorry.
Please, find some bugs in a project I've touched in the last few years! Looking for things to fix. Please open a GitHub issue from an account linked to your projects next time so I can return the favor :D The bugs are in there, I know they are, but with an LLM and a bit of time to review its work, it now costs me a few minutes tops to write a test to exclude future cases of the same problem, or the same class of problems. That's a level of verification that never got time allocated in previous projects, where tests were written infrequently or not at all.
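For what it's worth, a regression test of that kind can be as small as this -- a hypothetical example, not from any of the projects above: once a bug is found, pin the fixed behavior down so the same class of problem can't quietly come back.

```rust
/// Hypothetical fix under test: paths with trailing slashes were being
/// treated as distinct keys; normalize them before use.
fn normalize_path(input: &str) -> String {
    input.trim_end_matches('/').to_string()
}

#[cfg(test)]
mod regression_tests {
    use super::*;

    #[test]
    fn trailing_slash_bug_stays_fixed() {
        // Original (invented) bug report: "foo/" and "foo" behaved differently.
        assert_eq!(normalize_path("foo/"), normalize_path("foo"));
    }

    #[test]
    fn multiple_trailing_slashes_are_also_covered() {
        assert_eq!(normalize_path("foo///"), "foo");
    }
}
```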
Repsnapper's a great example of that. We didn't have a standardized testing framework across the dozen or so libraries we'd integrated into the app. The original author, Kulitorum, sort of copied and pasted bits and pieces of code together to write the app originally, without much concern for licenses or origin tracking, so I initiated a line-by-line audit and license verification of the codebase to qualify it for inclusion in Fedora and Debian, to make 3D printing easier and more available, as there were no such tools shipping in a distro at the time. Integrating new libraries into that codebase was unpleasant, and working with the build system in general was not my favorite. Lots of room to screw it up, but it has to be just right to work. I think having LLMs and a testing framework would have allowed us to evolve the Repsnapper code a lot more safely, and a lot further than we ever managed.
Well, and I can say that pretty safely now that I'm in the process of doing just that with https://github.com/timschmidt/alumina-firmware and https://github.com/timschmidt/alumina-ui and https://github.com/timschmidt/csgrs
They're all still very young, with some things still stubbed out and the code gross pending revision and cleanup, but it's basically Repsnapper 3.0 in Rust -- except this version includes CAD and a motion control firmware and fits in < 4 MB. Among them I already have hundreds of tests, which were entirely absent from Repsnapper. Couldn't have written csgrs without them. Probably a lot of redundant tests at this point, as things have changed rapidly. I'm only about six months of effort in.
That's a claim you're making for the first time, here. Not one I've made. Go ahead and re-read my comments to verify.