Can any program be broken down into functions and functions of functions that have inputs and outputs so that they can be verified if they are working?
The hard bit is knowing which functions to write, and what "valid" means for inputs and outputs. Sometimes you'll get a specification that tells you this, but the moment you try to implement it you'll find that whoever wrote that spec didn't really think it through to its conclusion. There will be a host of edge cases that probably don't matter and will probably never be hit in the real world, but someone needs to make that call and decide what to do when (not if) they get hit anyway.
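For a concrete (and entirely made-up) example of the kind of call someone has to make: what should an averaging function do with an empty list? The spec rarely says, but the code has to decide something, and the decision is exactly what the tests end up pinning down.

```rust
/// Hypothetical example: the spec says "return the average of the readings"
/// but never defines a valid input. Someone still has to decide what an
/// empty slice means: an error? zero? NaN? Here the type forces the decision.
fn average(readings: &[f64]) -> Option<f64> {
    if readings.is_empty() {
        None // the edge case the spec never mentioned
    } else {
        Some(readings.iter().sum::<f64>() / readings.len() as f64)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn empty_input_is_rejected() {
        assert_eq!(average(&[]), None);
    }

    #[test]
    fn averages_normal_input() {
        assert_eq!(average(&[2.0, 4.0]), Some(3.0));
    }
}
```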
If a program is built with strong software architecture, then a lot of it will fit that definition. As an analogy, electricity in your home is delivered by electrical outlets that are standardized -- you can have high confidence that when you buy a new electrical appliance, it can plug into those outlets and work. But someone had to design that standard and apply it universally to the outlets and the appliances. Software architecture within a program is about creating those standards on how things work and applying them universally. If you do this well, then yes, you can have a lot of code that is testable and verifiable.
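A rough sketch of what that "standard outlet" can look like in code -- a shared trait that every component plugs into, so each piece can be tested against the same contract. The names here (StorageBackend, MemoryBackend) are invented for illustration, not from any particular project:

```rust
use std::collections::HashMap;

/// Hypothetical "standard outlet": every storage backend implements the same
/// trait, so callers and tests depend only on the contract, not the details.
trait StorageBackend {
    fn put(&mut self, key: &str, value: &[u8]);
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

/// A trivial in-memory implementation -- handy as a stand-in during tests.
struct MemoryBackend {
    map: HashMap<String, Vec<u8>>,
}

impl StorageBackend for MemoryBackend {
    fn put(&mut self, key: &str, value: &[u8]) {
        self.map.insert(key.to_string(), value.to_vec());
    }
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }
}

/// Code written against the trait works with any backend that honors the
/// contract -- the appliance plugs into the outlet.
fn roundtrip(storage: &mut impl StorageBackend) -> bool {
    storage.put("greeting", b"hello");
    storage.get("greeting").as_deref() == Some(b"hello".as_slice())
}
```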
But you'll always have side-effects. Programs do things -- they create files, they open network connections, they communicate with other programs, they display things on the screen. Some of those side-effects create "state" -- once a file is created, it's still present. These things are much harder to test because they're not just a function with an input and an output -- their behavior changes between the first run and the second run.
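A minimal sketch of that kind of function, assuming a simple file-creation example: the result of the second call depends on what the first call left behind.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// A side-effecting function: after the first call the file exists, so a
/// second call observes different state than the first one did.
fn ensure_log(path: &Path) -> io::Result<bool> {
    if path.exists() {
        Ok(false) // nothing created on this run -- the state was already there
    } else {
        fs::write(path, b"log started\n")?;
        Ok(true) // created on this run
    }
}
```

Testing something like this usually means controlling the environment it runs in -- a temporary directory, cleanup between runs -- precisely because the state outlives the call.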
Even if you can formally verify individual methods, what you're actually asking is whether we can verify systems. Because systems, even ones made of pieces that are individually understood, have interactions and emergent behaviors that nobody expected.
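A contrived sketch of what that looks like, with hypothetical functions: two pieces that are each trivially correct in isolation, composed into something wrong because they assume different units.

```rust
/// Each function below is trivially correct on its own and easy to unit test.
/// The bug only exists in the composition, because the two sides make
/// different assumptions about units. (Hypothetical example.)
fn steps_per_mm(steps_per_rev: f64, mm_per_rev: f64) -> f64 {
    steps_per_rev / mm_per_rev
}

/// Documented (and correct): returns the requested travel in inches.
fn requested_travel_inches(request: f64) -> f64 {
    request
}

fn steps_for_move(request: f64) -> f64 {
    // Emergent bug: inches multiplied by steps-per-millimeter. Every unit
    // test of the individual parts still passes.
    requested_travel_inches(request) * steps_per_mm(200.0, 8.0)
}
```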
Long story short: yes, but it'd take centuries to verify all possible inputs, at least for any non-trivial program.
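Some back-of-the-envelope arithmetic (a throwaway Rust snippet, just to make the scale concrete) on why exhaustively checking inputs is hopeless even for a single 64-bit argument:

```rust
fn main() {
    // A function taking a single 64-bit argument already has 2^64 inputs.
    let inputs = 2f64.powi(64);
    // Even checking a billion inputs per second, exhausting that one
    // argument takes roughly 585 years; add a second argument and the
    // heat death of the universe gets involved.
    let seconds = inputs / 1e9;
    let years = seconds / (60.0 * 60.0 * 24.0 * 365.25);
    println!("~{years:.0} years to exhaust one 64-bit input");
}
```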
Can a function be "verified"? That can mean "tested", "reviewed", or "proved to be correct". And what does "correct" even mean?
Unlike mathematical functions, functions in code are often more than a mapping from inputs to outputs. Very often they have side effects, like sending packets on the network or modifying things in their environment. This makes it difficult to understand what they do in isolation.
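One common way to get that isolation back is to pass the side effect in as a parameter, so the decision logic can be tested without touching the network. A hedged sketch with invented names (notify_if_hot, the alert message) follows:

```rust
/// Hypothetical sketch: instead of calling the network directly, the function
/// takes the "send" effect as a parameter, so the decision logic can be
/// tested in isolation without opening any sockets.
fn notify_if_hot<F>(temperature_c: f64, mut send: F) -> bool
where
    F: FnMut(&str),
{
    if temperature_c > 80.0 {
        send("overtemp alert");
        true
    } else {
        false
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn sends_exactly_one_alert_when_hot() {
        let mut sent = Vec::new();
        assert!(notify_if_hot(95.0, |msg| sent.push(msg.to_string())));
        assert_eq!(sent, vec!["overtemp alert"]);
    }

    #[test]
    fn stays_quiet_when_cool() {
        let mut sent = Vec::new();
        assert!(!notify_if_hot(20.0, |msg| sent.push(msg.to_string())));
        assert!(sent.is_empty());
    }
}
```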
Any non-trivial piece of software is almost impossible to fully understand or test. These things work empirically and require constant maintenance and tweaking.
Outside of functional code, there's a lot out there that requires mutable state. This is much harder to test, which is why user interface testing on native apps is always more painful and most people still run manual QA or use an entirely different testing approach.
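One partial mitigation -- sketched here with hypothetical names, not any particular framework -- is to pull the state transitions out into pure functions so at least the logic is unit-testable, leaving only rendering and event plumbing for manual QA or a separate UI testing approach:

```rust
/// Hypothetical sketch: keep UI state transitions as pure data-in/data-out
/// functions so they can be unit tested.
#[derive(Debug, Clone, PartialEq)]
struct EditorState {
    text: String,
    dirty: bool,
}

enum Event {
    Typed(char),
    Saved,
}

fn update(state: EditorState, event: Event) -> EditorState {
    match event {
        Event::Typed(c) => EditorState {
            text: format!("{}{}", state.text, c),
            dirty: true,
        },
        Event::Saved => EditorState { dirty: false, ..state },
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn typing_marks_the_buffer_dirty() {
        let start = EditorState { text: String::new(), dirty: false };
        let next = update(start, Event::Typed('a'));
        assert_eq!(next, EditorState { text: "a".into(), dirty: true });
    }
}
```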
...then a bit flips because of a stray high energy particle or someone trips over the metaphorical power cord and it all crashes anyway.
When adding machines and calculators appeared in offices, detractors claimed they would weaken the mind. In the mid-20th century, some educators derided calculator users as “button pushers” rather than “real mathematicians.”
In the 1980s, early adopters of personal computers and word processors were sometimes called “typists with toys.” Secretaries who mastered word processors were sometimes derided as “not real secretaries” because they lacked shorthand or dictation skills.
Architects and engineers who switched from drafting tables to CAD in the 1970s–80s faced accusations that CAD work was “cookie-cutter” and lacked craftsmanship. Traditional draftsmen argued that “real” design required hand drawing, while CAD users were seen as letting the machine think for them.
Across history, the insults usually follow the same structure:
- Suggesting the new tool makes the work too easy, therefore less valuable.
- Positioning users as “operators” rather than “thinkers.”
- Romanticizing the older skill as more “authentic” or “serious.”
That variable is undefined in multiple constructors. Also your code cannot compile in various scenarios. Have a nice day.
So it’s really not that far-fetched.
So this doesn't seem relevant to the conversation about LLMs, Rust, and software quality improvement methods, from strict typing to formal verification. It's like a "gotcha!" that didn't land. Sorry.
Please, find some bugs in a project I've touched in the last few years! Looking for things to fix. Please open a GitHub issue from an account linked to your projects next time so I can return the favor :D The bugs are in there, I know they are, but with an LLM and a bit of time to review its work, it now costs me a few minutes tops to write a test to exclude future cases of the same problem, or the same class of problems. That's a level of verification that never got time allocated in previous projects, where tests were written infrequently or not at all.
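For what it's worth, a regression test of that kind can be as small as this -- a hypothetical example, not from any of the projects above: once a bug is found, pin the fixed behavior down so the same class of problem can't quietly come back.

```rust
/// Hypothetical fix under test: paths with trailing slashes were being
/// treated as distinct keys; normalize them before use.
fn normalize_path(input: &str) -> String {
    input.trim_end_matches('/').to_string()
}

#[cfg(test)]
mod regression_tests {
    use super::*;

    #[test]
    fn trailing_slash_bug_stays_fixed() {
        // Original (invented) bug report: "foo/" and "foo" behaved differently.
        assert_eq!(normalize_path("foo/"), normalize_path("foo"));
    }

    #[test]
    fn multiple_trailing_slashes_are_also_covered() {
        assert_eq!(normalize_path("foo///"), "foo");
    }
}
```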
Repsnapper's a great example of that. We didn't have a standardized testing framework across the dozen or so libraries we'd integrated into the app. The original author, Kulitorum, sort of copied and pasted bits and pieces of code together to write the app originally, without much concern for licenses or origin tracking, so I initiated a line-by-line audit and license verification of the codebase to qualify it for inclusion in Fedora and Debian, to make 3D printing easier and more available, as there were no such tools shipping in a distro at the time. Integrating new libraries into that codebase was unpleasant, and working with the build system in general was not my favorite. Lots of room to screw it up, but it has to be just right to work. I think having LLMs and a testing framework would have allowed us to evolve the Repsnapper code a lot more safely, and a lot further than we ever managed.
Well, and I can say that pretty safely now that I'm in the process of doing just that with https://github.com/timschmidt/alumina-firmware and https://github.com/timschmidt/alumina-ui and https://github.com/timschmidt/csgrs
They're all still very young, with some things still stubbed out and the code gross pending revision and cleanup, but it's basically Repsnapper 3.0 in Rust -- except this version includes CAD and a motion control firmware and fits in < 4 MB. Among them I already have hundreds of tests, which were entirely absent from Repsnapper. Couldn't have written csgrs without them. Probably a lot of redundant tests at this point, as things have changed rapidly. I'm only about six months of effort in.
That's a claim you're making for the first time, here. Not one I've made. Go ahead and re-read my comments to verify.