1070 points dondraper36 | 18 comments
1. hinkley ◴[] No.45068652[source]
One of the biggest, evergreen arguments I’ve had in my career revolves around the definition of “works”.

“Just because it works doesn’t mean it isn’t broken” is an aphorism that clicks for people who are also handy in the physical world, but that many software developers think doesn’t sound right. Every handyman has at some time used a busted tool to make a repair. They know they should get a new one, and many will make an excuse to do so at the next opportunity (a hardware store trip, or a sale). Maybe eight out of ten.

In software it’s probably more like one out of ten who will make the equivalent effort.

replies(5): >>45068811 #>>45068978 #>>45069113 #>>45069475 #>>45070924 #
2. mandelbrotwurst ◴[] No.45068811[source]
Those conversations are an important part of the job. You can, for example, agree that something works in the sense that it is currently possible to use it to obtain a desired output, while it simultaneously fails to work in various ways: it might not produce that output reliably, or it might only be able to do so at great cost.
replies(1): >>45069108 #
3. fuzzy2 ◴[] No.45068978[source]
From somewhere around the net, I found this quote:

> It's not enough for a program to work – it has to work for the right reasons

I guess that’s basically the same statement, from a different angle.

replies(2): >>45069111 #>>45069174 #
4. hinkley ◴[] No.45069108[source]
It’s a frustrating argument to lose.

On a recent project I fixed our deployment and our hotfix process, and it fundamentally changed the scope of epics the team would tackle. Up to that point we were violating the first principle of Continuous: if it’s painful, do it until it isn’t. So we would barely deploy more often than we were contractually obligated to (in both the legal and the internal cultural sense), and that meant people were very conservative about refactoring code that could lead to regressions, because the turnaround time on a failing feature toggle was set by the deployment tempo. You could turn a toggle on to analyze the impact, but then you had to wait until the next deployment to test your fixes. Excruciating, with high deviation in estimates.

With a hotfix process that actually worked, people would make two or three times as many iterations, to the point that we had to start coordinating to keep people from tripping over each other. And as a consequence, old, nasty tech debt was being fixed in every epic instead of once a year. It was a profound change.

And as is often the case, as the author I saw more benefit than most. I scooped a two-year, two-man effort to improve response time by myself in three months, making a raft of small changes instead of a giant architectural shift. About twenty percent of the things I tried got backed out because they didn’t improve speed and didn’t make the code cleaner either. I could do that because the tooling wasn’t broken.

5. soperj ◴[] No.45069111[source]
I remember once having to make a SOAP call that just wasn’t connecting for some reason, even though another endpoint on the same service was working, which made no sense. Just for kicks, we tried calling the working endpoint right before calling the busted one, and that actually functioned. To this day it makes no sense to me. We eventually moved off of SOAP, but that workaround stayed in the code until we did.
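
Roughly the shape of the workaround, for the curious. This is a sketch from memory, not the real code: the endpoint names, the WSDL URL, and the Python zeep client are stand-ins for whatever the actual service exposed.

    # Illustrative only: endpoint names and WSDL URL are invented.
    from zeep import Client

    client = Client("https://example.com/service?wsdl")  # hypothetical WSDL

    def call_busted_endpoint(payload):
        # Warm-up call to the known-good endpoint; without it, the
        # next call simply refused to connect. We never learned why.
        client.service.GetStatus()
        return client.service.Submit(payload)
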
replies(1): >>45069157 #
6. vkou ◴[] No.45069113[source]
The definition of 'works' depends on whether my employer wants to spend its resources (the time I'm working) on fixing it.

If they want to use those resources to prioritize quality, I'll prioritize quality. If they don't, and they just want me to hit some metric and tick a box, I'm happy to do that too.

You get what you measure. I'm happy to give my opinion on what they should measure, but I am not the one making that call.

replies(1): >>45069396 #
7. hinkley ◴[] No.45069157{3}[source]
I hate the days when you’re trying to fix a bug in a block of code and, as you write pinning tests, you realize that the code has always been broken and you cannot understand why it ever got the right answer. You’ve let the magic smoke out, and you cannot put it back without fixing the problem. At some point you have to stop trying, because you understand perfectly well how it should work and you need to move on to other things.
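
For anyone who hasn’t met the term: a pinning test just records whatever the code does today, right or wrong, so that any change in behavior fails loudly. A minimal sketch, with a made-up module and function under test:

    # Pins current behavior, correct or not. The expected value was
    # captured by running the code once, not derived from any spec.
    from legacy_module import legacy_calculation  # hypothetical module

    def test_pins_current_behavior():
        assert legacy_calculation(rate=0.07, years=10) == 196.715  # observed
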
8. IshKebab ◴[] No.45069174[source]
I generally agree, except when the program is a one-time program meant to generate a single output before being thrown away.

Until recently I would have said such programs are extremely rare, but AI now makes them pretty easy. Want to do some complicated project-wide edit? I sometimes get AI to write me a one-off script for it. I don’t even need to read the script, just check the output and throw it away.
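
The scripts in question are tiny and disposable, something like this (the paths and the rename are invented for illustration):

    # One-off project-wide edit: run once, eyeball the diff, delete.
    from pathlib import Path

    for path in Path("src").rglob("*.py"):  # hypothetical source tree
        text = path.read_text()
        if "old_name(" in text:
            path.write_text(text.replace("old_name(", "new_name("))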

But I'm nitpicking, I do agree with it 99% of the time.

replies(1): >>45069611 #
9. hinkley ◴[] No.45069396[source]
They’ll never prioritize the work that keeps the wheels on. You have to learn not to ask, and to bake it into the cost of new feature work. It’s non-negotiable, or it never happens.

In my second lead role, the CTO and the engineering manager thought I could walk on water, so I had considerable leeway to change things I thought needed changing.

So one of the first things I did was collectively save the team about 40 hours of code-build-test time per week. That really undersells it, because what I actually did was both build a CI pipeline at a time when nobody knew what “CI” meant, and increase the number of cycles you could reliably get through without staying late from four to five per day: a >20% improvement in iterations per day and a net reduction in errors. That was the job where I learned the dangers of pushing code after 3:30pm. Everyone rationalizes that the error they saw was a glitch or someone else’s bug, so they push anyway and come in the next morning to find the early birds mad at them. Better to finish what we now call deep work early and do lighter stuff once you’re tired.

Edit: those changes also facilitated scaling the team to over twice the size of any project I’d worked on before or for some time after, though the EM deserves equal credit for that feat.

Then they fired the EM and Peter-Principled in by far the worst manager I’ve ever worked for (fuck you, Mike, everyone hated your guts), and all he wanted to know was why I was getting fewer features implemented. Because I was making everyone else faster. Speaking of broken: the biggest performance bottleneck in the entire app was his fault. He hadn’t followed the advice I gave him back when he was working in our query system. Discovering it took hiring an Oracle DB contractor (those are always exorbitant), and fixing it after it shipped was a giant pain. (As for why I didn’t catch his corner-cutting: I was tagged in by another lead who was triple-booked, and when I tagged back out he unfortunately didn’t follow up sufficiently on the things I prescribed.)

10. Aurornis ◴[] No.45069475[source]
One of the worst periods of my career was at a company that had a team that liked to build prototypes. They would write a hasty proof of concept, and then their boss would parade it in front of the executives. It would be deployed somewhere and connected to a little database, so it technically “worked” when they tried it.

Then the executives would be stunned that it was done so quickly. The prototype team would pass it off to another team and then move on to the next prototype.

The team that took over would open the project and discover that it really was just a proof of concept, not a working site. It wouldn’t include basic things like security, validation, error messages, or any of the hundred things that a real working product requires before you can put it online.

So the team that now owned it would often have to start over entirely, rebuilding it within the structures used by the rest of our products. The executives would be angry, because they had seen it “work” with their own eyes and thought the deployment team was just complicating things.

replies(3): >>45069860 #>>45070613 #>>45076645 #
11. hinkley ◴[] No.45069611{3}[source]
I often write those sorts of tools iteratively.

By the time you’ve done something five times, it’s probably part of your actual process, and you should start treating it as normal instead of exceptional. Even if admitting so feels like a failure.

So I staple something together that works for the exact situation, then start removing the footguns I’m likely to hit, then start shopping it to other people I see eye to eye with and fixing the footguns they run into. Then we start trying to make it into an actual project, and the end game is for it to become a mandatory part of our process once the late adopters get on board.

12. hinkley ◴[] No.45069860[source]
In the worst case of this I ran into, the “maintenance” team discovered that some of the interactions were demo stubs. Nothing actually happened; the test data just looked like the state transition had worked.
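
The stubs were along these lines (names invented, not the actual code): a handler that returns whatever the UI expects to see without touching anything underneath.

    # Demo stub: looks like a real state transition, persists nothing.
    def approve_order(order_id):  # hypothetical handler
        # A real version would update the database and notify downstream
        # systems; this one just echoes the result the demo needs.
        return {"order_id": order_id, "status": "APPROVED"}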

Those are the worst, because you don’t have “done” criteria you can reasonably write down. It’s done whenever QA stops finding fakes in the code, plus a couple of months for stragglers you might have missed.

13. anonymars ◴[] No.45070613[source]
Thousand-yard stare flashback: “Why is it taking months to make this work? John and James built us the first version in 2 weeks.”
replies(2): >>45070893 #>>45078401 #
14. buu700 ◴[] No.45070893{3}[source]
https://youtu.be/W8ILPghbqdY
replies(1): >>45074140 #
15. Havoc ◴[] No.45070924[source]
> “Just because it works doesn’t mean it isn’t broken.”

Meanwhile all the people writing agentic LLM systems: “Hold my beer”

16. anonymars ◴[] No.45074140{4}[source]
Love it

Having the house fall on Buster was <chef's kiss>: https://youtube.com/watch?v=FN2SKWSOdGM

17. will5421 ◴[] No.45076645[source]
What would the executives have said if the deployment team had told them, “The product can stay online, but these are the risks of doing so while we work on fixing it”?
18. hinkley ◴[] No.45078401{3}[source]
It’s been a long time since I worked with people whose previous gig was turning trumped-up Excel spreadsheets into real applications two years past the Last Responsible Moment. It’s kind of up there with diabetics not going to the doctor until they’re about to lose a foot: you’ve admitted you have a problem long after we can help you thrive, and now we’re just trying to minimize the loss of function.