
1070 points dondraper36 | 1 comment
hinkley No.45068652
One of the biggest, evergreen arguments I’ve had in my career revolves around the definition of “works”.

“Just because it works doesn’t mean it isn’t broken” is an aphorism that seems to click for people who are also handy in the physical world, but that many software developers think doesn’t sound right. Every handyman has at some point used a busted tool to make a repair. They know they should get a new one, and many will make an excuse to do so at the next opportunity (a hardware store trip, or a sale). Maybe eight out of ten.

In software it’s probably more like one out of ten who will make the equivalent effort.

replies(5): >>45068811 #>>45068978 #>>45069113 #>>45069475 #>>45070924 #
mandelbrotwurst No.45068811
Those conversations are an important part of the job. You can, for example, agree that something works in the sense that it is currently possible to use it to obtain a desired output, while it simultaneously fails to work in other ways: it might not produce that output reliably, or it might only do so at great cost.
replies(1): >>45069108 #
1. hinkley No.45069108
It’s a frustrating argument to lose.

On a recent project I fixed our deployment and our hotfix process, and it fundamentally changed the scope of epics the team would tackle. Up to that point we were violating the first principle of continuous delivery: if it’s painful, do it until it isn’t. So we would barely deploy more often than we were contractually obligated to (in both the legal and the internal-culture sense), and that meant people were very conservative about refactoring code that could lead to regressions, because the turnaround on a failing feature toggle was pinned to the deployment tempo. You could turn a toggle on to analyze the impact, but then you had to wait until the next deployment to test your fixes. Excruciating, and it made estimates wildly variable.
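
For anyone who hasn’t worked with them: a feature toggle ships the new code path dark and flips it at runtime, so turning the toggle on or off needs no redeploy, but shipping a fix to the code behind it does. A minimal sketch of that mechanic in Python, with made-up flag and function names (illustrative only, not the project’s actual tooling):

    import os

    def is_enabled(flag: str) -> bool:
        # Re-read on every call so an ops-side flip takes effect
        # immediately, without a redeploy.
        return os.environ.get(f"FLAG_{flag}") == "on"

    def compute_report(values: list[float]) -> float:
        if is_enabled("NEW_AGGREGATION"):
            # Refactored path: ships dark, toggled on in production
            # to observe its impact.
            return sum(values) / len(values) if values else 0.0
        # Legacy path: the known-good fallback. A bug in the new path
        # can be toggled off instantly, but the *fix* still rides the
        # next deployment -- hence the fixed turnaround tempo above.
        total = 0.0
        for v in values:
            total += v
        return total / len(values) if values else 0.0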

With a hotfix process that actually worked, people would iterate two or three times as often, to the point that we had to start coordinating to keep people from tripping over each other. And as a consequence, old, nasty tech debt was being fixed in every epic instead of once a year. It was a profound change.

And as is often the case, as the author I saw more benefit than most. I scooped a two-year, two-man effort to improve response time by myself in three months, making a raft of small changes instead of one giant architectural shift. About twenty percent of the things I tried got backed out because they neither improved speed nor made the code cleaner. I could do that because the tooling wasn’t broken.