
462 points jakevoytko | 1 comments
GarnetFloride No.43496036
I didn’t fix this bug, but I did reproduce it so it could be fixed, and that took years. At one company I worked for, we had an email archive, and we were seeing an uptick in customers having issues with deleting expired emails. Most companies have a retention policy of about 7 years; the company was now 10 years old, and early customers were beginning to delete old emails. Developers couldn’t find the bug, and because reducing the scope of the deletion usually worked, it was usually marked as not reproducible. While devs tried to debug it, no one would let us poke around their prod email server very much, for obvious reasons.

I had been promoted to technical writer and I needed a better test system that didn’t have customer data for screenshots. Something I needed was unique data because the archive used single instance storage, so I put together a bash script to create and send emails generated from random lines of public domain books I got from Gutenberg.
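For illustration, here is a minimal Python sketch of that kind of generator. It is not the original bash script. The function name, the addresses, and the trailing sequence number are assumptions added so that every message is guaranteed to be unique, which is what keeps single instance storage from collapsing them into one stored copy.

```python
import random
from email.message import EmailMessage


def make_messages(lines, count, sender="test@example.com",
                  recipient="archive@example.com", seed=None):
    """Build `count` unique test emails from random lines of a text.

    `lines` can be any iterable of strings, for example the lines of a
    public domain book saved locally. All names here are hypothetical.
    """
    rng = random.Random(seed)
    # Drop blank lines so subjects and bodies are never empty.
    lines = [ln.strip() for ln in lines if ln.strip()]
    messages = []
    for i in range(count):
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = recipient
        msg["Subject"] = rng.choice(lines)[:70]
        body = "\n".join(rng.choices(lines, k=5))
        # The sequence number makes each body distinct even if the
        # random lines happen to repeat.
        msg.set_content(f"{body}\n\nmessage #{i}")
        messages.append(msg)
    return messages
```

Each message could then be handed to `smtplib.SMTP(host).send_message(msg)` against a test mail server; the host is whatever the test environment uses.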

This worked great for me and at one point I had it fire off 1 million emails just for fun. I let my test email server and archive server chew on them over the weekend. It worked great but I had nearly maxed out my storage. No problem, use the deletion function. And it didn’t work.

It Didn’t Work. I had reproduced the bug in-house on a system we had full control over. Engineering and QA both took copies of my environments and started working on the bug.

I also learned the lore of the deletion feature. The founding developer didn’t think anyone wanted a deletion feature because it made no sense to him. But after pressure from the CEO, Board of Directors and customers, he banged out some code over a weekend and shipped it. It was now 10 years later and he was long gone, and it was finally beginning to bite us.

After devs banged on the code for a while, they found there was a design flaw: it failed if the number of items to delete was more than 500. QA had tested the feature repeatedly, but their test data set just happened to be smaller than 500 items, so the bug never triggered. I only exceeded that because Austin Powers is funny.
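To make the failure mode concrete, here is a small Python sketch. Only the 500-item threshold comes from the story; `fragile_delete` is a hypothetical stand-in for the real deletion code, whose actual flaw is not described, and `delete_in_batches` mirrors the "reduce the scope of the deletion" workaround.

```python
LIMIT = 500  # threshold from the story; everything else is hypothetical


def fragile_delete(store, ids):
    """Stand-in for a delete call that breaks above LIMIT items."""
    ids = list(ids)
    if len(ids) > LIMIT:
        raise RuntimeError("too many items in one delete request")
    for item_id in ids:
        store.discard(item_id)


def delete_in_batches(store, ids, batch_size=LIMIT):
    """Work around the limit by deleting in chunks of at most batch_size."""
    ids = list(ids)
    for start in range(0, len(ids), batch_size):
        fragile_delete(store, ids[start:start + batch_size])
```

A test suite that only ever deletes fewer than 500 items never touches the failing branch. Testing sizes on both sides of a suspected boundary (for example 499, 500 and 501 items) is one way such a flaw could surface earlier.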

Now that we could reproduce it and knew there was a design flaw, the code for deletion needed to be replaced. It ended up taking over two years to replace the code, because project management never thought it was all that important compared to new features, even though customers were complaining about it.

1. Suppafly No.43496775
Keeping stuff past retention dates is such a high liability for companies, I'm surprised they didn't sue you guys to fix it faster.