
261 points tosh | 1 comments
gwbas1c ◴[] No.42069673[source]
Classic story of a startup taking a "good enough" shortcut and then coming back later to optimize.

---

I have a similar story: Where I work, we had a cluster of VMs that were always high CPU and a bit of a problem. We had a lot of fire drills where we'd have to bump up the size of the cluster, abort in-progress operations, or some combination of both.

Because this cluster of VMs was doing batch processing that the founder believed should be CPU-intensive, everyone just assumed that the increasing load came with increasing customer size, and that this was just an annoyance that we could get to after we made one more feature.

But at one point the bean counters pointed out that we spent disproportionately more on cloud than a normal business did. After one round of combining different VM clusters (that really didn't need to be separate servers), I decided that I could take some time to hook this very CPU-intensive cluster up to a profiler.

I thought I was going to be in for a 1-2 week project and would follow a few worms. Instead, the CPU load was because we were constantly loading an entire table, that we never deleted from, into the application's process. The table had transient data that should only last a few hours at most.

I quickly deleted almost a decade's worth of obsolete data from the table. After about 15 minutes, CPU usage for this cluster dropped to almost nothing. The next day we made the VM cluster a fraction of its size, and in the next release, we got rid of the cluster and merged the functionality into another cluster.

I also made a pull request that introduced a simple filter to the query to only load 3 days of data; and then introduced a background operation to clean out the table periodically.
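
For illustration only, here is a minimal sketch of that kind of fix, not the actual code from the pull request. It assumes a SQLite table with the hypothetical name `transient_items` and a hypothetical `created_at` column holding Unix timestamps; the function names are made up too. The query only loads rows inside the 3-day window, and a separate cleanup function deletes everything older, which a scheduler or background worker could call periodically.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# The 3-day window comes from the comment above; everything else
# (table name, column names, function names) is a hypothetical stand-in.
RETENTION = timedelta(days=3)


def load_recent(conn, now):
    """Load only rows newer than the retention cutoff, not the whole table."""
    cutoff = (now - RETENTION).timestamp()
    return conn.execute(
        "SELECT id, payload FROM transient_items "
        "WHERE created_at >= ? ORDER BY id",
        (cutoff,),
    ).fetchall()


def purge_expired(conn, now):
    """Delete rows older than the cutoff; intended to run periodically."""
    cutoff = (now - RETENTION).timestamp()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM transient_items WHERE created_at < ?", (cutoff,)
        )
    return cur.rowcount


def demo():
    """Build a tiny in-memory table and run both operations against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transient_items ("
        "id INTEGER PRIMARY KEY, payload TEXT, created_at REAL)"
    )
    # An index on the timestamp keeps the filter and the purge cheap.
    conn.execute("CREATE INDEX idx_created_at ON transient_items (created_at)")

    now = datetime(2024, 11, 1, tzinfo=timezone.utc)
    rows = [
        ("fresh", now - timedelta(hours=2)),
        ("stale", now - timedelta(days=30)),
        ("ancient", now - timedelta(days=3650)),
    ]
    conn.executemany(
        "INSERT INTO transient_items (payload, created_at) VALUES (?, ?)",
        [(payload, created.timestamp()) for payload, created in rows],
    )
    conn.commit()

    recent = load_recent(conn, now)
    purged = purge_expired(conn, now)
    remaining = conn.execute("SELECT COUNT(*) FROM transient_items").fetchone()[0]
    return recent, purged, remaining


if __name__ == "__main__":
    print(demo())
```

The filter alone stops the application from pulling old rows into memory, but the table would still keep growing; the periodic purge is what keeps it small.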

alsetmusic ◴[] No.42070228[source]
As much as you can say (perhaps not hard numbers, but as a percentage), what were the savings to the bottom line / cloud costs?
gwbas1c ◴[] No.42070472[source]
Probably ~5% of cloud costs. Combined with the prior round of optimizations, it was substantial.

I was really disappointed when my wife couldn't get the night off from work when the company took everyone out to a fancy steak house.

chgs ◴[] No.42070695[source]
So you saved the company $10k a month and got a $200 meal in gratitude? Awesome.
1. bagels ◴[] No.42071178[source]
They're presumably already being paid a salary to do this work.