1. motorest | No.45072279
I think this article actually expresses a dangerous, risk-prone approach to problem solving, one that ultimately causes more problems than it solves.

The risk is misunderstanding the problem you are solving and ignoring the constraints that drove the need for the key design traits that were put in place to actually solve it (i.e., the complexity).

Take the following example from the article:

> You should do that too! Suppose you’ve got a Golang application that you want to add some kind of rate limiting to. What’s the simplest thing that could possibly work? Your first idea might be to add some kind of persistent storage (say, Redis) to track per-user request counts with a leaky-bucket algorithm. That would work! But do you need a whole new piece of infrastructure?

Let's ignore the mistake of describing Redis as persistent storage. The whole reason rate-limiting data is offloaded to a dedicated service is that you want to enforce rate limiting across all instances of an API. All instances update request counts in a shared data store, so the counts reflect all traffic regardless of how many instances it is spread across. That data store needs to be very fast (to keep the microservices tax low) and its contents are ephemeral, which is why an in-memory cache is typically used.
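
To make that concrete, here is a minimal sketch of a shared limiter whose state lives in Redis, so every API instance enforces the same limit. It assumes the go-redis client (github.com/redis/go-redis/v9) and uses a plain fixed-window counter instead of a full leaky bucket, purely for illustration; the type and function names are made up.

    // Hypothetical sketch: per-user counters kept in Redis so that every API
    // instance sees the same counts. Uses github.com/redis/go-redis/v9.
    package ratelimit

    import (
        "context"
        "fmt"
        "time"

        "github.com/redis/go-redis/v9"
    )

    type SharedLimiter struct {
        rdb    *redis.Client
        limit  int64
        window time.Duration
    }

    func NewSharedLimiter(addr string, limit int64, window time.Duration) *SharedLimiter {
        return &SharedLimiter{
            rdb:    redis.NewClient(&redis.Options{Addr: addr}),
            limit:  limit,
            window: window,
        }
    }

    // Allow increments the user's counter for the current window and compares it
    // against the limit. Because the counter lives in Redis, adding more API
    // instances does not change the effective global limit.
    func (l *SharedLimiter) Allow(ctx context.Context, userID string) (bool, error) {
        windowStart := time.Now().Truncate(l.window).Unix()
        key := fmt.Sprintf("ratelimit:%s:%d", userID, windowStart)

        count, err := l.rdb.Incr(ctx, key).Result()
        if err != nil {
            return false, err
        }
        if count == 1 {
            // First request in this window: expire the key when the window ends.
            // (Not atomic with the INCR; a small Lua script would close that gap.)
            l.rdb.Expire(ctx, key, l.window)
        }
        return count <= l.limit, nil
    }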

And why do "per-user request counts in memory" not work? Because you enforce rate-limiting to prevent brownouts and ultimately denials of service triggered in your backing services. Each request that hits your API typically triggers additional requests to internal services such as memory stores, querying engines, etc. Your external facing instances are scaled to meet external load, but they also create load to internal services. You enforce rate-limiting to prevent unexpected high request rates to generate enough load to hit bottlenecks in internal services which can't or won't scale. If you enforce rate limits per instance, scaling horizontally will inadvertently lift your rate limits as well and thus allow for brownouts, thus defeating the whole purpose of introducing rate limiting.

Also, leaky-bucket algorithms are used to allow traffic bursts while still preventing abuse. Bursts are a mundane scenario on pretty much every service consumed by a client app: at launch the app runs its authentication flows, fetches the data it needs to start up, and so on, and after initialization it drops back to baseline request rates. If your system runs more than a single API instance, a load balancer spreads those requests across instances, so a user's requests can land on any instance in unpredictable proportions. So how do you prevent abuse while still allowing these bursts? Do you scale your services to handle peak load 24/7 so you can absorb simultaneous bursts from all your active users at any moment? Or do you allow momentary bursts that are spread across all instances, regardless of which instances they hit?
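
For reference, a leaky bucket used as a per-user meter is only a handful of lines. The sketch below is hypothetical and single-instance (in a real deployment the bucket level would live in the shared store), but it shows the burst-then-sustained-rate behaviour described above.

    // Hypothetical sketch: a leaky bucket used as a per-user meter. The bucket
    // drains at a constant rate, so a client can burst up to "capacity" requests
    // at app start-up while its sustained rate stays capped at drainPerSec.
    package ratelimit

    import (
        "sync"
        "time"
    )

    type LeakyBucket struct {
        mu          sync.Mutex
        level       float64 // current "water" in the bucket
        capacity    float64 // maximum burst size
        drainPerSec float64 // sustained requests per second
        lastDrain   time.Time
    }

    func NewLeakyBucket(capacity, drainPerSec float64) *LeakyBucket {
        return &LeakyBucket{capacity: capacity, drainPerSec: drainPerSec, lastDrain: time.Now()}
    }

    // Allow drains the bucket for the time elapsed since the last call, then
    // tries to add one request's worth of water. A full bucket means the client
    // has used up both its burst allowance and its sustained rate.
    func (b *LeakyBucket) Allow() bool {
        b.mu.Lock()
        defer b.mu.Unlock()

        now := time.Now()
        b.level -= now.Sub(b.lastDrain).Seconds() * b.drainPerSec
        if b.level < 0 {
            b.level = 0
        }
        b.lastDrain = now

        if b.level+1 > b.capacity {
            return false
        }
        b.level++
        return true
    }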

Sometimes a problem really is simple. Sometimes it is made too simple and you accept the occasional outage. But sometimes you can't afford frequent outages, and you understand that a small addition, like standing up an in-memory cache instance, is all it takes to eliminate a whole class of failure modes.

And what changed in the analysis to reveal that your simple solution is no solution at all? Only your understanding of the problem domain.