I won’t single out Datadog on this, because the exact same thing happens with cloud spend, and it’s quite literally burning money.
I saw this a lot at a previous company. Being able to just "have more Lambdas scale up to handle it" got some very mediocre engineers past the challenges they encountered. But it did so at the cost of wasting VAST amounts of money and saddling the company with tech debt that completely hobbled its ability to scale.
It was very frustrating to be too junior to change minds, even on basic things like "I know it worked for you with old on-prem NFS designs, but we shouldn't be storing our data in 100 KB files in S3 and firing off thousands of Lambda invocations to process workloads. We should be storing it in 100 MB files and running industry-leading ETL frameworks on it." They were old-school guys who hadn't adjusted to best practices for object storage and modern large-scale data loads (this was a 1M events per second system), and so the company never really succeeded despite thousands of customers and loads of revenue.
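To put rough numbers on the file size point, here's a back-of-envelope sketch. Only the 1M events per second and the two file sizes come from the system I described; the 200 bytes per event and the PUT request price are assumptions I picked for illustration, so plug in your own figures and check current pricing before trusting the dollar amounts.

```python
# Back-of-envelope: objects written per day for small vs. large S3 files.
# Each object also implies at least one PUT request, and in a design that
# processes files one at a time, roughly one Lambda invocation.

EVENTS_PER_SECOND = 1_000_000   # from the system described above
BYTES_PER_EVENT = 200           # ASSUMPTION: average event size
PUT_PRICE_PER_1000 = 0.005      # ASSUMPTION: USD per 1,000 PUT requests
SECONDS_PER_DAY = 86_400


def objects_per_day(file_size_bytes: int) -> int:
    """Number of files needed to hold one day of events at the given file size."""
    bytes_per_day = EVENTS_PER_SECOND * BYTES_PER_EVENT * SECONDS_PER_DAY
    # Ceiling division so a partially filled last file still counts.
    return -(-bytes_per_day // file_size_bytes)


def put_cost_per_day(file_size_bytes: int) -> float:
    """Daily PUT request cost in USD under the assumed price."""
    return objects_per_day(file_size_bytes) / 1000 * PUT_PRICE_PER_1000


if __name__ == "__main__":
    for label, size in [("100 KB", 100_000), ("100 MB", 100_000_000)]:
        print(f"{label}: {objects_per_day(size):,} objects/day, "
              f"${put_cost_per_day(size):,.2f}/day in PUT requests")
```

Whatever event size you assume, the ratio stays the same: 1000x fewer objects, requests, and per-file invocations with the larger files, and that's before counting the per-invocation overhead and the LIST/GET traffic on the read side.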
I consider cost awareness and profiling to be essential skills for any engineer working in cloud-style environments, but it's especially important that a staff engineer or someone in a similar position has this skill set and is ready to grill people who come up with wasteful solutions.