Datadog's $65M/year customer mystery solved

(blog.pragmaticengineer.com)


ljm ◴[30 Jun 25 20:24 UTC] No.44427444[source]▶

I wonder how much that no-expense-spared, money-is-no-object attitude to buying SaaS impacts an engineer's ability to make sensible decisions around infra and architecture. Coinbase might have been fine blowing $65M, but take that approach to a new startup and you could trivially eat up a significant amount of runway with it.

I won’t single out Datadog on this because the exact same thing happens with cloud spend, and it’s very literally burning money.

replies(4): >>44427650 #>>44428240 #>>44428533 #>>44428683 #

1. viccis ◴[30 Jun 25 21:44 UTC] No.44428240[source]▶

>>44427444 #

>I wonder how much that no-expense-spared, money-is-no-object attitude to buying SaaS impacts an engineer's ability to make sensible decisions around infra and architecture

I saw this a lot at a previous company. Being able to just "have more Lambdas scale up to handle it" got some very mediocre engineers past challenges they encountered. But it did so at the cost of wasting VAST amounts of money and saddling themselves with tech debt that completely hobbled the company's ability to scale.

It was very frustrating to be too junior to change minds, even on basic things like "I know it worked for you with old on-prem NFS designs, but we shouldn't be storing our data in 100 KB files in S3 and firing off thousands of Lambda invocations to process workloads; we should be storing it in 100 MB files and using industry-leading ETL frameworks on it". They were old-school guys who hadn't adjusted to best practices for object storage and modern large-scale data loads (this was a 1M events per second system), and so the company never really succeeded despite thousands of customers and loads of revenue.

I consider cost consideration and profiling to be an essential skill that any engineer working in cloud style environments should have, but it's especially important that a staff engineer or person in a similar position have this skill set and be ready to grill people who come up with wasteful solutions.

replies(2): >>44432069 #>>44432131 #

2. nasmorn ◴[01 Jul 25 09:20 UTC] No.44432069[source]▶

>>44428240 (TP) #

It is also not a very hard skill. You do a back of the envelope calculation and if your proposed architecture is crazy expensive for your reasonable load, then you have to figure out if you are a special snowflake or just doing it wrong.
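A back-of-the-envelope check like that can be a few lines of arithmetic. The sketch below is only an illustration: the function name and the per-request price are made-up placeholders (not any provider's quoted rate), and it looks at request charges alone, using the 1M events per second figure mentioned upthread.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assume a 30-day month


def monthly_request_cost(events_per_second, events_per_request, usd_per_million_requests):
    """Rough monthly request charge for a given batching factor."""
    requests_per_month = events_per_second * SECONDS_PER_MONTH / events_per_request
    return requests_per_month / 1_000_000 * usd_per_million_requests


if __name__ == "__main__":
    PLACEHOLDER_PRICE = 0.20  # assumed USD per million requests, not a real quote
    per_event = monthly_request_cost(1_000_000, 1, PLACEHOLDER_PRICE)
    batched = monthly_request_cost(1_000_000, 1_000, PLACEHOLDER_PRICE)
    print(f"one request per event:    ${per_event:,.0f}/month")
    print(f"1,000 events per request: ${batched:,.0f}/month")
```

Whatever price you plug in, the unbatched design costs 1,000 times the batched one on this line item, which is the kind of gap that tells you to stop and ask whether you are doing it wrong.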

replies(1): >>44438048 #

3. happymellon ◴[01 Jul 25 09:32 UTC] No.44432131[source]▶

>>44428240 (TP) #

What's also frustrating is that a lot of times, costs are hidden from engineering.

I don't know if I would call them mediocre, but without a feedback loop it's hard to get engineers to agree whether it's worth time reviewing the code to make it faster compared to just making the db one size larger.

replies(1): >>44438105 #

4. viccis ◴[01 Jul 25 21:12 UTC] No.44438048[source]▶

>>44432069 #

This is correct. It's really more of a mindset than anything. You take a guess at how much something will cost based on a quick calculation (good cloud providers make this easy; some, *cough* Databricks *cough*, just use a black box and bill you whatever they feel like), and then once you test it at a small scale, you verify that it's as expected and continue to monitor.

5. viccis ◴[01 Jul 25 21:20 UTC] No.44438105[source]▶

>>44432131 #

>costs are hidden from engineering

Yeah, one of my big pet peeves was when engineering teams built platforms to run things on that obscured the cost. There were times when they said "hey, we made this big platform for analytics, just ship your stuff as configuration changes and it's deployed!" Then when I did it with very simple small cases, some unoptimized stuff on their end (a lot of what I talked about before) resulted in runaway costs that they, of course, tagged to my team.

Ultimately, you can only control what's in your scope. For anything else, you have to hope that management can take that runaway cost feedback and make the correct team optimize it away.

>I don't know if I would call them mediocre, but without a feedback loop it's hard to get engineers to agree whether it's worth time reviewing the code to make it faster compared to just making the db one size larger.

This started in the mid-2010s, by which point they should have understood that you don't put terabytes of data into S3 in 100 KB files. And if not, they should have been willing to take some very simple steps to address it (literally just bundling them all into 100 MB files with an index file containing the byte offsets of the individual ones would have solved a lot of their problems). There was a feedback loop. There just happened to be big egos more interested in their next fun project of reinventing another solution to another solved problem. I learned there that engineering-driven companies sometimes wind up in situations in which the staff engineers love fun new database and infrastructure projects like that more than they enjoy improving their existing product.
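For anyone unsure what that bundling looks like, here is a minimal in-memory sketch. The `pack` and `read_record` names and the JSON index layout are my own illustration, not what that company used; with the bundle stored as a single S3 object, the same offset and length would drive a ranged read instead of a slice.

```python
import io
import json


def pack(records):
    """Concatenate small records into one blob; return (blob, index).

    index maps each record name to [byte_offset, length] within the blob.
    """
    buf = io.BytesIO()
    index = {}
    for name, data in records.items():
        index[name] = [buf.tell(), len(data)]
        buf.write(data)
    return buf.getvalue(), index


def read_record(bundle, index, name):
    """Fetch one record from the bundle using its byte offset and length."""
    offset, length = index[name]
    return bundle[offset:offset + length]


if __name__ == "__main__":
    small_files = {"evt-001": b'{"id": 1}', "evt-002": b'{"id": 2, "x": 3}'}
    bundle, index = pack(small_files)
    index_file = json.dumps(index)  # the index is stored next to the bundle
    restored = json.loads(index_file)
    print(read_record(bundle, restored, "evt-002"))
```

One large object plus a small index replaces thousands of tiny objects, so a batch job reads the bundle sequentially while a point lookup still touches only the bytes it needs.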

replies(1): >>44438335 #

6. happymellon ◴[01 Jul 25 21:55 UTC] No.44438335{3}[source]▶

>>44438105 #

Yeah, that sounds terrible.
