←back to thread

Datadog's $65M/year customer mystery solved

(blog.pragmaticengineer.com)
151 points thunderbong | 4 comments | | HN request time: 0.886s | source
1. willejs ◴[] No.44428755[source]
I have run ELK, Grafana + Prom, Grafana + Thanos/Coretex, New relic and all of the more traditional products for monitoring/observability. More recently in the last few years, I have been running full observability stacks via either The Grafana LGTM stack or datadog at a reasonable scale and complexities. Ultimately you want one tool that can alert you off a metric, present you some traces, and drill down into logs, all the way down the stack.

I have found Datadog to be, by far hands down the best developer experience from the get go, the way it glues the mostly decent products together is unparalleled in comparison to other products (Grafana cloud/LGTM). I usually say if your at a small to medium scale business just makes sense, IF you understand the product and configure it correctly which is reasonably easy. The seamless integration between tracing, logging and metrics in the platform, which you can then easily combine with alerts is great. However, its easy to misconfigure it and spend a lot of money on seemingly nothing. If you do not implement tracing and structured logs (at the right volume and level) with trace/span ids etc all the way through services its hard to see the value, and seems expensive. It requires some good knowledge, and configuration of the product to make it pay off. The rest of the product features are generally good, for example their security suite is a good entry level to cloud security monitoring and SEIM too.

However, when you get to a certain scale, the cost of APM and Infrastructure hosts in Datadog can become become somewhat prohibitive. Also, Datadogs custom metrics pricing is somewhat expensive and its query language cababilities does not quite match the power of promql, and you start to find yourself needed them to debug issues. At that point, the self hosted LGTM stack starts to make sense, however, it involves a lot more education for end users in both integration (a little less now Otel is popular) and querying/building dashboards etc, but also running it yourself. The grafana cloud platform is more attractive though.

replies(1): >>44428881 #
2. SOLAR_FIELDS ◴[] No.44428881[source]
My experience mirrors yours wrt Datadog. It's incredible value at low scale, you get a full robust system with great devex for pennies. Once you hit that tipping point though, you are locked in pretty hardcore. Datadog snakes its way far into your codebase, with all the custom tracing and stuff like that. Migrating off of it is a very expensive endeavor, which is probably one of the reasons why they are such a money printing operation.
replies(2): >>44428923 #>>44429152 #
3. willejs ◴[] No.44428923[source]
Yeah, the secret sauce of the dd libs was/is addictive for sure! I think its perhaps better now you can just use oTel for custom traces and oTel contrib libs for auto instrumentation and send that to the dd agent? I have not yet tried it because i suspected labels and other things might be named differently than the DD auto instrumentation/contrib packages, but i don't think the gap is as big now?
4. mbesto ◴[] No.44429152[source]
I think "medium scale" is probably more appropriate. For a $3M~$5M revenue SaaS you're still paying $50k+/year. That's not nothing for a small owner or PE backed SaaS company that is focused on profits/EBITDA.