Congrats to the Statsig team!
A sister comment, now [deleted], said: "He’s an extremely well-known and deeply respected engineer, leader, and founder in the Seattle metro region. This is a key hire for OpenAI, and a good one."
Business-wise, I think getting acquired was the right choice. Experimentation is too small & treacherous to build a great business, and the broader Product Analytics space is also overcrowded. Amplitude (YC 2012), to date, only has a 1.4B market cap.
Joining the hottest name next door gives Statsig a lot more room to explore. I look forward to their evolution.
If Mira Murati (CTO of OpenAI) has authority over their technical decisions, then it's an odd title. If I were talking with a CTO, I wouldn't expect another CTO to outrank or be able to overrule them.
Is Brockman now CTO over research specifically or is there going to be a weird dotted line?
In practice, these are just internal P&Ls.
https://www.amazon.com/Unaccountability-Machine-Systems-Terr...
Oh, the CTO approved it, so we should blame them. No, not that CTO, the other CTO. Oh, so who decided on the final outcome? The CTO! So who's on first again?
Here, per this article, there appear to be 2, but as I looked more into it, there are actually 3 (!!) CTOs (see here: https://x.com/snsf/status/1962939368085327923), one of whom (the B2B CTO) seems to be reporting to the COO.
So in this context, you have 3 (!!) engineering organizations that don't terminate in a single engineering leader. "Apps" terminates at the "Apps" CEO (Fidji), the research org terminates (??) at Sama (overall CEO), and B2B terminates at the COO.
So either you have weird dotted lines to Brockman for each of these CTOs, or you are going to have a lot of internal customer relationships with no final point of escalation. That's definitely not common at this size, and unless these are all extremely independent organizations from a tech-stack perspective (they can't really be, since surely they all rely on the core LLMs...), there will be a lot more weird politics that are harder to resolve than if these organizations all sat under one technical leader.
Of course, another alternative is that OAI is handing out titles for retention purposes and "CTO" will be heavily devalued as a title internally.
Big Tech teams want to ship features fast, but measuring impact is messy. It usually requires experiments, and traditionally every experiment needed a Data Scientist (DS) to ensure statistical validity, i.e., "can we trust these numbers?" Ensuring validity means the DS has to perform multiple repetitive but specialized tasks throughout the experiment process: debugging bad experiment setups, navigating legacy infra, generating & emailing graphs, compensating for errors and biases in post-analysis, etc. It's a slog for the folks involved. Even then, cases still arise where Team A reports wonderful results & ships their feature while unknowingly tanking Team B's revenue, a situation discovered only months later when a DS is tasked to trace the cause.
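To make "can we trust these numbers?" concrete, here's a minimal sketch (Python, made-up counts, not anyone's production code) of the kind of check a DS runs: a two-proportion z-test comparing conversion rates between a control and a treatment arm.

    from scipy.stats import norm

    def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
        # Pooled conversion rate under the null hypothesis of "no difference".
        p_a, p_b = conv_a / n_a, conv_b / n_b
        pooled = (conv_a + conv_b) / (n_a + n_b)
        se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
        z = (p_b - p_a) / se
        return z, 2 * norm.sf(abs(z))  # two-sided p-value

    # Hypothetical numbers: 10,000 users per arm, 1,000 vs 1,100 conversions.
    z, p = two_proportion_ztest(1000, 10_000, 1100, 10_000)
    print(f"z={z:.2f}, p={p:.3f}")  # ~ z=2.31, p=0.021: "statsig" at the usual 0.05 cutoff

The platforms automate this (plus corrections the toy version ignores, like peeking and multiple comparisons) so nobody has to hand-roll it per experiment.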
Experimentation platforms like Statsig exist to lower the high cost of experimenting: to show a feature's potential impact before shipping while reducing frustrations along the way. Most platforms eliminate common statistical errors or issues at each stage of the experiment process, with appropriate controls for each user role. Engineers set up experiments via SDK/UI, with nudges and warnings for misconfigurations. DS can focus on higher-value work like metric design. PMs view shared dashboards and get automatic coordination emails with other teams if their feature appears to be breaking something. People still fight, but earlier on and in the same "room," with fewer questions about what's real versus what's noise.
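For flavor, the "engineers set up experiments via SDK" part usually boils down to deterministic bucketing: hash the experiment name plus user ID into a stable bucket, so the same user always sees the same variant with no per-call coordination. A generic sketch of that idea follows; this is not Statsig's actual SDK, and the names and split logic here are invented.

    import hashlib

    def assign_variant(experiment, user_id, variants=("control", "treatment"), traffic=1.0):
        # Stable hash of (experiment, user) -> one of 10,000 fine-grained buckets.
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 10_000
        if bucket >= traffic * 10_000:   # e.g. traffic=0.25 rolls out to 25% of users
            return None                  # user is not in the experiment at all
        return variants[bucket % len(variants)]  # roughly even split across variants

    print(assign_variant("new_checkout_flow", "user_42"))  # same answer on every call

Real platforms typically layer targeting rules, holdouts, and exposure logging on top, but stable assignment is the piece that lets the dashboards, warnings, and automatic emails be computed without a DS babysitting each experiment.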
Separating real results from random noise is the meaning of "statsig" / "statistically significant". I think it's similar to how companies define their own metrics (their sense of reality) while the platform manages the underlying statistical and data complexity. The ideal outcome is fewer DS hours needed, less crufty tooling to work around, less statistics to learn, and crucially, more trust & shared oversight. But it comes at considerable, unstated cost as well.
Is Statsig worth $1B to OpenAI? Maybe. There's an art & science to product development, and Facebook's experimentation platform was central to their science. But it could be premature. I personally think experimentation as an ideology best fits optimizing products that achieved strong product-market fit ages ago. However, it's been years since I've worked in the "Experimentation" domain. I've glossed over a few key details in my answer, and anyone is welcome to correct me.
It's why every mass consumer product devolves into a feed or a list of content delivered by an algorithm. Once you reach a certain point, you come full circle and even that doesn't matter anymore: users will happily consume whatever you give them, within reason.
A/B testing platforms are mostly used by an odd collection of marketers and "data driven" people who love to run experiments and drag out every little change in the name of optimization. In the end, none of it matters, and it doesn't tell you anything more than just talking to an average user would.
But, boy, are they sure a great way to look busy and dress up an underperforming product!
The ONLY time I've ever seen it used successfully was to make changes to the layout of ads, making them look more like organic content or more likely to be accidentally clicked. You can see this at work if you've ever clicked on an ad by mistake, or maybe you were trying to close the ad and noticed someone has "optimized" the close button placement using one of these A/B tools.
There it is; that's the market for these tools. That represents the vast majority of these companies' use cases, revenue, and usage. Anyone else who implies an innocent intent is either ignorant or inexperienced.
Experimentation works best when a) it's easy to try different parameters for a thing, and b) there is a lot of money involved.
Your comment is weird to me because you seem to be implying that experimentation is a cult because there are no things worth optimizing, and/or that optimizing things has a negative societal effect. Those things are obviously true some of the time, but that's not very interesting.