(Or maybe it’s validation in RAG and these companies should rejoice)
Quite a few of my customers build on top of Rockset and it won’t be a smooth transition.
Rockset is offboarding existing customers. Definitely sucks, since we spent the last 3 months adopting it. We used it to replicate DynamoDB in near real time for ad hoc and reporting queries. The schemaless architecture was very easy to work with.
> Month-to-month customers without an active contract will have until Monday,
> September 30th, 2024, 5 PM PDT to off-board.
I'd love to hear from someone with expertise in vendor onboarding and business continuity risk: how do vendor contracts typically protect customers in situations like this?
I'm sure customers will be super frustrated with a datastore vendor change, which would need nontrivial resources, from product development to system migration, in such a short span of time.
That's in the termination clause.
I can think of two options
- Pure acqui-hire: virtually all of Rockset engineering leadership is ex-Meta, and OpenAI has been hiring several senior infra engineers from Meta, so these are all people that have worked together previously.
- OpenAI is building some product where customers can ingest large amounts of data, which could be managed by the Rockset infrastructure as source of truth, and then indexed by their RAG systems.
Google and Amazon followed the same strategy for over a decade just buying anything that was possibly helpful.
If you want to talk (not secret) technical details, you know where to find me :)
-Tudor.
https://startree.ai/saas-signup
I understand Rockset-to-StarTree (Apache Pinot) is not a 1:1 drop-in replacement. But hopefully it's a port in a storm.
Whether you end up on StarTree or another suitable alternative, I hope everyone has as painless a migration as possible. Reminds me a bit of how FoundationDB customers found themselves without a home when Apple acquired them back in 2015[0].
Likewise, I don't think it's going to stem the tide of adding vector indexes and similarity search techniques to traditional databases.
Instead, if anything, I think this is a validation that traditional databases aren't going anywhere — OLAP or OLTP. Behind all the LLM models you're still going to need true, authoritative data in databases to avoid (or at least minimize) the hallucination problem.
AI needs, if anything, even more programmatic ways to get at that data.
As others will say, there are options. Rockset helpfully posts links to a bunch of comparisons on their website, and these alternatives include ClickHouse, Elasticsearch, Druid, etc. https://rockset.com/real-time-analytics-comparison/
I'm inherently biased (as a member of the ClickHouse team). But do check ClickHouse out.
You can always come hang out in our Slack (clickhouse.com/slack) and, of course, the combination of hosted ClickHouse (clickhouse.com/cloud) and the open-source (github.com/clickhouse) may add a bit of comfort when your vendor up and disappears via acquisition.
A lot of corporate customers will seek longer-term contracts, a year or even longer, so they can lock in a price and various service guarantees. Even in the case of this acquisition, it's only customers on a month-to-month plan that have to migrate by September; customers on a long-term contract will continue to have access and support for the duration of their contract.
Good parts:
It has a slick and nice-looking UI. Good documentation. Many data loading options (including S3).
SQL support is good (Calcite?). Types are inferred on data loading. But you have to choose one "timestamp" column.
Bad parts:
First data load attempts failed (after 24 hours, it showed something like "too many retries").
I've loaded around 500 million rows, and the storage limit ran out.
Query performance did not shine. Storage size was very large (it seems they create many indices automatically).
Considerations:
The technology is not open-source. It is rocksdb + secondary indices + object storage + SQL engine.
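To make the "rocksdb + secondary indices" description concrete, here is a toy sketch of a key-value store that indexes every field of every document automatically. It is not Rockset's implementation: the `IndexedStore` class and its method names are hypothetical, a plain dict stands in for RocksDB, and field values are assumed to be hashable.

```python
from collections import defaultdict


class IndexedStore:
    """Toy key-value store that keeps a secondary index on every field."""

    def __init__(self):
        self.primary = {}                   # doc_id -> document (dict)
        self.secondary = defaultdict(set)   # (field, value) -> set of doc_ids

    def put(self, doc_id, doc):
        # Drop stale index entries if this doc_id is being overwritten.
        old = self.primary.get(doc_id)
        if old is not None:
            for field, value in old.items():
                self.secondary[(field, value)].discard(doc_id)
        self.primary[doc_id] = doc
        # Index every field, whether or not anyone ever queries it.
        for field, value in doc.items():
            self.secondary[(field, value)].add(doc_id)

    def find(self, field, value):
        # Answer from the index instead of scanning every document.
        ids = self.secondary.get((field, value), set())
        return [self.primary[i] for i in sorted(ids)]


store = IndexedStore()
store.put(1, {"user": "ann", "country": "US"})
store.put(2, {"user": "bob", "country": "DE"})
store.put(3, {"user": "cat", "country": "US"})
us_users = [d["user"] for d in store.find("country", "US")]
```

Every `put` writes one primary entry plus one index entry per field, which is the kind of trade-off that would make ad hoc lookups cheap and storage large.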
While we're putting in plugs for open source alternatives, I'll recommend looking at StarRocks. https://www.starrocks.io/
I share Peter's sentiment for wishing everyone an easy transition, whatever you choose.
The (very thin) blog post said "Enhancing our retrieval infrastructure" - my guess is this is more about other forms of retrieval, like constructing and executing SQL queries and using the results to help answer questions.
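As a rough illustration of that kind of retrieval, here is a minimal sketch where the "retrieval" step is an ordinary SQL query and the results are pasted into a prompt. Everything in it is an assumption for illustration: the `orders` table, the `retrieve` and `build_prompt` helpers, and the fixed query (in a real system the query might be generated by the model). No LLM is called.

```python
import sqlite3


def retrieve(conn, customer):
    # Retrieval step: a plain parameterized SQL query, no vector search.
    return conn.execute(
        "SELECT item, amount FROM orders WHERE customer = ? ORDER BY id",
        (customer,),
    ).fetchall()


def build_prompt(question, rows):
    # Augmentation step: put the authoritative rows into the prompt as context.
    context = "\n".join(f"- {item}: {amount}" for item, amount in rows)
    return f"Answer using only this data:\n{context}\n\nQuestion: {question}"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(id INTEGER PRIMARY KEY, customer TEXT, item TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, item, amount) VALUES (?, ?, ?)",
    [("acme", "widgets", 120.0), ("acme", "gears", 80.5), ("zenith", "bolts", 15.0)],
)
rows = retrieve(conn, "acme")
prompt = build_prompt("How much did acme spend in total?", rows)
```

The generation step would send `prompt` to whatever model you use; the point is only that the retrieved context comes from a database query rather than a similarity search.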
With investment from vulture capitalists to the tune of $117M. [2] I would assume they want a sizeable return on investment, so maybe a $250-350M cash deal?
Doesn’t seem like this would be a unicorn, but it’s a payoff. Certainly will cover the losses from a few bad investments.
[1] https://venturebeat.com/ai/openai-acquires-rockset-to-streng...
[2] https://www.crunchbase.com/organization/rockset/company_fina...
It's this point where my gratitude for Llama and Meta is extremely high.
Maybe they should rename it to their migration options page. Or maybe I'll just ask ChatGPT what the best alternative is...
Still, pretty useful stuff. It does feel like Rockset had been moving a little too slowly in recent years, but congrats to them on finding a new home.
Looking at the landing page now, it seems they almost pivoted into semi/unstructured data.
To your point, I feel like nobody knows exactly how to do RAG really well (fast and accurate). I also doubt the Rockset team has it figured out but it seems like there is an opportunity to build a new kind of database/memory system and OpenAI believes the Rockset team can help.
https://rockset.com/blog/openai-acquires-rockset/
Month-to-month users have been given until September 30, which is a very short amount of time for a major infrastructure transition. Enterprise users are given a vague "talk to your account manager" answer:
https://docs.rockset.com/documentation/docs/faq
In other words, the above isn't just FUD from a competitor, there legitimately are going to be a lot of frantic refugees in the coming months.
Why people would build actual businesses on top of these fly-by-night SaaS companies funded by VC money is beyond me.
OpenAI Leadership: "Ok, buy Rockset and have them build anything you need."
OpenAI Eng Mgmt: "... Ok. You want to run a db service?"
OpenAI Leadership: "No. Dump all the existing customers. They build for us now."
Technically, when companies choose a vendor, they should consider risks like a company suddenly being acquired. In practice, it’s quite hard to assign an actual number to that column - it’s almost always a risk but it’s extremely hard to quantify. You’ll often hear things like “every vendor could be acquired so that counts equally for all choices” when that risk gets discussed.
This. Not sure why RAG triggers vector search for everyone. Retrieval Augmented Generation is as generic as it can get.
What sets RisingWave apart is its focus on stream processing while maintaining SQL compatibility. This could be particularly valuable for users leveraging Rockset's real-time capabilities. RisingWave offers several features that may appeal to Rockset users. It's built to scale in cloud environments and can ingest data from a large variety of sources. The database supports materialized views for efficient query processing and ensures data consistency with ACID transactions. For those concerned about vendor lock-in after this experience, RisingWave's open-source nature (Apache 2.0 license) provides an extra layer of assurance. There's also a managed cloud offering for those who prefer a hands-off approach.
I encourage impacted Rockset users to explore RisingWave as part of their evaluation process. The project has a welcoming community (join at risingwave.com/slack) and extensive documentation to help with the transition. [Disclosure: I'm associated with RisingWave. Happy to answer any questions or provide more details about how it compares to Rockset for specific use cases.]
At least I have never seen a non-profit acquihire before.
Google (Android, Gmail, Maps, G Office), Apple (iPhone, Mail, Maps, Productivity), Microsoft (Office365, Windows, XBox).
In terms of moat and lock-in, that leaves OpenAI vulnerable to last mile customer hijacking.
E.g. https://www.ftc.gov/enforcement/premerger-notification-progr...
https://www.ftc.gov/enforcement/premerger-notification-progr...
There are like 10k+ mergers and acquisitions done in the US each year (ballpark). It requires real analysis to figure out if something should be blocked (practically none have any real effect on anything and shouldn't be) and there are only so many folks at the regulators who can do that analysis (and honestly... they aren't good at it...).
It’s been very common to see startups, many of whom have never set foot in an enterprise, push this idea that you can drop an LLM on top of company data and ask questions like it was ChatGPT. The reality is that most company data is a mess with little funding/will to fix it and so the results are unusable. So if OpenAI wants to be anything more than a chatbot they will need to start to tackle this problem.
Amazing to watch their aspirations go from such lofty heights to being just another enterprise data infrastructure SaaS company.
And should be a clear sign that the AI hype train has run out of steam.
Not entirely fair - see https://github.com/rockset/rocksdb-cloud (a fork of RocksDB with a separation of storage and compute, using S3 and Lambda-based compaction)
I’ve been around for a few M&A that horrendously fucked customers of the “acquired” company and the regulator doesn’t care. Even if the acquirer is under regulatory observation.
These migrations are going to be complicated as there is no 1-1 drop-in replacement. They will touch every aspect of the data lifecycle, from ingestion and transformation to serving. Query optimization will also have to be redone/rethought.
At Propel, we just announced our Rockset migration service to help customers through this process: https://www.propeldata.com/blog/rockset-migration-service
If it is helpful to anyone, our company provides a data platform designed for high-speed analytics at scale. It's certainly not a 1:1 alternative, but if you are looking for an all-in-one solution with a lot fewer moving parts we might be the right answer. We're happy to provide an extended free trial through September 30 for anyone looking at migration alternatives:
https://docs.minusonedb.com/#start-your-free-trial
In any case we wish everyone an easy transition to your next solution!
The short answer: they generally don't, unless you negotiate for it. I run a company and dealing with this kind of situation (or better still anticipating it) is part of my job.
The Rockset situation generally falls under "termination for convenience", where a party to the contract terminates for reasons other than cause (cause being, e.g., bankruptcy or the other party violating contract terms). Taking the Rockset ToS as an instructive example, it covers customer's termination for convenience in Section 15.2. [0] However, there's nothing about Rockset terminating for convenience.
Normally this ambiguity could cause legal problems for the vendor, but Rockset added a 'get out of jail free card' in Section 2. They can just change the ToS.
> 2. Changes to Agreement or Services. Rockset may update this Agreement at any time, in its sole discretion. If Rockset does so, it will let Customer know either by posting the updated Agreement on the Site or through other communications. If Customer continues to use the Services after Rockset has posted updated Agreement, Customer agrees to be bound by the updated Agreement. Because the Services are evolving over time, Rockset may change or discontinue all or any part of the Services, at any time and without notice, at its sole discretion.
I am not a lawyer but (a) this is an awful contract for customers and (b) all is not lost. Your best recourse is to check with a lawyer to see what footholds you can use (probably few), then make a public stink about destroying your data. I sincerely doubt the new owners will want to deal with that and will extend support.
The next time around, get a lawyer to help and negotiate terms. Pro tip: Smaller vendors are often more flexible, but all of them negotiate if they want the deal badly enough.
More happily FoundationDB was later released as open source and is successfully used by many companies beyond Apple. Snowflake is a prominent example. [0]
[0] https://www.snowflake.com/blog/how-foundationdb-powers-snowf...
I'm worried more platforms like StarTree, SingleStore, etc will follow suit in the next 24 months.
Any thoughts or assurances on this? Thank you for your post.
I'm putting together a public/free Rockset feature comparison matrix.
I want to help educate customers (and other vendors) on what Rockset did so well and what considerations are needed to find a replacement.
https://bit.ly/rockset-feature-comparison
HN'ers I realize you are wizards, don't shoot the messenger. Just trying to help.
I would like to add CrateDB (I work there) to the list. CrateDB is a distributed SQL database purposely built for real-time analytics across large datasets of structured and semi-structured data. Similarly to Rockset, it indexes all data in real-time (text, vector, geospatial, time-series, and JSON) for the most efficient search and fast ad hoc query execution at any scale. It is built on top of Apache Lucene and unlike Rockset is open-source (https://github.com/crate/crate).
Rockset frequently comes up among other solutions our users were looking at before choosing CrateDB. For example https://cratedb.com/customers/govspend.