
88 points ksec | 3 comments
tanoku No.41916744
Hi! I'm one of the authors of this feature. It's something quite novel, because it's not just an HNSW plug-in for MySQL (like e.g. pgvector is for Postgres). It's a fully transactional index, integrated into InnoDB.

We based the implementation on two very new papers from Microsoft Research, SPANN and SPFresh. SPANN is a hybrid graph/tree algorithm that does a fantastic job of scaling to larger-than-RAM indexes (https://arxiv.org/abs/2111.08566), and SPFresh expands upon it by defining a set of background operations that maintain the index's performance and recall while it's continuously updated in-place (https://arxiv.org/html/2410.14452v1). The novel thing here is designing all the SPANN _and_ SPFresh operations to be transactional, and integrating them into MySQL's default storage engine.
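
For readers unfamiliar with this family of indexes, here is a rough toy sketch of the general idea of partitioning vectors into posting lists around centroids and probing only the closest lists at query time. It is not taken from the papers' code or from the PlanetScale implementation: the function names (`build_index`, `search`) and the `nprobe` parameter are made up for illustration, centroids are chosen by random sampling, and the centroid lookup is a brute-force scan rather than a graph/tree search.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(vectors, num_centroids, seed=0):
    """Toy partitioned index: pick centroids, then bucket every vector.

    Assumption: centroids are sampled at random and each vector goes to
    exactly one posting list. A real system would cluster properly and
    keep the posting lists on disk, with only the centroids in memory.
    """
    rng = random.Random(seed)
    centroids = rng.sample(vectors, num_centroids)
    postings = {i: [] for i in range(num_centroids)}
    for vid, v in enumerate(vectors):
        nearest = min(range(num_centroids), key=lambda c: dist(v, centroids[c]))
        postings[nearest].append(vid)
    return centroids, postings


def search(query, vectors, centroids, postings, k=3, nprobe=2):
    """Scan only the nprobe posting lists whose centroids are closest."""
    order = sorted(range(len(centroids)), key=lambda c: dist(query, centroids[c]))
    candidates = [vid for c in order[:nprobe] for vid in postings[c]]
    return sorted(candidates, key=lambda vid: dist(query, vectors[vid]))[:k]
```

Raising `nprobe` in this sketch trades speed for recall; probing every posting list degenerates to an exact brute-force search.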

This tight integration fundamentally means that inserting, updating and deleting vector data from MySQL is always reflected immediately in the index as part of committing your transaction. But it also means that the indexes are fully covered by the MySQL binlog; they recover from hard crashes just fine. They're also managed by MySQL's buffer pool, so they scale to terabytes of data, just like any other table. Crucially, they're also fully integrated with the query planner, so they can be used in any query, including ones with JOINs and WHERE clauses (something that many other vector indexes really struggle with).

We plan to release a whitepaper on our transactionality extensions to SPFresh, which I think will be super interesting, but meanwhile please feel free to test the beta and ask me any questions (here, or by emailing PlanetScale support). Thanks!

1. trollied No.41919126
Since you're here - why decide to write something from scratch for MySQL instead of using/improving Postgres? Seems like a very strange decision. MySQL is a poor RDBMS in comparison.
2. estebarb No.41919333
PlanetScale is built on Vitess. Vitess was created in 2010 to help scale MySQL at YouTube.
3. prng2021 No.41919833
https://www.uber.com/blog/postgres-to-mysql-migration/

https://www.mysql.com/customers/