Turbopuffer: Fast search on object storage

(turbopuffer.com)

379 points Sirupsen | 2 comments | 09 Jul 24 14:48 UTC

bigbones [09 Jul 24 20:35 UTC] No.40920788

>>40916786 (OP)

Sounds like a source-unavailable version of Quickwit? https://quickwit.io/

pushrax [09 Jul 24 20:48 UTC] No.40920922

>>40920788

LSM-tree storage engine vs. time-series storage engine: similar philosophy, but different use cases.

singhrac [10 Jul 24 03:31 UTC] No.40923436

>>40920922

Maybe I misunderstood both products, but I think neither Quickwit nor Turbopuffer is intrinsically either of those things (though log-structured messages are a good fit for Quickwit). I think Quickwit is essentially Lucene/Elasticsearch (i.e. sparse queries or BM25), and Turbopuffer does vector search (or dense queries) like, say, Faiss/Pinecone/Qdrant/Vectorize, both over object storage.

pushrax [10 Jul 24 13:24 UTC] No.40926621

>>40923436

It's true that turbopuffer does vector search, though it also does BM25.

The biggest difference at a low level is that turbopuffer records have unique primary keys and can be updated, as in a normal database. Old records that were overwritten won't be returned in searches. The LSM tree storage engine is used to achieve this. The LSM tree also enables maintenance of global indexes that can be used for efficient retrieval without any time-based filter.
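
A minimal Python sketch of the last-write-wins behaviour described above, assuming a toy in-memory LSM tree. `ToyLSM`, `upsert`, `flush`, `compact`, and `memtable_limit` are hypothetical names, not turbopuffer's API or engine, and `search` is a plain predicate scan rather than a real index. The point is only that newer layers shadow older ones, so an overwritten record can never match a search.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


# Hypothetical toy LSM tree (not turbopuffer's engine): keys are unique,
# and a newer write to the same key shadows every older version.
@dataclass
class ToyLSM:
    memtable: Dict[str, Any] = field(default_factory=dict)
    runs: List[Dict[str, Any]] = field(default_factory=list)  # oldest first, immutable once flushed
    memtable_limit: int = 2  # illustrative flush threshold

    def upsert(self, key: str, value: Any) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # A real engine would write a sorted file to object storage here.
        if self.memtable:
            self.runs.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def _layers_newest_first(self) -> List[Dict[str, Any]]:
        return [self.memtable] + list(reversed(self.runs))

    def get(self, key: str) -> Optional[Any]:
        # The first layer (newest first) containing the key holds the live version.
        for layer in self._layers_newest_first():
            if key in layer:
                return layer[key]
        return None

    def search(self, predicate: Callable[[Any], bool]) -> List[Tuple[str, Any]]:
        # Merge newest to oldest; setdefault keeps only the newest version per key,
        # so overwritten values are never tested against the predicate.
        live: Dict[str, Any] = {}
        for layer in self._layers_newest_first():
            for key, value in layer.items():
                live.setdefault(key, value)
        return sorted((k, v) for k, v in live.items() if predicate(v))

    def compact(self) -> None:
        # Rewrite all runs into one, physically dropping shadowed versions.
        self.flush()
        live: Dict[str, Any] = {}
        for run in reversed(self.runs):
            for key, value in run.items():
                live.setdefault(key, value)
        self.runs = [dict(sorted(live.items()))] if live else []
```

For example, after `upsert("doc1", "red shoes")`, `upsert("doc2", "blue shirt")` and `upsert("doc1", "green hat")`, `search(lambda v: "red" in v)` returns an empty list: the old `doc1` value still sits in a flushed run, but the newer memtable entry shadows it, and `compact()` eventually drops it for good.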

Quickwit records are immutable. You can't overwrite a record (well, you can, but overwritten records will also be returned in searches). The data files it produces are organized into a time series, and if you don't pass a time-based filter it has to look at every file.
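
A minimal Python sketch of the immutable, time-series layout described above, for illustration only. `ToyLogIndex`, `Split`, `Doc`, and `append_batch` are hypothetical names and this is not Quickwit's implementation; matching is a plain substring check rather than BM25. Writing a record with an existing id just adds another copy, and only a time filter lets the search skip files.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical toy model of immutable, time-partitioned files (not Quickwit's code).
@dataclass(frozen=True)
class Doc:
    doc_id: str
    ts: int  # illustrative timestamp, e.g. seconds since epoch
    text: str


@dataclass(frozen=True)
class Split:
    docs: Tuple[Doc, ...]

    @property
    def min_ts(self) -> int:
        return min(d.ts for d in self.docs)

    @property
    def max_ts(self) -> int:
        return max(d.ts for d in self.docs)


class ToyLogIndex:
    def __init__(self) -> None:
        self.splits: List[Split] = []

    def append_batch(self, docs: List[Doc]) -> None:
        # Every batch becomes a new immutable split; existing splits are never
        # rewritten, so re-sending a doc_id just adds another copy.
        if docs:
            self.splits.append(Split(tuple(docs)))

    def search(self, term: str, start: Optional[int] = None,
               end: Optional[int] = None) -> Tuple[List[Doc], int]:
        """Return matching docs and how many splits were opened (bounds inclusive)."""
        hits: List[Doc] = []
        scanned = 0
        for split in self.splits:
            # Only a time filter allows skipping splits; with none, every split is read.
            if start is not None and split.max_ts < start:
                continue
            if end is not None and split.min_ts > end:
                continue
            scanned += 1
            for d in split.docs:
                in_range = (start is None or d.ts >= start) and (end is None or d.ts <= end)
                if in_range and term in d.text:
                    hits.append(d)
        return hits, scanned
```

With a batch `[Doc("a", 100, "error disk full"), Doc("b", 110, "ok")]` followed by `[Doc("a", 200, "error disk full (retry)")]`, `search("error")` opens both splits and returns both copies of `"a"`, while `search("error", start=150)` opens only the second split.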

singhrac [10 Jul 24 15:13 UTC] No.40927816

>>40926621 (TP)

Ah, I didn’t catch that Quickwit had immutable records. That explains the focus on log usage. Thanks!