Turbopuffer: Fast search on object storage

1. bigbones ◴[09 Jul 24 20:35 UTC] No.40920788[source]▶

>>40916786 (OP) #

Sounds like a source-unavailable version of Quickwit? https://quickwit.io/

replies(2): >>40920922 #>>40943710 #

2. pushrax ◴[09 Jul 24 20:48 UTC] No.40920922[source]▶

>>40920788 (TP) #

LSM tree storage engine vs time series storage engine, similar philosophy but different use cases

replies(1): >>40923436 #

3. singhrac ◴[10 Jul 24 03:31 UTC] No.40923436[source]▶

>>40920922 #

Maybe I misunderstood both products, but I think neither Quickwit nor Turbopuffer is either of those things intrinsically (though log-structured messages are a good fit for Quickwit). I think Quickwit is essentially Lucene/Elasticsearch (i.e. sparse queries or BM25) and Turbopuffer does vector search (or dense queries) like, say, Faiss/Pinecone/Qdrant/Vectorize, both over object storage.
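For readers unfamiliar with the sparse/dense distinction, here is a rough sketch, not taken from either product. `bm25_scores` is a textbook-style BM25 over whitespace tokens (a sparse, term-matching score) and `cosine` is the kind of similarity a dense vector search ranks by. The function names, the tokenizer, and the `k1`/`b` defaults are illustrative assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each doc against the query with a common BM25 variant.

    Assumption: naive lowercase/whitespace tokenization, for illustration only.
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency: in how many docs does each term occur?
    df = Counter(t for d in tokenized for t in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for t in query.lower().split():
            if t not in tf:
                continue  # sparse: only terms that literally match contribute
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Dense similarity: compares embedding vectors, not literal terms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


docs = ["search on object storage", "time series logs"]
scores = bm25_scores("object storage", docs)
```

A document that shares no term with the query scores zero under BM25, whereas a dense search can still rank it if its embedding is close to the query's.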

replies(1): >>40926621 #

4. pushrax ◴[10 Jul 24 13:24 UTC] No.40926621{3}[source]▶

>>40923436 #

It's true that turbopuffer does vector search, though it also does BM25.

The biggest difference at a low level is that turbopuffer records have unique primary keys, and can be updated, like in a normal database. Old records that were overwritten won't be returned in searches. The LSM tree storage engine is used to achieve this. The LSM tree also enables maintenance of global indexes that can be used for efficient retrieval without any time-based filter.
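To make the "newest version wins" behaviour concrete, here is a toy sketch of the general LSM idea. It is not turbopuffer's implementation: `TinyLSM`, `memtable_limit`, and the in-memory dict segments are all hypothetical stand-ins for real on-disk or object-storage files.

```python
class TinyLSM:
    """Toy LSM tree: a mutable memtable plus immutable flushed segments.

    Assumption: segments are plain dicts kept in memory, newest first.
    """

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.segments = []  # index 0 is the newest segment
        self.memtable_limit = memtable_limit

    def upsert(self, key, value):
        # Writes never touch old segments; they only land in the memtable.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a new immutable segment.
        if self.memtable:
            self.segments.insert(0, dict(self.memtable))
            self.memtable = {}

    def get(self, key):
        # Check newest data first, so an overwrite shadows older versions.
        if key in self.memtable:
            return self.memtable[key]
        for segment in self.segments:
            if key in segment:
                return segment[key]
        return None

    def scan(self):
        # Merge oldest to newest; later versions replace earlier ones.
        live = {}
        for segment in reversed(self.segments):
            live.update(segment)
        live.update(self.memtable)
        return live


db = TinyLSM(memtable_limit=2)
db.upsert("a", "v1")
db.upsert("b", "v1")  # memtable is full, so it is flushed to a segment
db.upsert("a", "v2")  # overwrites "a" without rewriting the old segment
```

The old `("a", "v1")` entry still exists in the older segment, but reads and scans never surface it because the primary key resolves to the newest version.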

Quickwit records are immutable. You can't overwrite a record (well, you can, but overwritten records will also be returned in searches). The data files it produces are organized into a time series, and if you don't pass a time-based filter it has to look at every file.
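The effect of a time-based filter on immutable, time-organized files can be sketched roughly as follows. This is an illustration of the general pruning idea, not Quickwit's code: `DataFile`, `search`, and the substring match are hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical immutable file covering a timestamp range."""
    min_ts: int
    max_ts: int
    docs: list = field(default_factory=list)  # (timestamp, text) tuples


def search(files, term, start_ts=None, end_ts=None):
    """Return (hits, files_opened). Files outside the time range are skipped."""
    hits = []
    opened = 0
    for f in files:
        # Prune using only the file's time range, without reading its contents.
        if start_ts is not None and f.max_ts < start_ts:
            continue
        if end_ts is not None and f.min_ts > end_ts:
            continue
        opened += 1
        for ts, text in f.docs:
            if start_ts is not None and ts < start_ts:
                continue
            if end_ts is not None and ts > end_ts:
                continue
            if term in text:
                hits.append((ts, text))
    return hits, opened


files = [
    DataFile(0, 9, [(1, "error a"), (5, "ok")]),
    DataFile(10, 19, [(12, "error b")]),
    DataFile(20, 29, [(25, "error c")]),
]
all_hits, all_opened = search(files, "error")
ranged_hits, ranged_opened = search(files, "error", start_ts=10, end_ts=19)
```

Without a time range every file is opened; with one, only the files whose range overlaps the filter are read. Because files are append-only, a re-ingested copy of a document would simply show up as a second hit.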

replies(1): >>40927816 #

5. singhrac ◴[10 Jul 24 15:13 UTC] No.40927816{4}[source]▶

>>40926621 #

Ah I didn’t catch that Quickwit had immutable records. That explains the focus on log usage. Thanks!

6. fulmicoton ◴[12 Jul 24 08:33 UTC] No.40943710[source]▶

>>40920788 (TP) #

Quickwit is targeting logs:

    - it does not do vector search. It can rank docs using BM25, but usually people just want to sort by timestamp.
    - it does not use an SSD cache. Quickwit reads directly from object storage.
    - it is append-only (you can't modify documents)
    - it scales really well and typically shines in the 1 TB to 100 PB range
    - it has an Elasticsearch-compatible API.