379 points Sirupsen | 6 comments
1. bigbones ◴[] No.40920788[source]
Sounds like a source-unavailable version of Quickwit? https://quickwit.io/
replies(2): >>40920922 #>>40943710 #
2. pushrax ◴[] No.40920922[source]
LSM tree storage engine vs time series storage engine, similar philosophy but different use cases
replies(1): >>40923436 #
3. singhrac ◴[] No.40923436[source]
Maybe I misunderstood both products, but I think neither Quickwit nor Turbopuffer is either of those things intrinsically (though log-structured messages are a good fit for Quickwit). I think Quickwit is essentially Lucene/Elasticsearch (i.e. sparse queries or BM25) and Turbopuffer does vector search (or dense queries) like, say, Faiss/Pinecone/Qdrant/Vectorize, both over object storage.
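To make the sparse vs. dense distinction concrete, here is a minimal sketch in plain Python. It is not how either product is implemented: `bm25_scores` uses the common Lucene-style BM25 formula with assumed defaults (`k1=1.2`, `b=0.75`), and the two-dimensional "embeddings" are made-up numbers standing in for the output of a real embedding model.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score tokenized docs against query terms with BM25 (sparse, term-based)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of docs containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # terms absent from the doc contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


docs = [
    "disk full error on node seven".split(),
    "user login succeeded".split(),
    "disk latency warning".split(),
]

# Sparse query: exact term overlap drives the score.
sparse = bm25_scores(["disk", "error"], docs)

# Dense query: hypothetical embeddings (illustrative values only).
doc_vecs = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]]
query_vec = [1.0, 0.0]
dense = [cosine(query_vec, v) for v in doc_vecs]
```

In the sparse case a document that shares no terms with the query scores exactly zero, while in the dense case every document gets some similarity and ranking depends entirely on the embedding geometry.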
replies(1): >>40926621 #
4. pushrax ◴[] No.40926621{3}[source]
It's true that turbopuffer does vector search, though it also does BM25.

The biggest difference at a low level is that turbopuffer records have unique primary keys, and can be updated, like in a normal database. Old records that were overwritten won't be returned in searches. The LSM tree storage engine is used to achieve this. The LSM tree also enables maintenance of global indexes that can be used for efficient retrieval without any time-based filter.

Quickwit records are immutable. You can't overwrite a record (well, you can, but overwritten records will also be returned in searches). The data files it produces are organized into a time series, and if you don't pass a time-based filter it has to look at every file.
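As a rough illustration of the difference in visible behavior (not the actual internals of either system), here is a toy sketch in Python. `KeyedStore` and `AppendOnlyStore` are hypothetical names; a real LSM tree uses sorted runs and compaction rather than a linear scan, and the hourly bucketing and substring matching are assumptions made only to keep the example small.

```python
class KeyedStore:
    """Toy upsert semantics: the newest write per primary key wins."""

    def __init__(self):
        self._writes = []  # append-only write log of (key, doc)

    def upsert(self, key, doc):
        self._writes.append((key, doc))

    def search(self, term):
        latest = {}
        for key, doc in self._writes:
            latest[key] = doc  # later writes shadow earlier ones
        # Substring match against the current version of each record only.
        return sorted(k for k, d in latest.items() if term in d)


class AppendOnlyStore:
    """Toy immutable records, grouped into time-bucketed 'files'."""

    def __init__(self, bucket_seconds=3600):
        self.bucket_seconds = bucket_seconds
        self.files = {}  # bucket start -> list of (timestamp, doc)
        self.files_scanned = 0

    def append(self, ts, doc):
        bucket = ts - ts % self.bucket_seconds
        self.files.setdefault(bucket, []).append((ts, doc))

    def search(self, term, start=None, end=None):
        hits = []
        self.files_scanned = 0
        for bucket, records in sorted(self.files.items()):
            # A time filter lets whole files be skipped without reading them.
            if start is not None and bucket + self.bucket_seconds <= start:
                continue
            if end is not None and bucket >= end:
                continue
            self.files_scanned += 1
            for ts, doc in records:
                in_range = (start is None or ts >= start) and (end is None or ts < end)
                if in_range and term in doc:
                    hits.append(doc)
        return hits


kv = KeyedStore()
kv.upsert("doc1", "disk error")
kv.upsert("doc1", "all clear")  # overwrites doc1
overwritten_hits = kv.search("error")

log = AppendOnlyStore()
log.append(10, "disk error")
log.append(4000, "disk error again")
log.append(8000, "login ok")
all_hits = log.search("error")
scanned_without_filter = log.files_scanned
hour_hits = log.search("error", start=3600, end=7200)
scanned_with_filter = log.files_scanned
```

In this toy model the overwritten version of `doc1` no longer matches a search, while the append-only store returns every record ever written and has to open every file unless a time range narrows the scan.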

replies(1): >>40927816 #
5. singhrac ◴[] No.40927816{4}[source]
Ah I didn’t catch that Quickwit had immutable records. That explains the focus on log usage. Thanks!
6. fulmicoton ◴[] No.40943710[source]
Quickwit is targeting logs:

    - it does not do vector search. It can rank docs using BM25, but usually people just want to sort by timestamp.
    - it does not use an SSD cache. Quickwit reads directly from object storage.
    - it is append-only (you can't modify documents)
    - it scales really well and typically shines in the 1TB to 100PB range
    - it has an Elasticsearch-compatible API.