(turbopuffer.com)

379 points Sirupsen | 3 comments | 09 Jul 24 14:48 UTC


solatic ◴[10 Jul 24 16:19 UTC] No.40928528[source]▶

Is it feasible to try to build this kind of approach (hot SSD cache nodes sitting in front of object storage) with prior open-source art (Lucene)? Or are the search indexes themselves also proprietary in this solution?

Having witnessed some very large Elasticsearch production deployments, being able to throw everything into S3 would be incredible. The applicability here isn't only for vector search.

replies(2): >>40928889 #>>40929482 #

1. rohitnair ◴[10 Jul 24 17:50 UTC] No.40929482[source]▶

>>40928528 #

Elasticsearch and OpenSearch already support S3-backed indices. See features like https://opensearch.org/docs/latest/tuning-your-cluster/avail... The files in S3 are plain old Lucene segment files, wrapped in OpenSearch snapshots that provide a way to track metadata around those files.

replies(1): >>40929561 #

2. francoismassot ◴[10 Jul 24 17:56 UTC] No.40929561[source]▶

>>40929482 (TP) #

But you don’t have fast search on those files stored on object storage.

replies(1): >>40929618 #

3. rohitnair ◴[10 Jul 24 18:01 UTC] No.40929618[source]▶

>>40929561 #

Yes, there is a cold start penalty, but once the data is cached, it is equivalent to disk-backed indices. There is also active work being done to improve performance, for example https://github.com/opensearch-project/OpenSearch/issues/1380...
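
To make the cold start vs. warm cache point concrete, here is a minimal read-through cache sketch in Python. It is not how OpenSearch or turbopuffer implement this; `ReadThroughCache`, `fetch_remote`, and the in-memory `fake_bucket` are hypothetical names, with `fetch_remote` standing in for an object-storage GET.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


class ReadThroughCache:
    """Serve blobs from a local directory, falling back to a slow store on a miss."""

    def __init__(self, cache_dir: Path, fetch_remote: Callable[[str], bytes]):
        self.cache_dir = cache_dir
        # Hypothetical stand-in for an object-storage GET (e.g. an S3 download).
        self.fetch_remote = fetch_remote
        self.remote_fetches = 0

    def _local_path(self, key: str) -> Path:
        # Hash the key so object names containing "/" map to flat file names.
        return self.cache_dir / hashlib.sha256(key.encode()).hexdigest()

    def read(self, key: str) -> bytes:
        local = self._local_path(key)
        if local.exists():
            # Warm path: served from local disk, no remote round trip.
            return local.read_bytes()
        # Cold path: pay the remote latency once, then keep a local copy.
        data = self.fetch_remote(key)
        self.remote_fetches += 1
        local.write_bytes(data)
        return data


# An in-memory dict plays the role of the bucket in this sketch.
fake_bucket = {"segment_0.si": b"lucene-segment-bytes"}
cache = ReadThroughCache(Path(tempfile.mkdtemp()), fake_bucket.__getitem__)

first = cache.read("segment_0.si")   # cold: goes to the "bucket"
second = cache.read("segment_0.si")  # warm: served from the local directory
```

Only the first read touches the remote store; a real system would also need eviction and invalidation, which this sketch leaves out.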


Turbopuffer: Fast search on object storage