379 points Sirupsen | 3 comments
solatic ◴[] No.40928528[source]
Is it feasible to try to build this kind of approach (hot SSD cache nodes sitting in front of object storage) with prior open-source art (Lucene)? Or are the search indexes themselves also proprietary in this solution?

Having witnessed some very large Elasticsearch production deployments, being able to throw everything into S3 would be incredible. The applicability here isn't only for vector search.

replies(2): >>40928889 #>>40929482 #
1. rohitnair ◴[] No.40929482[source]
Elasticsearch and OpenSearch already support S3-backed indices. See features like https://opensearch.org/docs/latest/tuning-your-cluster/avail... The files in S3 are plain old Lucene segment files, just wrapped in OpenSearch snapshots, which provide a way to track metadata around those files.
replies(1): >>40929561 #
2. francoismassot ◴[] No.40929561[source]
But you don’t have fast search on those files stored on object storage.
replies(1): >>40929618 #
3. rohitnair ◴[] No.40929618[source]
Yes, there is a cold-start penalty, but once the data is cached, it is equivalent to disk-backed indices. There is also active work being done to improve the performance, for example https://github.com/opensearch-project/OpenSearch/issues/1380...
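
To illustrate the cold-start behavior in general terms, here is a minimal read-through cache sketch. It is not how OpenSearch or Elasticsearch implement this; `SlowObjectStore`, `ReadThroughCache`, the key name, and the latency value are all hypothetical stand-ins. The first read of a key pays the object-storage round trip, and later reads are served from local disk.

```python
import os
import tempfile
import time


class SlowObjectStore:
    """Hypothetical stand-in for S3: an in-memory dict with an artificial delay."""

    def __init__(self, blobs, latency_s=0.05):
        self._blobs = dict(blobs)
        self._latency_s = latency_s
        self.reads = 0  # counts how often the remote store is actually hit

    def get(self, key):
        self.reads += 1
        time.sleep(self._latency_s)  # simulated network round trip
        return self._blobs[key]


class ReadThroughCache:
    """Serves reads from a local directory, fetching from the store on a miss."""

    def __init__(self, store, cache_dir):
        self._store = store
        self._cache_dir = cache_dir

    def _path(self, key):
        # Flatten the object key into a single file name (assumed collision-free here).
        return os.path.join(self._cache_dir, key.replace("/", "_"))

    def get(self, key):
        path = self._path(key)
        if os.path.exists(path):
            # Warm path: local disk only, no remote call.
            with open(path, "rb") as f:
                return f.read()
        # Cold path: fetch remotely, then persist for later reads.
        data = self._store.get(key)
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic rename so readers never see a partial file
        return data


store = SlowObjectStore({"index/segment_0": b"lucene-segment-bytes"})
cache = ReadThroughCache(store, tempfile.mkdtemp())

first = cache.get("index/segment_0")   # cold: goes to the object store
second = cache.get("index/segment_0")  # warm: served from the local cache
```

The sketch leaves out eviction, concurrency, and partial (range) reads, all of which a real cache in front of object storage would need.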