
276 points | Fendy | 1 comment
cpursley No.45171865
Postgres has pgvector. Postgres is where all of my data already lives. It’s all open source and runs anywhere. What am I missing with the specialty vector stores?
replies(1): >>45171919 #
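For context on what pgvector does here: its `<=>` operator computes cosine distance, and an unindexed query is a sequential scan ordered by that distance. A minimal pure-Python sketch of that ranking (the table name `items` and the helper names are illustrative, not pgvector's code):

```python
import math

def cosine_distance(a, b):
    # What pgvector's <=> operator computes: 1 - cos(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def nearest(query, rows, k):
    # Brute-force equivalent of:
    #   SELECT id FROM items ORDER BY embedding <=> :q LIMIT :k
    # i.e. what Postgres does without an ANN index (a full scan).
    return sorted(rows, key=lambda r: cosine_distance(query, r[1]))[:k]
```

That full scan is exact but costs O(n) distance computations per query, which is the crux of the performance debate below.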
CuriouslyC No.45171919
Latency, actual retrieval performance, integrated pipelines that do more than pure vector search to produce better results; the list goes on.

Postgres for vector search is fine for toy products or workloads outside the hot loop of your business, but for high-performance applications it's inadequate.

replies(1): >>45171952 #
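The latency gap comes from approximate-nearest-neighbor (ANN) indexing: partition the vectors, then probe only the partitions near the query instead of scanning everything. A toy sketch of the inverted-file (IVF) idea behind indexes like pgvector's ivfflat (the class and its single-probe behavior are illustrative simplifications, not any real implementation):

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class ToyIVF:
    """Toy inverted-file index: vectors are bucketed by nearest centroid,
    and a query probes only its own bucket. Real IVF indexes train the
    centroids and probe several cells to trade recall for speed."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.cells = [[] for _ in centroids]

    def _cell(self, v):
        return min(range(len(self.centroids)),
                   key=lambda i: l2(v, self.centroids[i]))

    def add(self, vid, v):
        self.cells[self._cell(v)].append((vid, v))

    def search(self, q, k):
        # Scans one cell instead of the whole table: much faster, but a
        # true neighbor in another cell is missed (hence "approximate").
        candidates = self.cells[self._cell(q)]
        return sorted(candidates, key=lambda r: l2(q, r[1]))[:k]
```

Dedicated stores push hard on this recall/latency trade-off (and on the surrounding pipeline); that is the gap being claimed here.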
cpursley No.45171952
For the vast majority of applications, the trade-off of keeping everything in Postgres is worth it versus the operational overhead of some VC-hyped data store that won't be around in five years. Most people learned this lesson with Mongo (Postgres jsonb is now good enough for 90% of scenarios).
replies(3): >>45171998 #>>45172223 #>>45172941 #
whakim No.45172941
It depends on scale. If you're storing a small number of embeddings (hundreds of thousands, millions) and don't have complicated filters, then absolutely the convenience factor of pgvector will win out. Beyond that, you'll need something more powerful. I do think the dedicated vector stores serve a useful place in the market in that they're extremely "managed": it is really, really easy to just call an API and never worry about pre- or post-filtering or sharding your index across a large cluster. But they also have weaknesses in that they're usually optimized around the small(er) scale where the bulk of their customers lie, and they don't really replace an actual search system like Elasticsearch.
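The pre- vs post-filtering distinction mentioned above is worth making concrete. A hedged sketch of the two strategies for combining a metadata filter with vector search (the row layout and helper names are made up for illustration; no real store's API is implied):

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def post_filter(query, rows, pred, k):
    # Index-first: take the top-k nearest, then apply the metadata filter.
    # Cheap with an ANN index, but can return fewer than k rows when the
    # filter is selective -- the classic vector-search + WHERE-clause pitfall.
    top_k = sorted(rows, key=lambda r: l2(query, r["vec"]))[:k]
    return [r for r in top_k if pred(r)]

def pre_filter(query, rows, pred, k):
    # Filter-first: always yields up to k matching rows, but the ranking
    # degrades to a scan over everything the filter lets through.
    matching = [r for r in rows if pred(r)]
    return sorted(matching, key=lambda r: l2(query, r["vec"]))[:k]
```

Managed stores hide this choice behind the API; with pgvector, picking and tuning one of these strategies is on you.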