
276 points | Fendy | 1 comment
cpursley No.45171865
Postgres has pgvector. Postgres is where all of my data already lives. It’s all open source and runs anywhere. What am I missing with the specialty vector stores?
replies(1): >>45171919 #
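For context on what pgvector does here: its `<=>` operator computes cosine distance, and an unindexed query is a sequential scan ordered by that distance. A minimal pure-Python sketch of that ranking (the table name `items` and the helper names are illustrative, not pgvector's code):

```python
import math

def cosine_distance(a, b):
    # What pgvector's <=> operator computes: 1 - cos(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def nearest(query, rows, k):
    # Brute-force equivalent of:
    #   SELECT id FROM items ORDER BY embedding <=> :q LIMIT :k
    # i.e. what Postgres does without an ANN index (a full scan).
    return sorted(rows, key=lambda r: cosine_distance(query, r[1]))[:k]
```

That full scan is exact but costs O(n) distance computations per query, which is the crux of the performance debate below.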
CuriouslyC No.45171919
Latency, actual retrieval performance, integrated pipelines that do more than pure vector search to produce better results; the list goes on.

Postgres for vector search is fine for toy products or workloads outside the hot loop of your business, but for high-performance applications it's inadequate.

replies(1): >>45171952 #
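The latency gap comes from approximate-nearest-neighbor (ANN) indexing: partition the vectors, then probe only the partitions near the query instead of scanning everything. A toy sketch of the inverted-file (IVF) idea behind indexes like pgvector's ivfflat (the class and its single-probe behavior are illustrative simplifications, not any real implementation):

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class ToyIVF:
    """Toy inverted-file index: vectors are bucketed by nearest centroid,
    and a query probes only its own bucket. Real IVF indexes train the
    centroids and probe several cells to trade recall for speed."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.cells = [[] for _ in centroids]

    def _cell(self, v):
        return min(range(len(self.centroids)),
                   key=lambda i: l2(v, self.centroids[i]))

    def add(self, vid, v):
        self.cells[self._cell(v)].append((vid, v))

    def search(self, q, k):
        # Scans one cell instead of the whole table: much faster, but a
        # true neighbor in another cell is missed (hence "approximate").
        candidates = self.cells[self._cell(q)]
        return sorted(candidates, key=lambda r: l2(q, r[1]))[:k]
```

Dedicated stores push hard on this recall/latency trade-off (and on the surrounding pipeline); that is the gap being claimed here.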
cpursley No.45171952
For the vast majority of applications, the trade-off of keeping everything in Postgres is worth it versus the operational overhead of some VC-hyped data store that won't be around in five years. Most people learned this lesson with Mongo (Postgres jsonb is now good enough for 90% of scenarios).
replies(3): >>45171998 #>>45172223 #>>45172941 #
whakim No.45172941
It depends on scale. If you're storing a small number of embeddings (hundreds of thousands, millions) and don't have complicated filters, then absolutely the convenience factor of pgvector will win out. Beyond that, you'll need something more powerful. I do think the dedicated vector stores serve a useful place in the market in that they're extremely "managed": it is really, really easy to just call an API and never worry about pre- or post-filtering or sharding your index across a large cluster. But they also have weaknesses in that they're usually optimized around the small(er) scale where the bulk of their customers lie, and they don't really replace an actual search system like Elasticsearch.
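The pre- vs post-filtering distinction mentioned above is worth making concrete. A hedged sketch of the two strategies for combining a metadata filter with vector search (the row layout and helper names are made up for illustration; no real store's API is implied):

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def post_filter(query, rows, pred, k):
    # Index-first: take the top-k nearest, then apply the metadata filter.
    # Cheap with an ANN index, but can return fewer than k rows when the
    # filter is selective -- the classic vector-search + WHERE-clause pitfall.
    top_k = sorted(rows, key=lambda r: l2(query, r["vec"]))[:k]
    return [r for r in top_k if pred(r)]

def pre_filter(query, rows, pred, k):
    # Filter-first: always yields up to k matching rows, but the ranking
    # degrades to a scan over everything the filter lets through.
    matching = [r for r in rows if pred(r)]
    return sorted(matching, key=lambda r: l2(query, r["vec"]))[:k]
```

Managed stores hide this choice behind the API; with pgvector, picking and tuning one of these strategies is on you.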