5 points by tjgreen | 1 comment

I built pg_textsearch, a Postgres extension that brings proper BM25 ranking to full-text search. It's designed for AI/RAG workloads where search quality directly impacts LLM output.

Postgres's native ts_rank lacks corpus-aware signals: no IDF, no TF saturation, and no length normalization against the corpus average. As a result, mediocre documents can rank above excellent matches, which matters when your LLM depends on retrieval quality.
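
For anyone unfamiliar with those three signals, here is a minimal Python sketch of the textbook Okapi BM25 formula. It is an illustration only, not pg_textsearch's internals: the function name, the whitespace tokenizer, and the conventional k1=1.2 / b=0.75 defaults are all assumptions made for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with textbook Okapi BM25.

    Illustrative only: naive lowercase/whitespace tokenization,
    hypothetical function name, conventional k1 and b defaults.
    """
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # IDF: terms that are rare in the corpus weigh more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: compare this doc to the corpus average.
            norm = 1 - b + b * len(toks) / avg_len
            # TF saturation: each repeat adds less; the factor stays below k1 + 1.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores
```

With a toy corpus such as "database performance tuning", a document that just repeats "database" six times, and an unrelated document, the query "database performance" scores the first document highest: the repeated term saturates instead of piling up, and the rarer term "performance" carries more weight.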

Quick example:

  CREATE EXTENSION pg_textsearch;
  CREATE INDEX articles_idx ON articles USING bm25(content);
  SELECT title, content <@> to_bm25query('database performance', 'articles_idx') AS score
  FROM articles
  ORDER BY score
  LIMIT 10;

It works alongside pgvector or pgvectorscale for hybrid search and is fully transactional, so there are no sync jobs. The preview release uses an in-memory architecture (64MB default per index); disk-based segments are coming soon.
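
One common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The Python sketch below is an assumption about how you might merge the two result lists on the application side, not something the extension ships; the function name and the k=60 constant (a commonly used default) are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one list.

    Each input list is ordered best-first. A document earns
    1 / (k + rank) from every list it appears in, and the sums
    decide the final order. Hypothetical helper for illustration.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)


# Ids as they might come back from a BM25 query and a vector query.
bm25_ids = [3, 1, 2]
vector_ids = [1, 2, 4]
merged = reciprocal_rank_fusion([bm25_ids, vector_ids])
```

Documents that appear in both lists (1 and 2 here) rise above documents that only one retriever found, without having to put BM25 scores and vector distances on a common scale.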

I love ParadeDB's pg_search but wanted something available on our managed Postgres. You can try pg_textsearch free on Tiger Cloud: https://console.cloud.timescale.com

Blog: https://www.tigerdata.com/blog/introducing-pg_textsearch-tru...

Docs: https://docs.tigerdata.com/use-timescale/latest/extensions/p...

Feedback welcome, especially from folks building RAG systems or hybrid search applications.