
27 points Pringled | 5 comments | 17 Nov 24 13:22 UTC

We’ve recently open-sourced Model2vec, a method to distill sentence transformers into static embeddings that outperform all previous approaches by a large margin on MTEB. Our new models set a new state-of-the-art for static embeddings. Main features:

- Our best model (potion-base-8M) has only 8M parameters, which is ~30 MB on disk

- Inference on a CPU is ~500x faster than the base model it was distilled from (bge-base)

- New models can be distilled in 30 seconds on a CPU without requiring a dataset - just a vocabulary

- NumPy-only inference: the package can be installed with minimal dependencies for lightweight deployments

- The library is integrated in SentenceTransformers, making it easy to use with other popular libraries
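
For anyone wondering why static embeddings are so cheap at inference time, here is a rough, toy sketch of the general idea: each token has a fixed, precomputed vector, and a sentence embedding is just the average of its token vectors. This is not Model2vec's actual code. The vocabulary, the 3-dimensional vectors, and the function names `embed` and `cosine` are all made up for illustration, and a real implementation would use a proper tokenizer and NumPy arrays rather than whitespace splitting and Python lists.

```python
import math

# Hypothetical toy vocabulary: each token maps to a fixed, precomputed vector.
# A real static model stores one such row per token in a large embedding matrix.
TOKEN_VECTORS = {
    "fast":   [0.9, 0.1, 0.0],
    "quick":  [0.8, 0.2, 0.0],
    "search": [0.1, 0.9, 0.1],
    "banana": [0.0, 0.1, 0.9],
}
DIM = 3


def embed(text):
    """Embed text by averaging the vectors of its known tokens."""
    tokens = [t for t in text.lower().split() if t in TOKEN_VECTORS]
    if not tokens:
        return [0.0] * DIM  # no known tokens: fall back to the zero vector
    summed = [0.0] * DIM
    for token in tokens:
        for i, value in enumerate(TOKEN_VECTORS[token]):
            summed[i] += value
    return [value / len(tokens) for value in summed]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


if __name__ == "__main__":
    # Related words end up closer together than unrelated ones.
    print(cosine(embed("fast"), embed("quick")))
    print(cosine(embed("fast"), embed("banana")))
```

There is no forward pass through a transformer here, only table lookups and an average, which is where the CPU speedup comes from.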

We built this because we think static embeddings can provide a hardware-friendly alternative to many of the larger embedding models out there, while still being performant enough to power use cases such as RAG or semantic search. We are curious to hear your feedback on this and whether there are any use cases you can think of that we have not explored yet!

Link to the code and results: https://github.com/MinishLab/model2vec

1. jerpint ◴[18 Nov 24 04:46 UTC] No.42169777[source]▶

>>42164071 (OP) #

I wonder at what point it will be ~as much overhead to pass through a subset of the data with a small yet capable and fast LLM vs. using a crude dot product when doing retrieval

replies(1): >>42170390 #

2. Pringled ◴[18 Nov 24 07:00 UTC] No.42170390[source]▶

>>42169777 #

I think a combination works quite well: first getting a small set of candidates from all the data using a lightweight model, and then using a heavy-duty model to rerank the results and get the final candidates.
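
As a minimal sketch of that two-stage setup: the function `retrieve_then_rerank` and the two stand-in scorers below are hypothetical names, not part of any library. In practice `cheap_score` would be something like a dot product between static embeddings and `expensive_score` would be a larger model such as a cross-encoder or an LLM. Here both are simple word-overlap functions so the example runs on its own.

```python
def retrieve_then_rerank(query, documents, cheap_score, expensive_score,
                         n_candidates=100, n_final=10):
    """Shortlist with cheap_score, then reorder the shortlist with expensive_score."""
    # Stage 1: score every document with the cheap model and keep the best few.
    shortlist = sorted(documents, key=lambda d: cheap_score(query, d),
                       reverse=True)[:n_candidates]
    # Stage 2: run the expensive model only on the shortlist.
    reranked = sorted(shortlist, key=lambda d: expensive_score(query, d),
                      reverse=True)
    return reranked[:n_final]


def word_overlap(query, doc):
    """Stand-in for a fast similarity score: number of shared words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def overlap_ratio(query, doc):
    """Stand-in for a heavy reranker: shared words divided by all words (Jaccard)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


if __name__ == "__main__":
    docs = [
        "static embeddings for fast semantic search",
        "fast search",
        "cooking pasta at home",
        "fast cars and fast food for everyone in town",
    ]
    print(retrieve_then_rerank("fast search", docs, word_overlap, overlap_ratio,
                               n_candidates=2, n_final=1))
```

The expensive scorer is called only `n_candidates` times per query regardless of corpus size, so the cost of the heavy model stays fixed while the cheap model handles the full scan.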

3. protoshell248 ◴[18 Nov 24 08:21 UTC] No.42170706[source]▶

>>42164071 (OP) #

10K embeddings generated in under 700 milliseconds!!!

4. bturtel ◴[19 Nov 24 23:58 UTC] No.42189389[source]▶

>>42164071 (OP) #

This seems awesome for enabling RAG queries for on-device LLMs.


Show HN: Model2vec – Lightning-fast Static Embeddings for RAG/Semantic Search