
Embeddings are underrated (2024)

(technicalwriting.dev)
484 points by jxmorris12 | 2 comments
jasonjmcghee ◴[] No.43964913[source]
Another very cool attribute of embeddings and embedding search is that they are resource-cheap enough that you can perform them client-side.

ONNX models can be loaded and executed with Transformers.js: https://github.com/huggingface/transformers.js/
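
A minimal sketch of what that looks like in the browser; the `pipeline` call is real Transformers.js API, while the specific model (Xenova/all-MiniLM-L6-v2) is just one ONNX build available on the Hub:

```ts
import { pipeline } from '@huggingface/transformers';

// Downloads the ONNX weights once (then served from the browser cache)
// and runs entirely client-side.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

const output = await extractor('How do I host an embedding index statically?', {
  pooling: 'mean',
  normalize: true,
});

console.log(output.data.length); // 384-dimensional embedding
```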

You can even build and statically host embedding indices like HNSW.

I put together a little open source demo for this here: https://jasonjmcghee.github.io/portable-hnsw/ (it's a prototype / hacked-together approximation of HNSW, but you could implement the real thing)
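
For a sense of what "the real thing" involves, here's a rough TypeScript sketch of HNSW's query side only (greedy descent through the upper layers, then a wider beam search at the base layer). The index layout and names are hypothetical, this is not the demo's actual code, and it assumes pre-normalized embeddings:

```ts
type Vec = Float32Array;

// Cosine distance, assuming embeddings are pre-normalized.
const dist = (a: Vec, b: Vec) => {
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return 1 - dot;
};

interface Index {
  vectors: Vec[];                   // node id -> embedding
  layers: Map<number, number[]>[];  // layers[l].get(id) -> neighbour ids at layer l
  entryPoint: number;               // entry node at the top layer
}

// Best-first search within one layer, keeping at most `ef` results.
function searchLayer(ix: Index, q: Vec, entry: number, layer: number, ef: number) {
  const visited = new Set([entry]);
  const results = [{ id: entry, d: dist(q, ix.vectors[entry]) }];
  const candidates = [...results];

  while (candidates.length) {
    candidates.sort((a, b) => a.d - b.d);
    const c = candidates.shift()!;
    if (c.d > results[results.length - 1].d && results.length >= ef) break;

    for (const n of ix.layers[layer].get(c.id) ?? []) {
      if (visited.has(n)) continue;
      visited.add(n);
      const d = dist(q, ix.vectors[n]);
      if (results.length < ef || d < results[results.length - 1].d) {
        candidates.push({ id: n, d });
        results.push({ id: n, d });
        results.sort((a, b) => a.d - b.d);
        if (results.length > ef) results.pop();
      }
    }
  }
  return results;
}

// Greedy descent with ef = 1 on the upper layers, then a wider beam at layer 0.
function knn(ix: Index, q: Vec, k = 10, ef = 50) {
  let entry = ix.entryPoint;
  for (let l = ix.layers.length - 1; l > 0; l--) {
    entry = searchLayer(ix, q, entry, l, 1)[0].id;
  }
  return searchLayer(ix, q, entry, 0, ef).slice(0, k);
}
```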

Long story short: represent the indices as queryable Parquet files and use DuckDB to query them.
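
A rough sketch of that idea with duckdb-wasm (the index URL and column names are made up, and it assumes a DuckDB build with list_cosine_similarity and HTTP Parquet reads; a real HNSW-style layout would split the index across files so only the visited nodes get fetched):

```ts
import * as duckdb from '@duckdb/duckdb-wasm';

// Standard duckdb-wasm bootstrap: pick a WASM bundle off jsDelivr and start a worker.
const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());
const worker = new Worker(
  URL.createObjectURL(
    new Blob([`importScripts("${bundle.mainWorker!}");`], { type: 'text/javascript' })
  )
);
const db = new duckdb.AsyncDuckDB(new duckdb.ConsoleLogger(), worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
const conn = await db.connect();

// Brute-force nearest neighbours over a statically hosted Parquet file,
// using the query vector produced by the client-side embedder above.
const queryVector = [0.12, -0.03 /* ...rest of the 384 dims */];
const result = await conn.query(`
  SELECT id, text,
         list_cosine_similarity(embedding, [${queryVector.join(',')}]) AS score
  FROM read_parquet('https://example.github.io/index/embeddings.parquet')
  ORDER BY score DESC
  LIMIT 10
`);
console.table(result.toArray());
```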

Depending on how you host, it's either free or nearly free. I used GitHub Pages, so it's free. Cloudflare R2 would only cost you for the size of what you store (very cheap, and no egress fees).

replies(3): >>43965038 #>>43966350 #>>43966793 #
qq99 ◴[] No.43965038[source]
I was wondering about this. I was hesitant to add embedding-based search to my app because I didn't want the round trip to the embedding API provider to block every search on initial render. Granted, you can cache the embeddings for common searches. OTOH, I also don't want to render something without them, perform the embedding async, and then have to rebuild the results list once the embedding arrives. Seems hard to do that sensibly from a UX perspective.

To render locally, you need access to the model, right? I just wonder how good those embeddings will be compared to those from OpenAI/Google/etc. in terms of semantic search. I do like the free/instant aspect, though.

replies(1): >>43965093 #
jasonjmcghee ◴[] No.43965093[source]
Check out MTEB (https://huggingface.co/spaces/mteb/leaderboard); many of the open source ones are actually _better_.

I've had particularly good experiences with nomic, bge, gte, and all-MiniLM-L6-v2. All are hundreds of MB (except all-MiniLM, which is like 87MB).

replies(1): >>43965420 #
simonw ◴[] No.43965420{3}[source]
I love all-MiniLM-L6-v2: 87MB is tiny enough that you could just load it into RAM in a web application process on a small VM. From my experiments with it, the results are Good Enough for a lot of purposes. https://simonwillison.net/2023/Sep/4/llm-embeddings/#embeddi...
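
For illustration (not Simon's actual setup), a tiny Node sketch of that "load it once into the web process" approach, assuming Transformers.js and Express:

```ts
import express from 'express';
import { pipeline } from '@huggingface/transformers';

// Loaded once at startup; the ~87MB model then stays resident in RAM
// for the life of the process.
const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

const app = express();
app.use(express.json());

app.post('/embed', async (req, res) => {
  const output = await embed(req.body.text, { pooling: 'mean', normalize: true });
  res.json({ embedding: Array.from(output.data) }); // 384-dim vector
});

app.listen(3000);
```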
replies(2): >>43968841 #>>43970605 #
kaycebasques ◴[] No.43968841{4}[source]
87MB is still quite big, though. Think of all the comments here on HN where people were appalled at a certain site loading 10-50 MB of images. Hopefully browser vendors will figure out a secure way to download a model once and re-use that single model on any website that requests it, rather than potentially downloading a separate instance of all-MiniLM-L6-v2 for each site. I know that Chrome has an AI initiative, but I didn't see any docs about this particular problem: https://developer.chrome.com/docs/ai
replies(1): >>43969035 #
jasonjmcghee ◴[] No.43969035[source]
It's crazy, because Chrome ships an embedding model; it's just not accessible to users / developers (AFAIK).

https://dejan.ai/blog/chromes-new-embedding-model/