
Embeddings are underrated (2024)

(technicalwriting.dev)
484 points by jxmorris12 | 2 comments
simianwords ◴[] No.43966334[source]
I don't think any of the current consumer LLM tools use embeddings for web search. Instead, they do it at the text level.

The evidence for this is the CoT summary in ChatGPT - I have seen the LLM use quoted phrases to grep the web.

Embeddings seem good in theory, but in practice it's probably best to ask an LLM to do a deep search instead, by giving it instructions like "use synonyms and common typos and grep".
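
For contrast, here's a minimal sketch of what embedding-based retrieval looks like (assuming the sentence-transformers package; the model and documents are just illustrative) - it matches on meaning rather than on literal keywords, synonyms, or typos:

    from sentence_transformers import SentenceTransformer, util

    # Illustrative model choice; any sentence-embedding model works similarly.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    docs = [
        "How to install and configure the application",
        "Troubleshooting common typos in search queries",
        "Release notes for version 2.0",
    ]
    query = "setting up the app"

    # Embed the documents and the query into the same vector space.
    doc_embs = model.encode(docs, convert_to_tensor=True)
    query_emb = model.encode(query, convert_to_tensor=True)

    # Cosine similarity can rank the install/configure doc highest even
    # though it shares almost no keywords with the query - no synonym
    # list or typo handling needed.
    scores = util.cos_sim(query_emb, doc_embs)[0]
    for score, doc in sorted(zip(scores.tolist(), docs), reverse=True):
        print(f"{score:.3f}  {doc}")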

Does anyone know of a live example of a consumer product using embeddings?

replies(2): >>43966401 #>>43966424 #
1. dcre ◴[] No.43966424[source]
I believe they use the LLM to generate a set of queries to search for and then run those through existing search engines, which are totally opaque and use whatever array of techniques SOTA search engines use. They are almost certainly not "grepping" the internet.
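
Roughly, the pattern would look something like this (a sketch only - it assumes the openai package, the model name is illustrative, and run_search is a hypothetical stand-in for whatever opaque search backend is actually used):

    from openai import OpenAI

    client = OpenAI()  # assumes an API key in the environment

    def propose_queries(question: str, n: int = 3) -> list[str]:
        # Ask the LLM for candidate search queries, one per line.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model name
            messages=[{
                "role": "user",
                "content": f"List {n} web search queries, one per line, for: {question}",
            }],
        )
        return [q.strip() for q in resp.choices[0].message.content.splitlines() if q.strip()]

    def run_search(query: str) -> list[str]:
        # Hypothetical placeholder for the actual (opaque) search engine call.
        raise NotImplementedError

    for query in propose_queries("top 10 cars manufactured in South America"):
        print(query)  # the quoted terms that show up in the CoT would appear here
        # results = run_search(query)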
replies(1): >>43966474 #
2. simianwords ◴[] No.43966474[source]
Yes, that's what I meant - thanks for clarifying. The grepping part is definitely done, at least in spirit, since the CoT includes quotes. If I were searching for the top 10 cars manufactured in South America, for example, the CoT might show:

"Brazil" car manufacture

This forces "Brazil" to be included in the keywords - at least that's how Google used to work, and may still.
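
A toy illustration of that quoted-term behavior (the snippets are made up, not real search results):

    # The quoted term acts as a hard filter; the unquoted terms only
    # affect ranking. Snippets are hypothetical, for illustration only.
    snippets = [
        "Brazil is the largest car manufacturing hub in South America.",
        "Argentina's auto industry exports pickup trucks across the region.",
        "A history of railways in South America.",
    ]
    required = "brazil"                 # the quoted term
    optional = ["car", "manufacture"]   # the unquoted terms

    for s in snippets:
        text = s.lower()
        if required in text:            # must contain the quoted term
            score = sum(term in text for term in optional)
            print(score, s)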