
Embeddings are underrated (2024)

(technicalwriting.dev)
484 points by jxmorris12 | 2 comments
simianwords ◴[] No.43966334[source]
I don't think any of the current consumer LLM tools use embeddings for web search. Instead, they do it at the text level.

The evidence for this is the CoT summary in ChatGPT - I have seen the LLM use quoted phrases to grep the web.

Embeddings seem good in theory, but in practice it's probably best to ask an LLM to do a deep search instead, by giving it instructions like "use synonyms and common typos and grep".
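
For contrast, here's a minimal sketch of what embedding-based retrieval looks like (assuming the sentence-transformers package; the model and documents are just illustrative) - it matches on meaning rather than on literal keywords, synonyms, or typos:

    from sentence_transformers import SentenceTransformer, util

    # Illustrative model choice; any sentence-embedding model works similarly.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    docs = [
        "How to install and configure the application",
        "Troubleshooting common typos in search queries",
        "Release notes for version 2.0",
    ]
    query = "setting up the app"

    # Embed the documents and the query into the same vector space.
    doc_embs = model.encode(docs, convert_to_tensor=True)
    query_emb = model.encode(query, convert_to_tensor=True)

    # Cosine similarity can rank the install/configure doc highest even
    # though it shares almost no keywords with the query - no synonym
    # list or typo handling needed.
    scores = util.cos_sim(query_emb, doc_embs)[0]
    for score, doc in sorted(zip(scores.tolist(), docs), reverse=True):
        print(f"{score:.3f}  {doc}")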

Does anyone know of a live example of a consumer product using embeddings?

replies(2): >>43966401 #>>43966424 #
1. dcre ◴[] No.43966424[source]
I believe they use the LLM to generate a set of queries to search for and then run those through existing search engines, which are totally opaque and use whatever array of techniques SOTA search engines use. They are almost certainly not "grepping" the internet.
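
Roughly, the pattern would look something like this (a sketch only - it assumes the openai package, the model name is illustrative, and run_search is a hypothetical stand-in for whatever opaque search backend is actually used):

    from openai import OpenAI

    client = OpenAI()  # assumes an API key in the environment

    def propose_queries(question: str, n: int = 3) -> list[str]:
        # Ask the LLM for candidate search queries, one per line.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model name
            messages=[{
                "role": "user",
                "content": f"List {n} web search queries, one per line, for: {question}",
            }],
        )
        return [q.strip() for q in resp.choices[0].message.content.splitlines() if q.strip()]

    def run_search(query: str) -> list[str]:
        # Hypothetical placeholder for the actual (opaque) search engine call.
        raise NotImplementedError

    for query in propose_queries("top 10 cars manufactured in South America"):
        print(query)  # the quoted terms that show up in the CoT would appear here
        # results = run_search(query)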
replies(1): >>43966474 #
2. simianwords ◴[] No.43966474[source]
Yes, that's what I meant - thanks for clarifying. The grepping part is definitely done, at least in spirit, since the CoT includes quotes. If I were searching for the top 10 cars manufactured in South America, for example, the CoT might show:

"Brazil" car manufacture

This forces "Brazil" to be included in the keywords - at least that's how Google used to work, and may still.
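
A toy illustration of that quoted-term behavior (the snippets are made up, not real search results):

    # The quoted term acts as a hard filter; the unquoted terms only
    # affect ranking. Snippets are hypothetical, for illustration only.
    snippets = [
        "Brazil is the largest car manufacturing hub in South America.",
        "Argentina's auto industry exports pickup trucks across the region.",
        "A history of railways in South America.",
    ]
    required = "brazil"                 # the quoted term
    optional = ["car", "manufacture"]   # the unquoted terms

    for s in snippets:
        text = s.lower()
        if required in text:            # must contain the quoted term
            score = sum(term in text for term in optional)
            print(score, s)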