> I don’t know. After the model has been created (trained), I’m pretty sure that generating embeddings is much less computationally intensive than generating text.
An embedding is generated in a single forward pass through the model, so functionally it's the equivalent of generating a single token from a text generation model.
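A rough sketch of the contrast using Hugging Face transformers (the model names and the 50-token count are just illustrative, not from this thread):

```python
import torch
from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM

# Embedding: one forward pass over the whole input, then pool the hidden states.
tok = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
enc = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
inputs = tok("An example sentence.", return_tensors="pt")
with torch.no_grad():
    hidden = enc(**inputs).last_hidden_state  # single pass through the model
embedding = hidden.mean(dim=1)                # mean-pool to one fixed-size vector

# Generation: one forward pass *per new token*, run sequentially.
tok2 = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
prompt = tok2("An example sentence", return_tensors="pt")
with torch.no_grad():
    out = lm.generate(**prompt, max_new_tokens=50)  # roughly 50 sequential passes
```

So producing one embedding costs about as much as producing the first token of a completion; the expense of generation comes from repeating that pass for every token of output.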
replies(2):