Embeddings are underrated (2024)

> Because we always get back the same amount of numbers no matter how big or small the input text, we now have a way to mathematically compare any two pieces of arbitrary text to each other.

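I think this needs some clarification. Hash functions also return the same size output no matter how big or small the input text is. However, mathematically comparing two hashes means something very different from mathematically comparing two embeddings: a typical hash (a cryptographic one, say) is designed so that similar inputs produce unrelated outputs, while an embedding is trained so that similar inputs land close together. The fixed output size alone isn't what makes the comparison meaningful.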
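To make the distinction concrete, here's a rough sketch using only the Python standard library. Note the assumptions: `toy_vector` is a hypothetical word-count vector standing in for a real embedding (it is not a learned model), and the three sentences are made up for illustration. The hash side uses SHA-256 from `hashlib`.

```python
import hashlib
import math
from collections import Counter


def sha256_bits(text: str) -> str:
    """Return the SHA-256 digest of text as a 256-character bit string."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return "".join(f"{byte:08b}" for byte in digest)


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def toy_vector(text: str, vocab: list[str]) -> list[float]:
    """Hypothetical stand-in for an embedding: word counts over a fixed vocab."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


a = "the cat sat on the mat"
b = "the cat sat on the rug"          # nearly identical to a
c = "quarterly revenue grew sharply"  # unrelated to a

# Both representations have a fixed size regardless of input length.
vocab = sorted(set(" ".join([a, b, c]).lower().split()))
va, vb, vc = (toy_vector(t, vocab) for t in (a, b, c))
ha, hb, hc = (sha256_bits(t) for t in (a, b, c))

# Hash distances do not track how similar the inputs are.
print(f"hash  a vs b: {hamming_fraction(ha, hb):.3f} of bits differ")
print(f"hash  a vs c: {hamming_fraction(ha, hc):.3f} of bits differ")

# Vector similarities do track it (for this toy, via shared words).
print(f"vector a vs b: cosine = {cosine(va, vb):.3f}")
print(f"vector a vs c: cosine = {cosine(va, vc):.3f}")
```

With SHA-256 you'd expect roughly half the bits to differ in both comparisons, so the hash distance tells you nothing about how close `a` is to `b`. The toy vectors do separate the two cases, but only because they were built to reflect shared words; a real embedding gets that property from training.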

I'd recommend emphasizing that embeddings are training-dependent: the quality of a comparison depends on the quality and type of training used to produce the embedding. There isn't a single "universal embedding" that allows for meaningful comparison of arbitrary text.