Question for the experts: a few years back (even before covid times?) I was tasked with building a news aggregator. Universal Sentence Encoder was new, and we didn't even have BERT back then. It felt magical (at least to a regular software dev) to see how strongly the cosine similarity score correlated with how semantically similar two given pieces of text were. That plus some clustering algorithm got the job done.
A few months ago I happened to play with OpenAI's embedding models (can't remember which one) and I was shocked to see that the cosine similarities of most text pairs were bunched very close together, even when the texts had nothing in common. It's like the wide 0-1 range that USE (and later BERT) gave me was compressed into a band maybe 0.2 wide. Why is that? Does it mean those embeddings are not great for semantic similarity?
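To make concrete what I mean, here's roughly the kind of check I was doing (a minimal sketch, assuming the openai Python client v1+ and numpy, with OPENAI_API_KEY set in the environment; the model name is just an example, not necessarily the one I used):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    # Fetch an embedding vector for a single piece of text.
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Standard cosine similarity: dot product over the product of norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two texts with nothing in common semantically.
a = embed("The central bank raised interest rates again this quarter.")
b = embed("My cat knocked a glass off the kitchen counter.")

# What surprised me: this tends to land well above zero, not near it.
print(cosine(a, b))
```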
replies(2):