
Embeddings are underrated (2024)

(technicalwriting.dev)
484 points jxmorris12 | 2 comments
jacobr1 ◴[] No.43964219[source]
I may have missed it ... but were any direct applications to tech writers discussed in this article? Embeddings are fascinating and very important for things like LLMs or semantic search, but the author seems to imply more direct utility.
replies(4): >>43964349 #>>43964388 #>>43964584 #>>43964664 #
1. sansseriff ◴[] No.43964664[source]
It would be great to semantically search through literature with embeddings. At least one person I know of is trying to generate a vector database of all arXiv papers.

The big problem I see is attribution and citations. An embedding is just a vector. It doesn't contain a citation back to the source material, a modification date, or a certificate of authenticity. So when used in RAG, embeddings only serve to link back to a particular page of source material.
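
To make that concrete, here's a minimal sketch. Everything in it is an assumption for illustration: the toy 3-dimensional vectors, the hypothetical `Chunk` record and `nearest` helper, and the made-up source names. No real embedding model or vector database API is involved. The point is only that the provenance has to be stored next to the vector, because the vector itself carries none:

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical record: a vector plus the provenance it cannot carry itself."""
    vector: list[float]
    source: str    # made-up paper identifier
    page: int
    modified: str  # ISO date string


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest(query, chunks):
    """Return the stored chunk whose vector is most similar to the query."""
    return max(chunks, key=lambda c: cosine(query, c.vector))


chunks = [
    Chunk([1.0, 0.0, 0.0], "paper-A", 3, "2024-01-15"),
    Chunk([0.0, 1.0, 0.0], "paper-B", 7, "2023-11-02"),
]

# The match only gets you as far as the stored metadata: a source and a page.
hit = nearest([0.9, 0.1, 0.0], chunks)
print(f"{hit.source}, p. {hit.page}")
```

The lookup resolves to a page-level record and nothing finer, which is the granularity limit described above.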

Using embeddings as links doesn't dramatically change the way citation and attribution are handled in technical writing. You still end up citing a whole paper or a page of a paper.

I think GraphRAG [1] is a more useful thing to build on for technical literature. There are ways to use graphs to cite a particular concept on a particular page of an academic paper, and for the 'citations' to act as bidirectional links between new and old scientific discourse. But I digress.

[1] https://microsoft.github.io/graphrag/
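
A rough sketch of what I mean by bidirectional, concept-level links. This is not the GraphRAG API; the `add_citation` helper, the `(paper, page, concept)` node shape, and all the values are hypothetical:

```python
from collections import defaultdict

# Hypothetical concept-level citation graph, stored in both directions.
cites = defaultdict(set)     # citing node -> nodes it cites
cited_by = defaultdict(set)  # cited node -> nodes that cite it


def add_citation(citing, cited):
    """Record one citation so it can be followed from either end."""
    cites[citing].add(cited)
    cited_by[cited].add(citing)


# A node is (paper, page, concept); every value here is made up.
old = ("paper-A", 3, "cosine similarity")
new = ("paper-C", 12, "retrieval scoring")
add_citation(new, old)

print(cites[new])     # what the new work builds on
print(cited_by[old])  # which later work picked up the old concept
```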

replies(1): >>43969021 #
2. kaycebasques ◴[] No.43969021[source]
IMO, for technical writing, citing a page or section within a page is usually good enough. I rarely need to cite a particular concept. But I've never even thought of the possibility of more granular concept-level citations and will definitely be pondering it more!