
Embeddings are underrated (2024)

(technicalwriting.dev)
484 points by jxmorris12 | 13 comments
1. kaycebasques ◴[] No.43964290[source]
Hello, I wrote this. Thank you for reading!

The post was previously discussed 6 months ago: https://news.ycombinator.com/item?id=42013762

To be clear, when I said "embeddings are underrated" I was only arguing that my fellow technical writers (TWs) were not paying enough attention to a very useful new tool in the TW toolbox. I know that the statement sounds silly to ML practitioners, who very much don't "underrate" embeddings.

I know that the post is light on details regarding how exactly we apply embeddings in TW. I have some projects and other blog posts in the pipeline. Short story long, embeddings are important because they can help us make progress on the 3 intractable challenges of TW: https://technicalwriting.dev/strategy/challenges.html

replies(6): >>43964625 #>>43965226 #>>43965364 #>>43965743 #>>43966818 #>>43967786 #
2. rybosome ◴[] No.43964625[source]
Thanks for the write-up!

I’m curious how you found the quality of the results. This gets into evals, which ML folks love, but even just going by “vibes”, do the results eyeball as reasonable to you?

replies(1): >>43966265 #
3. sgbeal ◴[] No.43965226[source]
> I know that the post is light on details regarding how exactly we apply embeddings in TW.

More significantly, after having read the first 6 or 8 paragraphs, I still have no clue what an "embedding" is. From the 3rd paragraph:

> Here’s an overview of how you use embeddings and how they work.

But no mention of what they are (unless perhaps it's buried far deeper in the article).

replies(2): >>43966239 #>>43967398 #
4. theletterf ◴[] No.43965364[source]
Perhaps you should make the post more appealing to tech writers and less to ML experts. That would help increase the reach for the intended target audience. For example, you can expand on "the ability to discover connections between texts at previously impossible scales". There's an applications section, but it's easy to overlook. Frontload value for tech writers with examples.
replies(1): >>43966329 #
5. luckydata ◴[] No.43965743[source]
A small nit: while I understand this is an introductory article, I think it's a bit TOO introductory. You should at least give a preview of a "killer app" of embeddings to make me want to read the next installments. I read the entire article and I'm not sure I learned anything useful or insightful that I didn't know before. I feel you held back too much, but thanks for sharing; that's appreciated.
replies(1): >>43966309 #
6. kaycebasques ◴[] No.43966239[source]
I was worried that introducing the formal concept too quickly would feel a bit overwhelming for my fellow technical writers who are learning about embeddings for the first time, but I know that it's also annoying when a post makes you wait too long to get an answer to a question. So I'll find a way to provide a quick answer upfront. Thanks for the feedback.
7. kaycebasques ◴[] No.43966265[source]
By results I assume that you're asking about the related pages experiment? The results were definitely promising. A lot of the calculated related pages were totally reasonable. E.g. if I'm reading https://www.sphinx-doc.org/en/master/development/html_themes... then it's very reasonable to assume that I may also be interested in https://www.sphinx-doc.org/en/master/usage/theming.html
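For anyone curious how this kind of related-pages calculation works in principle, here's a minimal sketch. It assumes the sentence-transformers library and the off-the-shelf all-MiniLM-L6-v2 model, with placeholder page text; it illustrates the general technique rather than the exact setup from the experiment:

    # Embed every page, then rank the other pages by cosine similarity.
    # Page texts below are placeholders, not the real Sphinx docs.
    from sentence_transformers import SentenceTransformer, util

    pages = {
        "development/html_themes": "How to build your own Sphinx HTML theme.",
        "usage/theming": "How to install and configure HTML themes.",
        "usage/extensions/index": "Enabling and configuring Sphinx extensions.",
    }

    model = SentenceTransformer("all-MiniLM-L6-v2")
    names = list(pages)
    vectors = model.encode([pages[name] for name in names])
    scores = util.cos_sim(vectors, vectors)

    for i, name in enumerate(names):
        # The most similar other page is the "related page" suggestion.
        best = max(
            (j for j in range(len(names)) if j != i),
            key=lambda j: float(scores[i][j]),
        )
        print(f"{name} -> related: {names[best]}")

The same ranking scales to a whole docs site: embed every page once, then precompute each page's top few neighbors at build time.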
8. kaycebasques ◴[] No.43966309[source]
Yes, as a standalone post I can totally see how this is not persuasive because it's too vague and doesn't spell out specific applications. My only excuse is that I never intended this to be a standalone post; it was intended to be a conceptual primer supplemented by follow-up posts and projects exploring different applications of embeddings in technical writing. Hopefully the renewed attention on this post will motivate me to follow through on the follow-up content ;)
9. kaycebasques ◴[] No.43966329[source]
Yes, definitely need to follow through on the follow-up posts and projects that show exactly how we apply embeddings to TW. Examples (in all their forms) are truly magical in how effective they are as a teaching aid.
10. kaycebasques ◴[] No.43966818[source]
Also, re: direct applications of embeddings in technical writing, see https://www.tdcommons.org/dpubs_series/8057/
11. kadushka ◴[] No.43967398[source]
A word embedding is a representation of a word as many numbers, where each number represents some property of the word. Usually we do not know what those properties are, because the numbers are learned by a model while it processes a large amount of text.
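To make that concrete, here's a toy sketch; the vectors below are invented for illustration (real models learn hundreds of dimensions, and the individual numbers aren't directly interpretable):

    # Invented 4-number "embeddings"; a real model learns these values.
    import numpy as np

    embedding = {
        "king":  np.array([0.91, 0.73, 0.12, 0.05]),
        "queen": np.array([0.89, 0.71, 0.15, 0.93]),
        "apple": np.array([0.02, 0.10, 0.88, 0.41]),
    }

    def cosine(a, b):
        # Similarity of two vectors: close to 1.0 = similar meaning.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine(embedding["king"], embedding["queen"]))  # high: related words
    print(cosine(embedding["king"], embedding["apple"]))  # low: unrelated words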
12. _bramses ◴[] No.43967786[source]
> Discoveryness. Even if the needed content exists, it’s hard to guarantee that users will find it.

I'm curious what you'll think of the UX layer I applied to embeddings for public perusal. I call it "semantic scrolling" since it's not exactly searching, but moving through the cluster using <summary>/<details> as a tree.

[1] is a single starting point (press the animated arrow to "wiki-hole") and [2] is the entire collection (books, movies, music, animations, etc.)

[1] - https://www.sharecommonbase.com/synthesize/1009?id=1009 [2] - https://www.sharecommonbase.com/
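For readers trying to picture the mechanism, here's one rough sketch of how a <summary>/<details> tree over an embedding cluster could be generated; it's a guess at the general idea, assuming precomputed vectors and simple nearest-neighbor expansion, not the site's actual implementation:

    # Given items with precomputed embedding vectors, expand each item into
    # its nearest neighbors as a nested <details>/<summary> tree.
    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def nearest(vectors, idx, k=3):
        # Indices of the k vectors most similar to vectors[idx].
        ranked = sorted(
            (j for j in range(len(vectors)) if j != idx),
            key=lambda j: cosine(vectors[idx], vectors[j]),
            reverse=True,
        )
        return ranked[:k]

    def render(titles, vectors, idx, depth=2, seen=None):
        # Recursively render a <details>/<summary> tree rooted at titles[idx],
        # skipping items already shown on the path from the root.
        seen = {idx} if seen is None else seen | {idx}
        if depth == 0:
            return f"<p>{titles[idx]}</p>"
        children = [j for j in nearest(vectors, idx) if j not in seen]
        inner = "".join(render(titles, vectors, j, depth - 1, seen) for j in children)
        return f"<details><summary>{titles[idx]}</summary>{inner}</details>"

    # Usage sketch: html = render(titles, vectors, idx=0)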

replies(1): >>43968799 #
13. kaycebasques ◴[] No.43968799[source]
Cool. I kinda grok the idea of semantic scrolling but I'm having trouble seeing it in action in the site. I think it would be useful in cases where I want to become an expert in a given topic and therefore I want to peruse lots of related ideas and create the possibility of serendipitous new neural connections. As for technical documentation, usually people want to find certain information as quickly as possible so that they can get on with their work, so I don't think semantic scrolling would be a good fit on most docs sites. I.e. they won't have the patience to semantically scroll in order to find the info they need.