
Embeddings are underrated (2024)

(technicalwriting.dev)
484 points by jxmorris12 | 13 comments
1. kaycebasques ◴[] No.43964290[source]
Hello, I wrote this. Thank you for reading!

The post was previously discussed 6 months ago: https://news.ycombinator.com/item?id=42013762

To be clear, when I said "embeddings are underrated" I was only arguing that my fellow technical writers (TWs) were not paying enough attention to a very useful new tool in the TW toolbox. I know that the statement sounds silly to ML practitioners, who very much don't "underrate" embeddings.

I know that the post is light on details regarding how exactly we apply embeddings in TW. I have some projects and other blog posts in the pipeline. Short story long, embeddings are important because they can help us make progress on the 3 intractable challenges of TW: https://technicalwriting.dev/strategy/challenges.html

replies(6): >>43964625 #>>43965226 #>>43965364 #>>43965743 #>>43966818 #>>43967786 #
2. rybosome ◴[] No.43964625[source]
Thanks for the write-up!

I’m curious how you found the quality of the results. This gets into evals, which ML folks love, but even just going by “vibes”, do the results eyeball as reasonable to you?

replies(1): >>43966265 #
3. sgbeal ◴[] No.43965226[source]
> I know that the post is light on details regarding how exactly we apply embeddings in TW.

More significantly, after having read the first 6 or 8 paragraphs, I still have no clue what an "embedding" is. From the 3rd paragraph:

> Here’s an overview of how you use embeddings and how they work.

But no mention of what they are (unless perhaps it's buried far deeper in the article).

replies(2): >>43966239 #>>43967398 #
4. theletterf ◴[] No.43965364[source]
Perhaps you should make the post more appealing to tech writers and less to ML experts. That would help increase the reach for the intended target audience. For example, you can expand on "the ability to discover connections between texts at previously impossible scales". There's an applications section, but it's easy to overlook. Frontload value for tech writers with examples.
replies(1): >>43966329 #
5. luckydata ◴[] No.43965743[source]
A small nit: while I understand this is an introductory article, I think it's a bit TOO introductory. You should at least give a preview of a "killer app" of embeddings to make me want to read the next installments. I read the entire article and I'm not sure I learned anything useful or insightful that I didn't know before. I feel you held back too much, but thanks for sharing; that's appreciated.
replies(1): >>43966309 #
6. kaycebasques ◴[] No.43966239[source]
I was worried that introducing the formal concept too quickly would feel a bit overwhelming for my fellow technical writers who are learning about embeddings for the first time, but I know that it's also annoying when a post makes you wait too long to get an answer to a question. So I'll find a way to provide a quick answer upfront. Thanks for the feedback.
7. kaycebasques ◴[] No.43966265[source]
By results I assume that you're asking about the related pages experiment? The results were definitely promising. A lot of the calculated related pages were totally reasonable. E.g. if I'm reading https://www.sphinx-doc.org/en/master/development/html_themes... then it's very reasonable to assume that I may also be interested in https://www.sphinx-doc.org/en/master/usage/theming.html
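For anyone curious how this kind of related-pages calculation works in principle, here's a minimal sketch. It assumes the sentence-transformers library and the off-the-shelf all-MiniLM-L6-v2 model, with placeholder page text; it illustrates the general technique rather than the exact setup from the experiment:

    # Embed every page, then rank the other pages by cosine similarity.
    # Page texts below are placeholders, not the real Sphinx docs.
    from sentence_transformers import SentenceTransformer, util

    pages = {
        "development/html_themes": "How to build your own Sphinx HTML theme.",
        "usage/theming": "How to install and configure HTML themes.",
        "usage/extensions/index": "Enabling and configuring Sphinx extensions.",
    }

    model = SentenceTransformer("all-MiniLM-L6-v2")
    names = list(pages)
    vectors = model.encode([pages[name] for name in names])
    scores = util.cos_sim(vectors, vectors)

    for i, name in enumerate(names):
        # The most similar other page is the "related page" suggestion.
        best = max(
            (j for j in range(len(names)) if j != i),
            key=lambda j: float(scores[i][j]),
        )
        print(f"{name} -> related: {names[best]}")

The same ranking scales to a whole docs site: embed every page once, then precompute each page's top few neighbors at build time.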
8. kaycebasques ◴[] No.43966309[source]
Yes, as a standalone post I can totally see how this is not persuasive because it's too vague and doesn't spell out specific applications. My only excuse is that I never intended this to be a standalone post; it was intended to be a conceptual primer supplemented by follow-up posts and projects exploring different applications of embeddings in technical writing. Hopefully the renewed attention on this post will motivate me to follow through on the follow-up content ;)
9. kaycebasques ◴[] No.43966329[source]
Yes, definitely need to follow through on the follow-up posts and projects that show exactly how we apply embeddings to TW. Examples (in all their forms) are truly magical in how effective they are as a teaching aid.
10. kaycebasques ◴[] No.43966818[source]
Also, re: direct applications of embeddings in technical writing, see https://www.tdcommons.org/dpubs_series/8057/
11. kadushka ◴[] No.43967398[source]
A word embedding is a representation of a word as many numbers, where each number represents some property of the word. Usually we do not know what those properties are, because the numbers are learned by a model while it processes a large amount of text.
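To make that concrete, here's a toy sketch; the vectors below are invented for illustration (real models learn hundreds of dimensions, and the individual numbers aren't directly interpretable):

    # Invented 4-number "embeddings"; a real model learns these values.
    import numpy as np

    embedding = {
        "king":  np.array([0.91, 0.73, 0.12, 0.05]),
        "queen": np.array([0.89, 0.71, 0.15, 0.93]),
        "apple": np.array([0.02, 0.10, 0.88, 0.41]),
    }

    def cosine(a, b):
        # Similarity of two vectors: close to 1.0 = similar meaning.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine(embedding["king"], embedding["queen"]))  # high: related words
    print(cosine(embedding["king"], embedding["apple"]))  # low: unrelated words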
12. _bramses ◴[] No.43967786[source]
> Discoveryness. Even if the needed content exists, it’s hard to guarantee that users will find it.

I'm curious what you'll think of the UX layer I applied to embeddings for public perusal. I call it "semantic scrolling" since it's not exactly searching, but moving through the cluster using <summary>/<details> as a tree.

[1] is a single starting point (press the animated arrow to "wiki-hole") and [2] is the entire collection (books, movies, music, animations, etc.)

[1] - https://www.sharecommonbase.com/synthesize/1009?id=1009 [2] - https://www.sharecommonbase.com/
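For readers trying to picture the mechanism, here's one rough sketch of how a <summary>/<details> tree over an embedding cluster could be generated; it's a guess at the general idea, assuming precomputed vectors and simple nearest-neighbor expansion, not the site's actual implementation:

    # Given items with precomputed embedding vectors, expand each item into
    # its nearest neighbors as a nested <details>/<summary> tree.
    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def nearest(vectors, idx, k=3):
        # Indices of the k vectors most similar to vectors[idx].
        ranked = sorted(
            (j for j in range(len(vectors)) if j != idx),
            key=lambda j: cosine(vectors[idx], vectors[j]),
            reverse=True,
        )
        return ranked[:k]

    def render(titles, vectors, idx, depth=2, seen=None):
        # Recursively render a <details>/<summary> tree rooted at titles[idx],
        # skipping items already shown on the path from the root.
        seen = {idx} if seen is None else seen | {idx}
        if depth == 0:
            return f"<p>{titles[idx]}</p>"
        children = [j for j in nearest(vectors, idx) if j not in seen]
        inner = "".join(render(titles, vectors, j, depth - 1, seen) for j in children)
        return f"<details><summary>{titles[idx]}</summary>{inner}</details>"

    # Usage sketch: html = render(titles, vectors, idx=0)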

replies(1): >>43968799 #
13. kaycebasques ◴[] No.43968799[source]
Cool. I kinda grok the idea of semantic scrolling but I'm having trouble seeing it in action in the site. I think it would be useful in cases where I want to become an expert in a given topic and therefore I want to peruse lots of related ideas and create the possibility of serendipitous new neural connections. As for technical documentation, usually people want to find certain information as quickly as possible so that they can get on with their work, so I don't think semantic scrolling would be a good fit on most docs sites. I.e. they won't have the patience to semantically scroll in order to find the info they need.