Large-Scale Dimension Reduction with Both Global and Local Structure (2021) [pdf]

1. szvsw ◴[20 Nov 24 07:18 UTC] No.42191481[source]▶

Very cool paper, and the code in the repo looks good and easy to use as well. After a quick skim of the paper, I feel like it suffers from a pretty common flaw (one which my PI often points out to me in my work, so I guess I’m just extra attuned to it right now). The authors make a pretty convincing argument (to me at least, but I’m more of an applied ML than theoretical ML person, so grain of salt) from a mathematical/methodological perspective that PaCMAP is better than common popular DR algorithms and has various desirable properties in terms of simultaneous global/local scale preservation, etc. But they more or less accept it as a given that we need better DR algorithms and that being better than the existing methods makes the work interesting, while failing to really convincingly illustrate actual use cases where PaCMAP unlocks some sort of insight or delivers some sort of meaningful result that t-SNE and friends could not.

I think doing so would be especially important in a paper on DR techniques which are already so fraught in how they are deployed (often with little thought) in many applied contexts, and when so much of their putative utility comes from their interaction with human visual perception. I would have loved to see some discussion of actual engineering use cases where PaCMAP proves more useful than t-SNE - I’m sure there are many! Really just nitpicking from me though, will probably try it out on my own cases in the next few days.

replies(2): >>42191959 #>>42194344 #

2. chaosprint ◴[20 Nov 24 08:50 UTC] No.42191959[source]▶

>>42191481 (TP) #

Have you tried UMAP? https://youtu.be/sD-uDZ8zXkc?si=peosWakFIAdpyeGb

3. igorkraw ◴[20 Nov 24 14:46 UTC] No.42194344[source]▶

>>42191481 (TP) #

If you accept t-SNE or UMAP as being useful because they unveil some structure, and PaCMAP has mathematical guarantees to preserve the same or more structure, is that not enough as a motivation?

Fwiw, I use PaCMAP when building pipelines to get a feel for whether a model is capturing signals as expected. It works better for that than the other two, because its structure preservation makes the conceptual mapping easier.

replies(2): >>42194419 #>>42194557 #

4. r-zip ◴[20 Nov 24 14:56 UTC] No.42194419[source]▶

>>42194344 #

What mathematical guarantees? I don't see any mathematical results in the paper.

5. szvsw ◴[20 Nov 24 15:11 UTC] No.42194557[source]▶

>>42194344 #

I agree with you that it is correct from a logical perspective; I’m talking more about a narrative perspective. Any paper benefits from having at least one or two non-toy/non-benchmark problems IMO.