A graph explorer of the Epstein emails

1. liotier ◴[17 Nov 25 20:07 UTC] No.45957667[source]▶

>>45935687 (OP) #

"Brad Edwards" and "Bradley Edwards" might be the same individual.

replies(5): >>45958446 #>>45958478 #>>45958562 #>>45958965 #>>45959536 #

2. GuinansEyebrows ◴[17 Nov 25 21:19 UTC] No.45958446[source]▶

>>45957667 (TP) #

Likewise for instances of "Larry" and "Lawrence" Summers... probably a lot of those.

3. tovej ◴[17 Nov 25 21:21 UTC] No.45958478[source]▶

>>45957667 (TP) #

Yes, the dataset also has three entries for Virginia Giuffre: "Virginia L. Giuffre", "Virginia Roberts Giuffre", and "Jane Doe Number 3 (Virginia Roberts)".

4. DrewADesign ◴[17 Nov 25 21:30 UTC] No.45958562[source]▶

>>45957667 (TP) #

I’m sure some developer/archivist is working on a name authority as we speak.

5. cyrusradfar ◴[17 Nov 25 22:12 UTC] No.45958965[source]▶

>>45957667 (TP) #

Great use case for using AI to suggest merges and clean up.

replies(1): >>45959160 #

6. specproc ◴[17 Nov 25 22:31 UTC] No.45959160[source]▶

>>45958965 #

LLMs are awful for this. I've got a project that's doing structured extraction and half the work is deduplication.

I didn't go down the LLM route for the cleanup, since you run into scale and context issues with larger datasets.

I got into semantic similarity networks for this use case. You can do efficient pairwise matching with Annoy (approximate nearest neighbours), set a cutoff threshold, and the isolated subgraphs that remain (the connected components) are your merge candidates.

I wrapped up my code in a little library if you're into this sort of thing.

github.com/specialprocedures/semnet
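
A rough sketch of the threshold-and-subgraph idea described above, for anyone who wants to see the shape of it. This is not semnet's API and not necessarily how the library works. To stay dependency-free it swaps in a character-level ratio from difflib where embeddings would go, and a brute-force pairwise loop where Annoy would go. The names `similarity` and `merge_candidates` and the 0.7 cutoff are made up for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in score in [0, 1]: case-insensitive character-level ratio.

    Assumption: a real pipeline would compare embedding vectors instead.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def merge_candidates(names, threshold=0.7):
    """Return groups of names connected by edges scoring >= threshold."""
    names = list(dict.fromkeys(names))  # drop exact duplicates, keep order
    parent = {n: n for n in names}      # union-find forest, one node per name

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    # Brute force over all pairs; at scale an approximate nearest-neighbour
    # index would replace this loop.
    for a, b in combinations(names, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)

    # Collect connected components; singletons have nothing to merge with.
    components = {}
    for n in names:
        components.setdefault(find(n), []).append(n)
    return [group for group in components.values() if len(group) > 1]


if __name__ == "__main__":
    names = [
        "Brad Edwards",
        "Bradley Edwards",
        "Larry Summers",
        "Lawrence Summers",
        "Virginia L. Giuffre",
        "Virginia Roberts Giuffre",
        "Jane Doe Number 3 (Virginia Roberts)",
    ]
    for group in merge_candidates(names):
        print(group)
```

The cutoff is the main knob: set it too low and unrelated names chain together into one big component, too high and nicknames get missed. A surface-string score like this one also struggles with aliases that share little text, which is the argument for semantic similarity in the first place.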

replies(1): >>45963938 #

7. adolph ◴[17 Nov 25 23:13 UTC] No.45959536[source]▶

>>45957667 (TP) #

I read a recent observation that people subject to discovery often make deliberate typos in key names so that the communication stays under the radar.

replies(1): >>45964472 #

8. mvATM99 ◴[18 Nov 25 11:48 UTC] No.45963938{3}[source]▶

>>45959160 #

Nice looking library! Might try it for one of my own projects.

9. potato3732842 ◴[18 Nov 25 12:14 UTC] No.45964472[source]▶

>>45959536 #

Everyone is potentially subject to discovery. Some people are just more aware of it.