
A graph explorer of the Epstein emails

(epstein-doc-explorer-1.onrender.com)

322 points cratermoon | 3 comments | 15 Nov 25 07:27 UTC

https://github.com/maxandrews/Epstein-doc-explorer


liotier ◴[17 Nov 25 20:07 UTC] No.45957667[source]▶

>>45935687 (OP) #

"Brad Edwards" and "Bradley Edwards" might be the same individual.

replies(5): >>45958446 #>>45958478 #>>45958562 #>>45958965 #>>45959536 #

1. cyrusradfar ◴[17 Nov 25 22:12 UTC] No.45958965[source]▶

Great use case for using AI to suggest mergers and clean up.

replies(1): >>45959160 #

2. specproc ◴[17 Nov 25 22:31 UTC] No.45959160[source]▶

>>45958965 (TP) #

LLMs are awful for this. I've got a project that's doing structured extraction and half the work is deduplication.

I didn't go down the LLM route for the cleanup, because you run into scale and context issues with larger datasets.

I got into semantic similarity networks for this use case. You can do efficient pairwise matching with Annoy, set a cutoff threshold, and your isolated subgraphs are merger candidates.

I wrapped up my code in a little library if you're into this sort of thing.

github.com/specialprocedures/semnet
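
For readers who want to see the shape of the idea, here is a minimal sketch of the approach described above: score pairs of names, keep the pairs above a cutoff as graph edges, and treat each connected subgraph as a set of merger candidates. It is not semnet's API, and it deliberately swaps in simpler parts so it runs on the standard library alone: character-trigram cosine similarity stands in for semantic embeddings, and brute-force pairwise comparison stands in for Annoy's approximate nearest-neighbour search. The function names and the 0.6 threshold are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def trigram_vector(name):
    """Bag of character trigrams, a crude stand-in for a semantic embedding."""
    padded = f" {name.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def merger_candidates(names, threshold=0.6):
    """Group names whose similarity graph links them, directly or transitively."""
    vectors = [trigram_vector(n) for n in names]
    parent = list(range(len(names)))  # union-find forest over name indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Brute-force all pairs (assumption: small input). At scale, an
    # approximate nearest-neighbour index would replace this loop.
    for i, j in combinations(range(len(names)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)  # add an edge by merging components

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    # Only components with more than one member are merger candidates.
    return [group for group in groups.values() if len(group) > 1]


names = ["Brad Edwards", "Bradley Edwards", "Jane Doe", "Alan Smith"]
print(merger_candidates(names))
```

With these sample names, the two Edwards spellings end up in one group and the other names stay isolated. The threshold would need tuning on real data, and the candidates are suggestions to review rather than automatic merges.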

replies(1): >>45963938 #

3. mvATM99 ◴[18 Nov 25 11:48 UTC] No.45963938[source]▶

Nice looking library! Might try it for one of my own projects.