We’ve done similar work. The use case was identifying pages on an old website that now 404 and working out where they should be redirected to.
Basically doc2vec and cosine similarity. The matching outputs were totally nonsensical, to the point that matching on title tag vectors or a précis was better, so now I’m curious whether we just did something wrong…
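For anyone wondering what the matching step looks like, here is a rough sketch, not our actual code. It assumes you already have one vector per page (from doc2vec, title tags, or whatever else); the names `cosine`, `best_redirects`, and the `min_score` cutoff are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_redirects(dead, live, min_score=0.5):
    """Map each dead URL to its most similar live URL.

    dead, live: dicts of URL -> vector (hypothetical inputs).
    min_score: assumed cutoff; below it we return None rather than
    redirect to a page that is only the least-bad match.
    """
    redirects = {}
    for url, vec in dead.items():
        target, score = max(
            ((u, cosine(vec, v)) for u, v in live.items()),
            key=lambda pair: pair[1],
            default=(None, 0.0),
        )
        redirects[url] = target if score >= min_score else None
    return redirects


if __name__ == "__main__":
    dead_pages = {"/old-a": [1.0, 0.0, 0.1], "/old-c": [0.0, 0.0, 1.0]}
    live_pages = {"/new-a": [0.9, 0.1, 0.0], "/new-b": [0.0, 1.0, 0.0]}
    print(best_redirects(dead_pages, live_pages))
```

The cutoff matters more than it looks: without it every dead page gets a redirect, including the ones where the nearest neighbour is basically noise.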