
49 points | mireklzicar | 1 comment
oersted No.44350054
Is an open-source library being used for this? Or can you describe the methods you use? I have worked on this and related problems around extracting features from paper PDFs, and we could all learn from how you did it.

An About page is always appreciated for web tools like this with minimal UX, particularly when they are rather automagical.

1. afandian No.44350244
In this case it's querying the relevant DOI registration agency's API (statistically, that's likely Crossref) for the metadata that the publisher themselves registered. So it doesn't look like there's any extraction going on here.
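
For anyone curious what such a lookup might look like, here is a rough sketch against Crossref's public REST API (`https://api.crossref.org/works/{doi}`). It is not the tool's actual code, and it only covers the Crossref case rather than other registration agencies. The helper names (`crossref_url`, `summarize`, `fetch`) and the choice of fields to keep are my own assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

# Public Crossref REST API endpoint for a single work.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_url(doi: str) -> str:
    """Build the lookup URL for a DOI (hypothetical helper)."""
    # Percent-encode unusual characters but keep "/" between prefix and suffix.
    return CROSSREF_WORKS + urllib.parse.quote(doi.strip(), safe="/")


def summarize(payload: dict) -> dict:
    """Pick a few publisher-registered fields out of a Crossref response."""
    msg = payload.get("message", {})
    authors = [
        # People have given/family names; organisations only have "name".
        " ".join(p for p in (a.get("given"), a.get("family")) if p)
        or a.get("name", "")
        for a in msg.get("author", [])
    ]
    return {
        "doi": msg.get("DOI"),
        # Crossref returns titles as lists; take the first entry if present.
        "title": (msg.get("title") or [None])[0],
        "journal": (msg.get("container-title") or [None])[0],
        "authors": authors,
    }


def fetch(doi: str, timeout: float = 10.0) -> dict:
    """Query Crossref over the network and return the summarized metadata."""
    with urllib.request.urlopen(crossref_url(doi), timeout=timeout) as resp:
        return summarize(json.load(resp))
```

Calling `fetch` with a DOI string returns whatever the publisher deposited, so missing or messy fields in the result reflect the registered record, not a parsing failure.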

Could you share _your_ work though? It's always interesting to see new approaches to metadata.

Traditionally, it has been a bit of a one-way street (data comes from the publisher), but there's some interesting work being done by COMET [0] and (separately) OpenAlex [1] around community cleanup of the publisher-supplied data.

(I used to work at Crossref; am a little involved with COMET)

[0] https://www.cometadata.org/

[1] https://openalex.org/