DeepSeek OCR

(github.com)
990 points pierre | 5 comments
ellisd ◴[] No.45641234[source]
The paper makes no mention of Anna’s Archive. I wouldn’t be surprised if DeepSeek took advantage of Anna’s offer granting OCR researchers access to their collection of 7.5 million Chinese non-fiction books (350 TB) ... which is bigger than Library Genesis.

https://annas-archive.org/blog/duxiu-exclusive.html

replies(5): >>45641927 #>>45642797 #>>45642836 #>>45643509 #>>45644415 #
1. dev1ycan ◴[] No.45643509[source]
Oh great, so now Anna's Archive will get taken down as well by another trash LLM provider abusing repositories that students and researchers use. Meta torrenting 70TB from Library Genesis wasn't enough.
replies(4): >>45643563 #>>45643595 #>>45643640 #>>45643646 #
2. c0balt ◴[] No.45643595[source]
It appears this is an active offer from Anna's Archive, so presumably they can handle the load and are able to satisfy the request safely.
3. ◴[] No.45643640[source]
4. sigmoid10 ◴[] No.45643646[source]
Seems like they are doing fine:

https://open-slum.org

replies(1): >>45667295 #
5. dev1ycan ◴[] No.45667295[source]
Yeah, for now. Meta torrented 70TB and right after that they cut the rope for everyone else; mysteriously, their hitman (the US government) hit both Libgen and Z-Lib shortly after.