DeepSeek OCR

(github.com)
990 points pierre | 1 comments
ellisd ◴[] No.45641234[source]
The paper makes no mention of Anna’s Archive. I wouldn’t be surprised if DeepSeek took advantage of Anna’s offer granting OCR researchers access to their 7.5 million (350 TB) Chinese non-fiction collection ... which is bigger than Library Genesis.

https://annas-archive.org/blog/duxiu-exclusive.html

replies(5): >>45641927 #>>45642797 #>>45642836 #>>45643509 #>>45644415 #
1. throawayonthe ◴[] No.45641927[source]
hahaha, also immediately thought of this. I wonder when the OCR'd dataset will get released