
237 points JnBrymn | 1 comment
1. cnxhk ◴[] No.45676635[source]
The paper is quite interesting, but efficiency on OCR tasks does not mean it could be plugged into a general LLM directly without performance loss. If you train a tokenizer only on OCR text, you might already get better compression.
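The commenter's point can be illustrated with a toy experiment: a BPE tokenizer trained only on a narrow domain corpus compresses text from that domain into far fewer tokens than one trained on unrelated text. The sketch below is a minimal, hypothetical setup — the "OCR-like" and "generic" corpora are invented for illustration, and the BPE trainer is a simplified version without byte fallback or pre-tokenization:

```python
from collections import Counter

def train_bpe(corpus: str, num_merges: int):
    """Toy BPE: repeatedly merge the most frequent adjacent symbol pair."""
    words = [list(w) for w in corpus.split()]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        words = [merge_pair(w, best) for w in words]
    return merges

def merge_pair(word, pair):
    """Replace every occurrence of `pair` in `word` with the fused symbol."""
    out, j = [], 0
    while j < len(word):
        if j + 1 < len(word) and (word[j], word[j + 1]) == pair:
            out.append(pair[0] + pair[1])
            j += 2
        else:
            out.append(word[j])
            j += 1
    return out

def encode(text: str, merges):
    """Apply learned merges in training order; return the token list."""
    words = [list(w) for w in text.split()]
    for pair in merges:
        words = [merge_pair(w, pair) for w in words]
    return [t for w in words for t in w]

# Hypothetical corpora: an "OCR-like" domain vs. unrelated generic text.
ocr_corpus = "invoice total amount invoice invoice amount total " * 20
generic_corpus = "the quick brown fox jumps over the lazy dog " * 20

ocr_merges = train_bpe(ocr_corpus, 30)
generic_merges = train_bpe(generic_corpus, 30)

sample = "invoice total amount invoice total"
in_domain = len(encode(sample, ocr_merges))
out_of_domain = len(encode(sample, generic_merges))
print(in_domain, out_of_domain)  # the in-domain tokenizer needs far fewer tokens
```

With 30 merges the domain tokenizer fuses each frequent word into a single token, while the generic tokenizer's merges barely apply to the sample, leaving it near character-level. That is exactly the kind of baseline the comment suggests comparing against before crediting the OCR pipeline itself with the compression.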