
DeepSeek OCR

(github.com)
990 points by pierre | 2 comments
1. loaderchips No.45643027
Great work guys. How about replacing the global encoder with a Mamba (state-space) vision backbone to eliminate the O(n²) attention bottleneck and get linear-complexity encoding of high-resolution documents? For scale: a 1024×1024 page cut into 16×16 patches is 4096 tokens, i.e. ~16.8M attention pairs per layer, versus a single linear scan over those same 4096 tokens. Pair this with a non-autoregressive (non-AR) decoder, such as Mask-Predict or iterative refinement, that generates all output tokens in parallel instead of sequentially. Together that's a fully parallelizable vision-to-text pipeline, and the combination addresses both major bottlenecks in DeepSeek-OCR. Rough sketch of the decoding half below.
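To make the non-AR half concrete, here is a minimal sketch of a Mask-Predict style loop (Ghazvininejad et al., 2019): predict every position in parallel, then re-mask the least confident fraction and refine. The `decoder` stub, MASK_ID, and the linear unmasking schedule are illustrative stand-ins, not DeepSeek-OCR's actual API; a real version would condition on the Mamba encoder's output.

    import torch

    MASK_ID, VOCAB, SEQ_LEN, ITERS = 0, 1000, 32, 4

    def decoder(tokens, memory):
        # Stand-in for a real non-AR decoder conditioned on the (Mamba) encoder
        # output `memory`; returns logits for every position in one parallel pass.
        return torch.randn(tokens.size(0), tokens.size(1), VOCAB)

    def mask_predict(memory, seq_len=SEQ_LEN, iters=ITERS):
        tokens = torch.full((1, seq_len), MASK_ID, dtype=torch.long)  # start fully masked
        for t in range(iters):
            logits = decoder(tokens, memory)         # score all positions at once
            conf, pred = logits.softmax(-1).max(-1)  # per-position confidence + argmax
            tokens = pred                            # accept every prediction...
            n_mask = int(seq_len * (1 - (t + 1) / iters))  # ...then re-mask the least
            if n_mask > 0:                                 # confident, decaying to zero
                remask = conf.topk(n_mask, largest=False).indices
                tokens.scatter_(1, remask, MASK_ID)
        return tokens

    print(mask_predict(memory=None).shape)  # torch.Size([1, 32])

The point is that decoding cost becomes a fixed number of parallel passes (4 above) rather than one sequential step per output token. The usual caveat: non-AR decoders tend to need knowledge distillation or extra refinement iterations to close the quality gap with AR decoding.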
replies(1): >>45643102
2. loaderchips No.45643102
Not sure why I'm getting downvoted; I'd love to have a technical discussion on the validity of these suggestions.