DeepSeek-OCR 2: Visual Causal Flow – Introduction to the Latest arXiv Papers

DeepSeek-OCR 2: Visual Causal Flow [15.6]
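This paper proposes DeepSeek-OCR 2 to investigate the feasibility of a novel encoder, DeepEncoder V2. DeepEncoder V2 is designed to give the encoder causal reasoning capabilities, so that it can intelligently reorder visual tokens before the content is interpreted. The work explores a new paradigm: whether 2D image understanding can be achieved effectively using a 2D causal reasoning structure.
Reference translation of the paper (metadata) (Wed, 28 Jan 2026 12:46:07 GMT)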
The paper proposes DeepEncoder V2 and DeepSeek-OCR 2 and reports strong performance. DeepEncoder V2 in particular has been changed substantially, as the paper describes: "DeepEncoder V2, featuring several key innovations: (1) we replace the CLIP [37] component in DeepEncoder [54] with a compact LLM [48] architecture, as illustrated in Figure 1, to achieve visual causal flow; (2) to enable parallelized processing, we introduce learnable queries [10], termed causal flow tokens, with visual tokens prepended as a prefix—through a customized attention mask, visual tokens maintain global receptive fields, while causal flow tokens can obtain visual token reordering ability; (3) we maintain equal cardinality between causal and visual tokens (with redundancy such as padding and borders) to provide sufficient capacity for re-fixation; (4) only the causal flow tokens—the latter half of the encoder outputs—are fed to the LLM [24] decoder, enabling cascade causal-aware visual understanding."
The repository is GitHub – deepseek-ai/DeepSeek-OCR-2: Visual Causal Flow
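Points (2) to (4) of the quoted description can be illustrated with a small sketch of the attention mask and the output selection. This is not the authors' implementation. The function names `build_attention_mask` and `select_flow_outputs` are hypothetical, and the sketch assumes that visual tokens attend only to other visual tokens (bidirectionally), while each causal flow token attends to every visual token and to the causal flow tokens up to and including itself.

```python
from typing import List, Sequence


def build_attention_mask(num_visual: int, num_flow: int) -> List[List[bool]]:
    """Return mask[i][j] == True when position i may attend to position j.

    Layout (an assumption for illustration): visual tokens come first as a
    prefix, followed by the causal flow tokens.
    """
    total = num_visual + num_flow
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if i < num_visual:
                # Visual prefix: global (bidirectional) view over visual tokens.
                mask[i][j] = j < num_visual
            else:
                # Causal flow token: sees every visual token, plus flow
                # tokens at or before its own position (causal order).
                mask[i][j] = j < num_visual or j <= i
    return mask


def select_flow_outputs(encoder_outputs: Sequence, num_visual: int) -> list:
    """Keep only the outputs at causal flow positions (the latter part)."""
    return list(encoder_outputs[num_visual:])


# Equal cardinality, as in point (3): 3 visual tokens and 3 causal flow tokens.
mask = build_attention_mask(num_visual=3, num_flow=3)
flow_only = select_flow_outputs(["v0", "v1", "v2", "f0", "f1", "f2"], num_visual=3)
```

With equal numbers of visual and causal flow tokens, `select_flow_outputs` returns exactly the latter half of the encoder outputs, which is the part the quoted text says is passed to the LLM decoder.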
