Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation [50.0]
本研究では,Mixture-of-Recursions (MoR)を導入した。 MoRはパラメータ効率を達成するために再帰ステップをまたいだ共有レイヤのスタックを再利用し、軽量ルータは適応トークンレベルの思考を可能にする。また、KVペアを最初の再帰から再利用するKV共有変種を提案し、特にプリフィルレイテンシとメモリフットプリントの削減を図っている。
論文参考訳（メタデータ） (Mon, 14 Jul 2025 17:49:00 GMT)
「We propose Mixture-of-Recursions (MoR)—a framework that dynamically adjusts recursion step for each token during pretraining and inference. The core of MoR lies in two components: a routing mechanism that assigns token-specific recursion steps to adaptively concentrate computation on more challenging tokens, and a KV caching strategy that defines how KV pairs are stored and selectively utilized for attention at each recursive step.」という構造の提案。「MoR consistently outperforms recursive baselines and matches or exceeds the standard Transformers at larger scales, despite using significantly fewer parameters (approximately one-third due to layer tying with 𝑁𝑅= 3).」とのこと。
リポジトリはGitHub – raymin0223/mixture_of_recursions: Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Thinking

コメントを残す

コメントを残す コメントをキャンセル