Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis / Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression
Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis [88.1] Mamba models have attracted significant attention for their computational advantages over Transformer-based models. This paper presents the first theoretical analysis of the training dynamics of a one-layer Mamba model. While Mamba may require more training, it retains accurate predictions even when the fraction of outliers exceeds the threshold that linear Transformers can tolerate. Translated summary (metadata) (Wed, 01 Oct 2025 01:25:01 GMT)
A theoretical analysis of Mamba. This is an interesting property: "While linear Transformers may converge faster with smaller batch sizes, they can only in-context generalize effectively when the fraction of outlier-containing context examples is less than 1/2, much less than that for Mamba. Moreover, linear Transformers require significantly more context examples than Mamba to achieve comparable generalization performance. This highlights Mamba's superior robustness to a high density of outliers in ICL."
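A back-of-the-envelope way to see where the 1/2 threshold for linear Transformers comes from: if linear attention effectively implements an averaged gradient step over the context, then outlier examples whose labels come from an opposing task cancel the clean ones. The sketch below uses my own toy outlier model (an outlier fraction `p` of labels generated by the flipped task `-w_star`; not the paper's exact setup), and shows the averaged estimator's alignment with the true weights flipping sign past p = 1/2.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 5000
w_star = rng.normal(size=d)        # ground-truth regression weights

for p in [0.0, 0.3, 0.6]:          # fraction of outlier context examples
    X = rng.normal(size=(n, d))
    outlier = rng.random(n) < p    # toy outlier model: labels from the flipped task -w_star
    y = np.where(outlier, -X @ w_star, X @ w_star)
    w_hat = (X.T @ y) / n          # averaged one-gradient-step estimator (learning rate 1)
    # In expectation w_hat = (1 - 2p) * w_star, so alignment flips sign at p = 1/2
    cos = w_hat @ w_star / (np.linalg.norm(w_hat) * np.linalg.norm(w_star))
    print(f"p = {p:.1f}: cosine(w_hat, w_star) = {cos:+.3f}")
```

This is only a caricature of one side of the comparison; it says nothing about why Mamba itself tolerates a higher outlier density, which is the substance of the paper's analysis.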
Also an interesting observation: "The loss bound is comparable to that of Transformer. Our theoretical results reveal the different mechanism between Transformer and Mamba on ICL, where Mamba emulates a variant of online gradient descent to perform in-context, while Transformers approximate a single step of gradient descent. Furthermore, our comparison with the S4 model demonstrates that the selection components are essential for Mamba to perform ICL."
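The contrast between the two mechanisms is easy to see on a toy in-context linear regression task. Below is a minimal numpy sketch, not the paper's construction: the learning rates and the data model are illustrative assumptions. It compares the single averaged gradient step that the quote says Transformers approximate with a sequential online gradient descent pass of the kind trained Mamba is argued to emulate.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 200                      # feature dimension, number of context examples
w_star = rng.normal(size=d)        # ground-truth regression weights
X = rng.normal(size=(n, d))        # context inputs x_1, ..., x_n
y = X @ w_star                     # noiseless context labels y_i = <w*, x_i>
x_q = rng.normal(size=d)           # query input

def one_step_gd(X, y, eta):
    """Single gradient step from w = 0 on the squared loss over the whole
    context -- the estimator Transformers are said to approximate."""
    w = np.zeros(X.shape[1])
    grad = -(X.T @ (y - X @ w)) / len(y)
    return w - eta * grad          # equals (eta / n) * sum_i y_i x_i

def online_gd(X, y, eta):
    """One sequential pass over the context, updating after each example --
    the online-GD variant trained Mamba is argued to emulate."""
    w = np.zeros(X.shape[1])
    for x_i, y_i in zip(X, y):
        w -= eta * (w @ x_i - y_i) * x_i
    return w

for name, w_hat in [("one-step GD", one_step_gd(X, y, 0.5)),
                    ("online GD  ", online_gd(X, y, 0.1))]:
    print(f"{name}: query error = {abs(w_hat @ x_q - w_star @ x_q):.4f}")
```

Running this, the sequential pass fits the query far better than the single averaged step, which matches the intuition that recurrent state updates give Mamba a fundamentally different in-context mechanism than attention.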