Provable In-Context Vector Arithmetic via Retrieving Task Concepts
[53.7] We show how training nonlinear residual transformers via gradient descent on cross-entropy loss performs factual-recall ICL tasks through vector arithmetic. These results elucidate the advantages of transformers over static-embedding predecessors. Paper reference translation (metadata) (Wed, 13 Aug 2025 13:54:44 GMT)
"We develop an optimization theory demonstrating that transformers with nonlinear softmax attention, MLP, layer normalization, and residual connections—trained via Gradient Descent (GD) with cross-entropy loss—can effectively perform factual-recall ICL in a vector arithmetic manner, grounded in empirically motivated data modeling. Our analysis shows that the transformer retrieves the high-level task/function concept through attention-MLP, which, when combined with any embedded query vector within the same high-level task concept, yields the correct corresponding answer vector." A theoretical study built on the task-vector assumption.
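The vector-arithmetic view described above can be illustrated with a toy sketch (this is an idealized NumPy illustration, not the paper's transformer construction): a single shared "task vector" represents the high-level task concept, and adding it to any query embedding from that task lands on the corresponding answer embedding. All names and data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32  # embedding dimension

# Hypothetical factual-recall task: country -> capital.
pairs = {"France": "Paris", "Japan": "Tokyo", "Italy": "Rome"}

theta = rng.normal(size=d)                    # shared task/function concept vector
emb = {q: rng.normal(size=d) for q in pairs}  # random query embeddings
# Idealized assumption: answer embedding = query embedding + task vector.
emb.update({a: emb[q] + theta for q, a in pairs.items()})

def retrieve(query: str) -> str:
    """Add the task vector to the query embedding, return the nearest answer."""
    target = emb[query] + theta
    return min(pairs.values(), key=lambda a: np.linalg.norm(emb[a] - target))

print(retrieve("Japan"))  # -> Tokyo
```

In the paper's setting the attention-MLP block is what extracts the task vector from the in-context demonstrations; here it is simply given, to isolate the arithmetic step.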