Provable In-Context Vector Arithmetic via Retrieving Task Concepts
[53.7] We show how training nonlinear residual transformers via gradient descent on cross-entropy loss performs factual-recall ICL tasks through vector arithmetic. These results elucidate the advantages of transformers over static-embedding predecessors. Paper reference translation (metadata) (Wed, 13 Aug 2025 13:54:44 GMT)
"We develop an optimization theory demonstrating that transformers with nonlinear softmax attention, MLP, layer normalization, and residual connections—trained via Gradient Descent (GD) with cross-entropy loss—can effectively perform factual-recall ICL in a vector arithmetic manner, grounded in empirically motivated data modeling. Our analysis shows that the transformer retrieves the high-level task/function concept through attention-MLP, which, when combined with any embedded query vector within the same high-level task concept, yields the correct corresponding answer vector." A theoretical study built on the task-vector assumption.
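The vector-arithmetic view described above can be illustrated with a toy sketch (this is an idealized NumPy illustration, not the paper's transformer construction): a single shared "task vector" represents the high-level task concept, and adding it to any query embedding from that task lands on the corresponding answer embedding. All names and data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32  # embedding dimension

# Hypothetical factual-recall task: country -> capital.
pairs = {"France": "Paris", "Japan": "Tokyo", "Italy": "Rome"}

theta = rng.normal(size=d)                    # shared task/function concept vector
emb = {q: rng.normal(size=d) for q in pairs}  # random query embeddings
# Idealized assumption: answer embedding = query embedding + task vector.
emb.update({a: emb[q] + theta for q, a in pairs.items()})

def retrieve(query: str) -> str:
    """Add the task vector to the query embedding, return the nearest answer."""
    target = emb[query] + theta
    return min(pairs.values(), key=lambda a: np.linalg.norm(emb[a] - target))

print(retrieve("Japan"))  # -> Tokyo
```

In the paper's setting the attention-MLP block is what extracts the task vector from the in-context demonstrations; here it is simply given, to isolate the arithmetic step.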