October 7, 2025 – Introduction to the latest arXiv papers

Mem-α: Learning Memory Construction via Reinforcement Learning

Mem-α: Learning Memory Construction via Reinforcement Learning [20.9]
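Large language model (LLM) agents are constrained by limited context windows. Current memory-augmented agents rely on predefined instructions and tools for memory updates. Mem-α is a reinforcement learning framework that trains agents to effectively manage complex memory systems.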
Reference translation of the paper (metadata) (Tue, 30 Sep 2025 08:02:34 GMT)
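Memory-management agents that are driven by system prompts and similar fixed instructions have limits, so the authors propose using reinforcement learning to learn a memory-management strategy: "we propose Mem-α, a reinforcement learning framework that trains agents to effectively manage complex memory systems through interaction and feedback."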
Also interesting: "Empirical evaluation demonstrates that Mem-α achieves significant improvements over existing memory-augmented agent baselines across diverse benchmarks. Most remarkably, despite being trained exclusively on instances with a maximum length of 30k tokens, our agents exhibit robust generalization to sequences exceeding 400k tokens, over 13× the training length."
The repository is GitHub – wangyu-ustc/Mem-alpha: Learning Memory Construction via Reinforcement Learning
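As a quick sanity check on the "over 13×" figure quoted above, the ratio can be recomputed from the two token counts in the quote. The variable names are illustrative only, and 400k is treated as a lower bound because the quote says "exceeding 400k tokens".

```python
# Token counts taken from the quoted abstract; names are illustrative.
train_max_tokens = 30_000   # maximum training instance length
eval_min_tokens = 400_000   # evaluation sequences exceed this length

# Ratio of evaluation length to training length (a lower bound).
ratio = eval_min_tokens / train_max_tokens
print(f"evaluation length is at least {ratio:.1f}x the training length")
```

The result is a little above 13, which matches the "over 13×" wording.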

On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub [6.7]
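Large language models (LLMs) are increasingly being integrated into software development processes. The ability to use autonomous AI agents to generate code and submit pull requests with minimal human intervention may become standard practice. We empirically study 567 GitHub pull requests (PRs) generated with Claude Code, an agentic coding tool, across 157 open-source projects.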
Reference translation of the paper (metadata) (Thu, 18 Sep 2025 08:48:32 GMT)
A survey and report on how software development agents are actually being used. "Our findings show that while Agentic-PRs are accepted at a lower rate than Human-PRs (83.8% vs. 91.0%), they are still widely adopted in real-world projects." The impression is that they are already fairly widely used and accepted.
The repository is GitHub – mmikuu/OnTheUseOfAgenticCoding

PAL-UI: Planning with Active Look-back for Vision-Based GUI Agents [151.9]
We propose PAL-UI (Planning with Active Look-back). PAL-UI combines a dual-level summarization agent, which captures both observation-level cues and action-level outcomes, with a dedicated retrieval tool.
Reference translation of the paper (metadata) (Wed, 01 Oct 2025 01:48:39 GMT)
A proposal for an agent that incorporates PAL (Planning with Active Look-back), a mechanism for looking back over earlier steps. According to the authors, "PAL-UI significantly outperforms both base MLLMs and state-of-the-art baselines on mobile navigation benchmarks, while also generalizing well to out-of-domain web environments. These results underscore the importance of active memory retrieval for robust GUI planning. Future work will explore extending PAL-UI to more complex tasks and environments, integrating reinforcement learning objectives, and broadening its applicability to real-world interactive systems."

Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis [88.1]
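Mamba models have attracted considerable attention for their computational advantages over Transformer-based models. This paper presents the first theoretical analysis of the training dynamics of a one-layer Mamba model. Although Mamba may require more training, it maintains accurate predictions even when the fraction of outliers exceeds the threshold that a linear Transformer can tolerate.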
Reference translation of the paper (metadata) (Wed, 01 Oct 2025 01:25:01 GMT)
A theoretical analysis of Mamba. The following is an interesting property: "While linear Transformers may converge faster with smaller batch sizes, they can only in-context generalize effectively when the fraction of outlier-containing context examples is less than 1/2, much less than that for Mamba. Moreover, linear Transformers require significantly more context examples than Mamba to achieve comparable generalization performance. This highlights Mamba’s superior robustness to a high density of outliers in ICL."

Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression [90.9]
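Mamba is an efficient Transformer alternative with linear complexity for long-sequence modeling. Recent empirical studies show that Mamba's in-context learning (ICL) is competitive with Transformers. This paper studies the training dynamics of Mamba on a linear regression ICL task.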
Reference translation of the paper (metadata) (Sun, 28 Sep 2025 09:48:49 GMT)
This one also makes an interesting observation: "The loss bound is comparable to that of Transformer. Our theoretical results reveal the different mechanism between Transformer and Mamba on ICL, where Mamba emulates a variant of online gradient descent to perform in-context, while Transformers approximate a single step of gradient descent. Furthermore, our comparison with the S4 model demonstrates that the selection components are essential for Mamba to perform ICL."
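To make the contrast in the quote concrete, here is a minimal sketch of the two generic procedures it names, applied to a noiseless in-context linear regression prompt: online gradient descent, which updates the weights one context example at a time, and a single full-batch gradient step from zero. This is not the paper's construction and involves no Mamba or Transformer. The function names, learning rates, dimensions, and the plain squared-loss update are all assumptions chosen for illustration, and the paper's "variant" of online gradient descent may differ from it.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def online_gd(xs, ys, lr):
    """One pass of online gradient descent on squared loss.

    The weights are updated after every (x, y) pair, in order, which
    mirrors reading context examples sequentially. Illustrative only.
    """
    w = [0.0] * len(xs[0])
    for x, y in zip(xs, ys):
        err = dot(w, x) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def one_step_gd(xs, ys, lr):
    """A single full-batch gradient step from w = 0 on mean squared loss.

    At w = 0 the gradient of (1/2n) * sum((w.x - y)^2) is -(1/n) * sum(y * x),
    so the step lands at lr * (1/n) * sum(y * x).
    """
    n, d = len(xs), len(xs[0])
    return [lr * sum(y * x[j] for x, y in zip(xs, ys)) / n for j in range(d)]


# Hypothetical task: y = <w_star, x> with standard Gaussian inputs.
rng = random.Random(0)
d, n = 4, 200
w_star = [rng.gauss(0, 1) for _ in range(d)]
xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
ys = [dot(w_star, x) for x in xs]
x_query = [rng.gauss(0, 1) for _ in range(d)]

w_online = online_gd(xs, ys, lr=0.05)
w_single = one_step_gd(xs, ys, lr=1.0)

print("target               :", dot(w_star, x_query))
print("online GD prediction :", dot(w_online, x_query))
print("one-step GD prediction:", dot(w_single, x_query))
```

In this toy setting the online learner refines its estimate with each context example, while the one-step estimate is essentially an average of y·x over the context. The sketch only shows that the two procedures are mechanically different. It says nothing about the loss bounds analyzed in the paper.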