Memento: Fine-tuning LLM Agents without Fine-tuning LLMs

Memento: Fine-tuning LLM Agents without Fine-tuning LLMs [36.3]
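This paper proposes a new learning paradigm for adaptive large language model (LLM) agents. The method enables low-cost continual adaptation through memory-based online reinforcement learning. The authors instantiate the agent model in a deep research setting, named Memento, which reaches top-1 on GAIA validation.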
Reference translation of the paper (metadata) (Mon, 25 Aug 2025 13:32:12 GMT)
A proposal for an agent framework with a memory mechanism: "Memento formalises deep research agents as a memory-based Markov Decision Process (MDP) and implements it within a planner–executor framework, leveraging an episodic case bank to record and retrieve trajectories for continual policy improvement."
The repository is GitHub – Agent-on-the-Fly/Memento: Official Code of Memento: Fine-tuning LLM Agents without Fine-tuning LLMs
