Agent – Introducing the Latest arXiv Papers

RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems

RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems [30.5]
This paper introduces RoboMemory, a brain-inspired multi-memory framework. It addresses challenges in real-world environments: continual learning, multi-module memory latency, capturing task correlations, and mitigating infinite loops in closed-loop planning.
Paper reference translation (metadata) (Sat, 02 Aug 2025 15:39:42 GMT)
A memory system built with an agentic approach: “Inspired by the brain’s unified memory mechanisms, we design a lifelong embodied memory system with four parallel modules (Spatial, Temporal, Episodic, Semantic) under a unified framework. This framework supports parallelized update and retrieval across modules, mitigating latency accumulation in complex systems while facilitating coherent knowledge integration for lifelong learning.”
For now, an agentic approach seems like the realistic choice, but I wonder at what point it will make sense to build memory into the model architecture itself.
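To make the “parallelized update and retrieval across modules” idea concrete, here is a minimal sketch, not RoboMemory’s implementation. It assumes that every module stores the same raw text observation and retrieves by simple keyword overlap; the real modules keep different representations (spatial, temporal, episodic, semantic). The class names `MemoryModule` and `UnifiedMemory` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class MemoryModule:
    """Hypothetical minimal store: keeps text entries, retrieves by keyword overlap."""

    def __init__(self, name: str):
        self.name = name
        self.entries: list[str] = []

    def update(self, observation: str) -> None:
        # Assumption: every module stores the raw observation as-is.
        self.entries.append(observation)

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        words = set(query.lower().split())
        ranked = sorted(
            self.entries,
            key=lambda e: len(words & set(e.lower().split())),
            reverse=True,
        )
        return ranked[:k]


class UnifiedMemory:
    """Fans updates and queries out to the four modules concurrently."""

    def __init__(self):
        self.modules = {
            n: MemoryModule(n) for n in ("spatial", "temporal", "episodic", "semantic")
        }
        self.pool = ThreadPoolExecutor(max_workers=len(self.modules))

    def update(self, observation: str) -> None:
        futures = [self.pool.submit(m.update, observation) for m in self.modules.values()]
        for f in futures:
            f.result()  # wait for all modules; wall time tracks the slowest, not the sum

    def retrieve(self, query: str) -> dict[str, list[str]]:
        futures = {n: self.pool.submit(m.retrieve, query) for n, m in self.modules.items()}
        return {n: f.result() for n, f in futures.items()}
```

For example, after `update("cup is on the kitchen table")` and `update("robot opened the fridge at step 3")`, `retrieve("where is the cup")` returns a dict keyed by module name with the cup observation ranked first in each module. The point of the parallel fan-out is that adding modules does not add their latencies together.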

Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs

Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs [69.1]
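Retrieval-Augmented Generation (RAG) improves the factuality of Large Language Models (LLMs) by injecting external knowledge. Conversely, purely reasoning-oriented approaches often hallucinate or get facts wrong. This survey synthesizes both strands from a unified reasoning-retrieval perspective.
Paper reference translation (metadata) (Wed, 16 Jul 2025 15:44:18 GMT)
A survey on RAG.
A paper list is available at GitHub – DavidZWZ/Awesome-RAG-Reasoning: [Up-to-date] Awesome RAG Reasoning Resources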

The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets

The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets [12.1]
This paper examines a future scenario in which both consumers and merchants authorize AI agents to fully automate negotiations and transactions. The findings show that AI-mediated transactions are inherently an imbalanced game, with different agents producing markedly different outcomes for their users. Users should be careful when delegating business decisions to AI agents.
Paper reference translation (metadata) (Thu, 29 May 2025 17:41:39 GMT)
An AI-vs-AI study. “In this paper, we designed an experimental framework to investigate potential issues and risks in Agent-to-Agent negotiations and transactions. Our analysis reveals that Agent-to-Agent negotiation and transaction is naturally an imbalanced game where users using less capable agents will face significant financial loss against stronger agents.” This is what one would expect, but as the paper itself points out, the result could widen inequality.
The repository is at GitHub – ShenzheZhu/A2A-NT: Official code of “The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets”

Towards Trustworthy GUI Agents: A Survey

Towards Trustworthy GUI Agents: A Survey [64.6]
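This survey examines the trustworthiness of GUI agents along five key dimensions. It identifies major challenges such as vulnerability to adversarial attacks and cascading failure modes in sequential decision-making. As GUI agents become more widespread, establishing robust safety standards and responsible development practices is essential.
Paper reference translation (metadata) (Sun, 30 Mar 2025 13:26:00 GMT)
A survey on the trustworthiness of GUI agents. The organizing axes are “Security”, “Reliability”, “Explainability”, “Ethical Alignment”, and “Evaluation methodologies”.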

PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving

PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving [89.6]
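We propose PlanGEN, a model-agnostic and easily scalable agent framework with three key components: constraint, verification, and selection. Specifically, we propose constraint-guided iterative verification to improve the performance of inference-time algorithms.
Paper reference translation (metadata) (Sat, 22 Feb 2025 06:21:56 GMT)
A multi-agent framework: “PlanGEN comprises three specialized LLM agents: a constraint agent, a verification agent, and a selection agent.” The authors add “Further, we introduced a Mixture of Algorithms, an iterative framework that integrates the selection agent (Figure 1) to dynamically choose the best algorithm.”, though the A in MoA is easy to confuse with the A for Agents.
For each of Gemini-1.5-Pro, Gemini-2.0-Flash, and GPT-4o, performance appears to improve over using the model on its own, so there is an ensemble-like effect.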
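The interplay of the three agents can be sketched roughly as follows. This is a toy under loud assumptions, not PlanGEN’s code: the “agents” are plain Python functions instead of LLM calls, the verification score is the fraction of constraints a plan covers, and the two selectable “algorithms” are just different retry budgets, whereas the paper’s selection agent chooses among full inference-time algorithms. All function names are hypothetical.

```python
from typing import Callable

Plan = list[str]


def constraint_guided_loop(
    problem: str,
    constraint_agent: Callable[[str], list[str]],
    propose: Callable[[str, list[str], int], Plan],
    verification_agent: Callable[[Plan, list[str]], float],
    max_iters: int,
    threshold: float = 1.0,
) -> tuple[Plan, float]:
    """Propose a plan, verify it against extracted constraints, retry until it passes."""
    constraints = constraint_agent(problem)
    best_plan: Plan = []
    best_score = float("-inf")
    for attempt in range(max_iters):
        plan = propose(problem, constraints, attempt)
        score = verification_agent(plan, constraints)
        if score > best_score:
            best_plan, best_score = plan, score
        if score >= threshold:
            break  # verification passed; stop iterating
    return best_plan, best_score


def run_plangen(problem, constraint_agent, propose, verification_agent, selection_agent):
    """Hypothetical selection step: pick which algorithm (here: retry budget) to run."""
    algorithms = {"single_pass": 1, "iterative": 5}  # name -> max_iters
    name = selection_agent(problem, list(algorithms))
    plan, score = constraint_guided_loop(
        problem, constraint_agent, propose, verification_agent, max_iters=algorithms[name]
    )
    return name, plan, score


# Toy stand-ins for the LLM agents (illustrative only, no model calls).
def toy_constraints(problem: str) -> list[str]:
    return ["boil water", "add tea leaves", "steep for 3 minutes"]


def toy_propose(problem: str, constraints: list[str], attempt: int) -> Plan:
    return constraints[: attempt + 1]  # each retry satisfies one more constraint


def toy_verify(plan: Plan, constraints: list[str]) -> float:
    return sum(c in plan for c in constraints) / len(constraints)


def toy_select(problem: str, options: list[str]) -> str:
    return "iterative" if len(problem.split()) > 3 else "single_pass"
```

With these stubs, `run_plangen("make a cup of tea", toy_constraints, toy_propose, toy_verify, toy_select)` selects `"iterative"` and stops once the plan covers all three constraints. Keeping the best-scoring plan means a run that exhausts its budget still returns something usable.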

How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation

How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation [30.7]
This paper introduces BehaviorChain, the first benchmark for evaluating the ability of digital twins to simulate continuous human behavior. BehaviorChain consists of diverse, high-quality persona-based behavior chains, with 15,846 distinct behaviors across 1,001 unique personas. Comprehensive evaluation results show that even state-of-the-art models struggle to accurately simulate continuous human behavior.
Paper reference translation (metadata) (Thu, 20 Feb 2025 15:29:32 GMT)
A benchmark for predicting sequences of behavior, which should be possible if one can build a digital twin of a person. The dataset is described as “BEHAVIORCHAIN instance is composed of four key components: a persona profile p, a historical narrative h, a behavior chain B = {b1,b2,…,bn} of the specific persona, and the contextual setting for each behavior C = {c1,c2,…,cn}.” and “BEHAVIORCHAIN comprises 1,001 high-quality, persona-based behavior chains, each containing 10–20 context-behavior nodes, automatically extracted from fiction and biographical literature.” The task seems hard even for GPT-4o, while Llama performs surprisingly well. Data leakage is a concern, but it is an interesting task.
The repository is at GitHub – O-L1RU1/BehaviorChain
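The four-part instance described in the quote maps naturally onto a small data structure. The sketch below is only one plausible framing: field names are hypothetical, and the `steps` method assumes a “predict the next behavior given the context and the earlier context-behavior pairs” protocol, which may differ from the benchmark’s actual evaluation setup.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class BehaviorChainInstance:
    persona: str          # p: persona profile
    history: str          # h: historical narrative
    behaviors: list[str]  # B = {b1, ..., bn}
    contexts: list[str]   # C = {c1, ..., cn}, one context per behavior

    def __post_init__(self) -> None:
        if len(self.behaviors) != len(self.contexts):
            raise ValueError("each behavior b_i needs a matching context c_i")
        if not 10 <= len(self.behaviors) <= 20:
            raise ValueError("chains are described as having 10-20 context-behavior nodes")

    def steps(self) -> Iterator[tuple[str, list[tuple[str, str]], str]]:
        """Yield (c_i, earlier (c, b) pairs, gold b_i) for next-behavior prediction."""
        for i, (c, b) in enumerate(zip(self.contexts, self.behaviors)):
            yield c, list(zip(self.contexts[:i], self.behaviors[:i])), b
```

Each yielded triple, together with `persona` and `history`, is what a model would see at step i under this assumed protocol; the chain format is what makes errors compound, since later steps depend on an accurate picture of earlier ones.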

DeepRAG: Thinking to Retrieval Step by Step for Large Language Models

DeepRAG: Thinking to Retrieval Step by Step for Large Language Models [92.9]
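We propose DeepRAG, which models retrieval-augmented reasoning as a Markov Decision Process (MDP). By iteratively decomposing queries, DeepRAG dynamically decides at each step whether to retrieve external knowledge or rely on parametric reasoning. Experiments show that DeepRAG improves answer accuracy by 21.99%, demonstrating the effectiveness of optimizing retrieval-augmented reasoning.
Paper reference translation (metadata) (Mon, 03 Feb 2025 08:22:45 GMT)
A fairly elaborate RAG method: “(1) Binary Tree Search, (2) Imitation Learning, and (3) Chain of Calibration.” I can believe it improves accuracy, but...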
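The “retrieve or rely on parametric knowledge” decision per subquery forms a binary tree. As I understand it, the binary tree search is used to find, for training data, a correct reasoning path that uses as few retrievals as possible. The sketch below illustrates only that idea under simplifying assumptions: the subqueries are fixed in advance (in DeepRAG they are generated step by step), and `answer_ok` is a hypothetical oracle standing in for “does this path reach the correct final answer”. Exhaustive enumeration is fine for short chains; a real search would expand the tree incrementally.

```python
from itertools import product
from typing import Callable, Optional

Path = tuple[bool, ...]  # True = retrieve for that subquery, False = answer parametrically


def min_retrieval_path(
    subqueries: list[str],
    answer_ok: Callable[[list[str], Path], bool],
) -> Optional[Path]:
    """Return a correct retrieve/parametric path with the fewest retrievals, or None."""
    # Every leaf of the binary decision tree, ordered by retrieval count.
    paths = sorted(product((False, True), repeat=len(subqueries)), key=sum)
    for path in paths:
        if answer_ok(subqueries, path):
            return path
    return None
```

For instance, if the model already knows who directed a film but not when that director was born, an oracle that succeeds only when the second subquery retrieves yields `(False, True)`: skip retrieval for the first step and retrieve for the second.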

TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration

TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration [33.9]
Vision-language foundation models (such as CLIP) have demonstrated strong transfer learning capability thanks to large-scale image-text pretraining. This paper proposes TransAgent, a general and concise framework that transfers the knowledge of isolated agents in a unified manner. TransAgent achieves state-of-the-art performance on 11 visual recognition datasets.
Paper reference translation (metadata) (Wed, 16 Oct 2024 03:01:44 GMT)
Integrating knowledge from multiple agent models: “By adaptively integrating the external knowledge of agents from different modalities via MoA gating mechanism, TransAgent achieves state-of-the-art performance on 11 datasets under the low-shot scenarios.”
The repository is at GitHub – markywg/transagent: [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
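A gating mechanism that “adaptively integrates” several agents’ knowledge is, in its generic form, a soft weighting over their outputs. The sketch below shows that generic form only: the dot-product gate and the function name `moa_gate` are assumptions, and TransAgent’s gate is learned and parameterized in its own way. The agent feature vectors are assumed to be already computed.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moa_gate(
    query: list[float], agent_feats: list[list[float]]
) -> tuple[list[float], list[float]]:
    """Weight each agent's feature by a softmax over query-feature dot products, then sum."""
    logits = [sum(q * f for q, f in zip(query, feat)) for feat in agent_feats]
    weights = softmax(logits)
    dim = len(agent_feats[0])
    fused = [sum(w * feat[d] for w, feat in zip(weights, agent_feats)) for d in range(dim)]
    return weights, fused
```

A zero query gives every agent equal weight, so the fused feature is a plain average; a query aligned with one agent’s feature shifts almost all of the weight onto that agent. The gate lets the model lean on whichever agent is most relevant per input instead of using a fixed mix.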

OS-COPILOT/FRIDAY (Fully Responsive Intelligence, Devoted to Assisting You) and UFO (UI-Focused)

Two papers on agents that operate computers have come out. Research on LLM-based autonomous agents is very active.

OS-Copilot: Towards Generalist Computer Agents with Self-Improvement [48.3]
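We introduce OS-Copilot, a framework for building generalist agents that can interact with comprehensive elements of an operating system (OS). Using OS-Copilot, we developed FRIDAY, a self-improving embodied agent that automates general computer tasks. On GAIA, a general AI assistant benchmark, FRIDAY outperforms previous methods by 35% and shows strong generalization to unseen applications through skills accumulated from previous tasks.
Paper reference translation (metadata) (Mon, 12 Feb 2024 07:29:22 GMT)
Proposes a framework for operating an OS and FRIDAY, a self-improving agent. Its score on GAIA (see GAIA: A Benchmark for General AI Assistants – Introducing the Latest arXiv Papers (devneko.jp)) is far above GPT-4 Plugins and Auto GPT-4.
The repository is at OS-Copilot: Towards Generalist Computer Agents with Self-Improvement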

UFO: A UI-Focused Agent for Windows OS Interaction [42.0]
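We introduce UFO, an innovative UI-focused agent that fulfills user requests tailored to applications on Windows OS. UFO uses a dual-agent framework to carefully observe and analyze the graphical user interface (GUI) and control information of Windows applications. We tested UFO on 9 popular Windows applications, covering a variety of scenarios that reflect users’ daily usage.
Paper reference translation (metadata) (Thu, 8 Feb 2024 15:40:35 GMT)
An agent from Microsoft that leverages GPT-Vision.
The repository is at microsoft/UFO: A UI-Focused Agent for Windows OS Interaction. (github.com)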

CivRealm

CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents [63.8]
We introduce CivRealm, an environment inspired by the Civilization game. CivRealm poses unique learning and reasoning challenges for decision-making agents.
Paper reference translation (metadata) (Fri, 19 Jan 2024 09:14:11 GMT)
Proposes a Freeciv-based environment for AI to play; unsurprisingly, it is far from easy to solve today. Even a method that stacks AutoGPT agents hierarchically fails to deal with pirates. “The performance contrast between Mastaba and BaseLang highlights the necessity of a hierarchical decision architecture for tackling the complex scenarios presented by CivRealm.” is very interesting (it is a bit like looking at human society...).
The repository is at bigai-ai/civrealm: CivRealm is an interactive environment for the open-source strategy game Freeciv-web based on Freeciv, a Civilization-inspired game. (github.com)
