Autonomous Agent – arXiv最新論文の紹介

Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory

Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory [11.7]
本稿では,長期記憶を備えた新しいフレームワークであるM3-Agentを紹介する。 M3-Agentは、リアルタイムの視覚および聴覚入力を処理して、長期記憶の構築と更新を行うことができる。我々は,M3-Benchという長ビデオ質問応答ベンチマークを開発した。
論文参考訳（メタデータ） (Wed, 13 Aug 2025 12:03:03 GMT)
こちらも長期記憶を備えたエージェントフレームワークの提案。「Compared to the strongest baseline, Gemini-GPT4o-Hybrid, which implements M3-Agent framework by prompting Gemini-1.5-Pro [41] for memorization and GPT-4o [15] for control, M3-Agent improves accuracy by 6.7%, 7.7%, and 5.3% on M3-Bench-robot, M3-Bench-web, and VideoMME-long, respectively. Our ablation study demonstrates the importance of semantic memory: removing it reduces accuracy by 17.1%, 19.2% and 13.1% on M3-Bench-robot, M3-Bench-web, and VideoMME-long, respectively.」と効果を報告している。
プロジェクトサイトはSeeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory

Memp: Exploring Agent Procedural Memory

Memp: Exploring Agent Procedural Memory [72.4]
LLM(Large Language Models)ベースのエージェントは様々なタスクをこなすが、静的パラメータで手動で設計または絡み合うような不安定なプロシージャメモリに悩まされる。本稿では,過去のエージェントの軌跡をステップバイステップの細粒度と高レベルなスクリプトライクな抽象化の両方に蒸留するMempを提案する。メモリレポジトリが洗練されるにつれて、エージェントは着実に高い成功率と類似タスクの効率を達成できることを示す。
論文参考訳（メタデータ） (Fri, 08 Aug 2025 16:20:56 GMT)
エージェントへのMemory導入、「Empirical results on housework automation and information-seeking bench- marks show that leveraging procedural memory significantly boosts task success rates and efficiency. Beyond improving individual episodes, Memp supports continual learning and robust generalization, marking a step toward self-improving, resilient agents.」とのこと。
メモリ管理はシンプルに行っているように見える。

Teaching Language Models To Gather Information Proactively

Teaching Language Models To Gather Information Proactively [53.9]
大規模言語モデル(LLM)は、ますます協力的なパートナーとして機能することが期待されている。本研究では,アクティブな情報収集という新たなタスクパラダイムを導入する。キー情報をマスキングする、部分的に特定された現実世界のタスクを生成するスケーラブルなフレームワークを設計する。このセットアップの中核となるイノベーションは、真に新しい暗黙のユーザー情報を引き出す質問に報酬を与える、強化された微調整戦略です。
論文参考訳（メタデータ） (Mon, 28 Jul 2025 23:50:09 GMT)
「proactive information gathering」を行うよう、Synthetic Conversation EngineとReinforcement Fine-Tuningによってモデルを強化するフレームワークを提案、「Qwen 2.5-7B model significantly outperforms 03-mini by 18% on automatic evaluation metrics. More importantly, human evaluation reveals that clarification questions and final outlines generated by our model are favored by human annotators by 42% and 28% respectively.」とのこと。

Magentic-UI: Towards Human-in-the-loop Agentic Systems

Magentic-UI: Towards Human-in-the-loop Agentic Systems [34.5]
本稿では,ヒューマンエージェントインタラクションの開発と研究のためのオープンソースのWebインターフェースであるMagentic-UIを紹介する。柔軟なマルチエージェントアーキテクチャに基づいて構築されたMagentic-UIは、Webブラウジング、コード実行、ファイル操作をサポートする。エージェントベンチマークによる自律的なタスク補完、インタラクション機能のユーザテストのシミュレーション、実際のユーザとの質的研究、ターゲットとする安全性評価の4つの側面でMagentic-UIを評価した。
論文参考訳（メタデータ） (Wed, 30 Jul 2025 03:49:14 GMT)
「Six interaction mechanisms designed to support low-cost, human-agent interaction in Magentic- UI: co-planning, co-tasking, action approval, answer verification, memory, and multi-tasking.」と人間と強調しながら動作するエージェント開発のためのフレームワーク。
リポジトリはmicrosoft/magentic-ui: A research prototype of a human-centered web agent

Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization

Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization [48.0]
大型言語モデル(LLM)は複雑な問題に対処するためにチェーン・オブ・シント(CoT)技術を利用する。ドメイン知識を統合した新しいエージェントフレームワークであるChatBatteryを,材料設計におけるより効果的な推論に向けて導入する。新規リチウムイオン電池陰極材料3種を同定,合成,特性評価し,28.8%,25.2%,18.5%の実用能力向上を実現した。
論文参考訳（メタデータ） (Mon, 21 Jul 2025 23:46:11 GMT)
科学的発見を支援するAI、「ChatBattery is an AI-driven material optimization platform structured into two synergistic phases: exploration and exploitation. Together, these phases encompass eight sequential stages, orchestrated by seven specialized agents.」とかなり複雑な構成のマルチエージェントシステムになっている。加えて、人間とのコラボレーションが重視されているように見える。
- This suggests that ChatBattery, in its present form, is more adept at optimizing within known paradigms than at generating fundamentally new chemistries. As such, expert input remains essential to expand the system’s exploration boundaries and push beyond conventional chemical spaces. Importantly, this interplay between AI-driven generation and human-guided refinement also creates unexpected opportunities, as demonstrated in the refinement of AI-suggested materials into even more advanced cathode compositions. However, advances anticipated with future reasoning AIs are likely to provide greater exploration and creativity.という記載がある。
「ChatBattery, we successfully identify, synthesize, and characterize three novel lithiumion battery cathode materials, which achieve practical capacity improvements of 28.8%, 25.2%, and 18.5%, respectively, over the widely used cathode material, LiNi0.8Mn0.1Co0.1O2 (NMC811).」と効果があったとのこと。

A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents

A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents [45.5]
大規模言語モデル(LLM)の最近の進歩は、自律型AIエージェントの台頭を触媒している。これらの大きなモデルエージェントは、静的推論システムからインタラクティブなメモリ拡張エンティティへのパラダイムシフトを示す。
論文参考訳（メタデータ） (Mon, 30 Jun 2025 13:34:34 GMT)
AIエージェントとセキュリティリスクに関するサーベイ。
検討ポイントが多い。。

Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities

Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities [117.5]
データ構造化は、複雑で非組織的なデータをよく構造化された形式に変換することで、有望な役割を果たす。この調査では、グラフがAIエージェントにどのように権限を与えるかを、初めて体系的にレビューする。
論文参考訳（メタデータ） (Sun, 22 Jun 2025 12:59:12 GMT)
グラフとエージェントに関するサーベイ
リポジトリはGitHub – YuanchenBei/Awesome-Graphs-Meet-Agents: A curated list of resources on graph-empowered agents and agent-facilitated graph learning (Graphs Meet Agents).

Establishing Best Practices for Building Rigorous Agentic Benchmarks

Establishing Best Practices for Building Rigorous Agentic Benchmarks [94.7]
多くのエージェントベンチマークではタスク設定や報酬設計が問題となっている。このような問題は、相対的な用語で、過小評価または過大評価エージェントのパフォーマンスを最大100%向上させる可能性がある。我々はベンチマーク構築経験から要約したガイドラインの集合であるAgentic Benchmark Checklist (ABC)を紹介した。
論文参考訳（メタデータ） (Thu, 03 Jul 2025 17:35:31 GMT)
構築が難しいエージェント系ベンチマークの注意点をまとめた論文。
「the issues found in τ-bench-Airline, some other example issues we found are: (1) an agent can score 100% on SWE-Lancer without resolving any tasks;」のような問題は相応にある気がするし、「Based on ABC, we assessed ten widely used agentic benchmarks and identified significant evaluation issues that cases up to 100% errors (in relative terms) when estimating agents’ performance.」も驚愕という感じではない。
リポジトリはGitHub – uiuc-kang-lab/agentic-benchmarks

Towards AI Search Paradigm

Towards AI Search Paradigm [42.6]
我々は,人間の情報処理と意思決定をエミュレートできる次世代検索システムの青写真であるAI Search Paradigmを紹介する。このパラダイムは、4つのLCMを動力とするエージェントのモジュラーアーキテクチャを採用し、情報要求の完全な範囲に動的に適応する。この研究は、これらのコンポーネントの詳細なガイドを提供することによって、信頼できる、適応的でスケーラブルなAI検索システムの開発を知らせることを目的としている。
論文参考訳（メタデータ） (Fri, 20 Jun 2025 17:42:13 GMT)
検索用のマルチエージェントフレームワークの整理
検索とLLMの関係性がよくわかる論文

A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models

A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models [45.1]
Webのコンテキストでは、退屈な日々のタスクを扱う人々を支援するために、AI Agents — WebAgents — を活用することで、生産性と効率が劇的に向上する。 LFMの可能性を十分に探求するために、ユーザの指示に従って日々のWebタスクを完了させるように設計されたWebAgentsに広範な研究が登場した。
論文参考訳（メタデータ） (Mon, 26 May 2025 07:05:18 GMT)
利用が広がるWebAgentのサーベイ

2025年8月
月	火	水	木	金	土	日
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31