Autonomous Agent – Page 12 – Introducing the Latest arXiv Papers

Tool-Planner

Tool-Planner: Dynamic Solution Tree Planning for Large Language Model with Tool Clustering [30.3]
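We propose Tool-Planner, a task-processing framework based on toolkits. Tool-Planner groups API functions with the same functionality into toolkits. When a tool error occurs, the language model can reselect and adjust tools based on the toolkit.
Reference translation of the paper (metadata) (Thu, 06 Jun 2024 07:30:14 GMT)
A proposed framework for tool selection, which is important for agentic behavior. I have some doubts about whether clustering tools is really effective, but with so many similar APIs proliferating, I can see how it might be.
The repository is said to be https://github.com/OceannTwT/Tool-Planner, but it returns a 404 at the time of writing.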
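
The toolkit idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the tool functions, the `TOOLKITS` table, and `call_with_toolkit` are all hypothetical names, and a simple in-order fallback stands in for the LLM-driven reselection that the paper describes.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], str]


def weather_api_a(city: str) -> str:
    # Hypothetical tool that always fails, to simulate a tool error.
    raise RuntimeError("service unavailable")


def weather_api_b(city: str) -> str:
    # Hypothetical tool with the same functionality as weather_api_a.
    return f"Sunny in {city}"


# Assumption: tools with the same functionality are grouped under one toolkit key.
TOOLKITS: Dict[str, List[Tool]] = {"weather": [weather_api_a, weather_api_b]}


def call_with_toolkit(toolkit: str, arg: str) -> str:
    """Try each tool in the toolkit in order, moving on when one raises."""
    errors = []
    for tool in TOOLKITS[toolkit]:
        try:
            return tool(arg)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{tool.__name__}: {exc}")
    raise RuntimeError("all tools failed: " + "; ".join(errors))


result = call_with_toolkit("weather", "Tokyo")
print(result)
```

Because the caller addresses the toolkit rather than a specific API, a failing tool can be swapped out without changing the plan itself.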

AgentGym and AGENTEVOL

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments [117.0]
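Large language models (LLMs) are considered a promising foundation for building agents. We take a first step toward building generally capable LLM-based agents with the ability to self-evolve. We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration.
Reference translation of the paper (metadata) (Thu, 06 Jun 2024 15:15:41 GMT)
A proposal for a framework in which multiple agents can run and be benchmarked, along with an algorithm for self-evolution.
Repository: GitHub – WooooDyy/AgentGym: Code and implementations for the paper “AgentGym: Evolving Large Language Model-based Agents across Diverse Environments” by Zhiheng Xi et al.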

Tool Learning with Large Language Models: A Survey

Tool Learning with Large Language Models: A Survey [60.7]
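Tool learning with large language models (LLMs) has emerged as a promising paradigm for strengthening the ability of LLMs to tackle highly complex problems. Despite growing attention and rapid progress in this field, the existing literature remains fragmented and lacks systematic organization.
Reference translation of the paper (metadata) (Tue, 28 May 2024 08:01:26 GMT)
A survey of approaches that use tools to solve complex problems. This is a popular area, and well-organized information is very helpful. It is also nice that there is a repository.
Repository: GitHub – quchangle1/LLM-Tool-Survey: This is the repository for the Tool Learning survey.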

Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions

Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions [77.8]
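We propose Auto-Arena of LLMs, which automates the entire evaluation process with LLM agents. In experiments with 17 recent LLMs, Auto-Arena showed the highest correlation with human preferences.
Reference translation of the paper (metadata) (Thu, 30 May 2024 17:19:19 GMT)
A proposed method for evaluating LLMs. It takes an agent-style approach: “By using LLM agents to generate questions, employing LLM candidates in peer battles, and evaluating responses using LLM committee discussions, Auto-Arena produces less-contaminated, robust, and trustworthy evaluation results.” If evaluation can be automated, automated improvement also seems feasible; I wonder how far performance would rise if good data were produced by committee and used for fine-tuning.
The project site and leaderboard are at Embedded Streamlit App (auto-arena.github.io). Interestingly, the rankings differ considerably between English and Chinese.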
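
As a rough illustration of the committee step, the sketch below reduces it to a majority vote over independent judges. This is a simplification under stated assumptions: the paper describes committee discussions among LLMs, whereas the three judge functions here are hypothetical rule-based stand-ins, and `committee_verdict` is an illustrative name.

```python
from collections import Counter
from typing import Callable, List

# A judge takes (question, answer_a, answer_b) and returns "A" or "B".
Judge = Callable[[str, str, str], str]


def length_judge(question: str, a: str, b: str) -> str:
    # Hypothetical heuristic: prefer the longer answer.
    return "A" if len(a) >= len(b) else "B"


def reasoning_judge(question: str, a: str, b: str) -> str:
    # Hypothetical heuristic: prefer an answer that gives a reason.
    return "A" if "because" in a.lower() else "B"


def contrarian_judge(question: str, a: str, b: str) -> str:
    # Hypothetical judge that always votes for B.
    return "B"


def committee_verdict(judges: List[Judge], question: str, a: str, b: str) -> str:
    """Return the label chosen by most judges (use an odd number of judges)."""
    votes = Counter(judge(question, a, b) for judge in judges)
    return votes.most_common(1)[0][0]


verdict = committee_verdict(
    [length_judge, reasoning_judge, contrarian_judge],
    "Why is the sky blue?",
    "Because of Rayleigh scattering.",
    "It just is.",
)
print(verdict)
```

In a real setup each judge would be an LLM call, and the verdicts from many peer battles would feed a ranking.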

Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents

Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents [22.9]
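Generative AI powered by foundation models facilitates the development and implementation of agents. This paper presents a pattern catalogue of 16 architectural patterns, each analyzed in terms of context, forces, and trade-offs.
Reference translation of the paper (metadata) (Thu, 16 May 2024 23:24:48 GMT)
An introduction to design patterns for building agents with generative AI.
This area feels like it is developing rapidly.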

Agent Planning with World Knowledge Model

Agent Planning with World Knowledge Model [88.5]
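We introduce a parametric World Knowledge Model (WKM) to facilitate agent planning. We develop a WKM that provides prior task knowledge to guide global planning and dynamic state knowledge to assist local planning. We conducted analyses showing that our WKM can effectively alleviate blind trial-and-error and hallucinated actions.
Reference translation of the paper (metadata) (Thu, 23 May 2024 06:03:19 GMT)
The World Knowledge Model is reported to be effective for planning. That in itself is convincing. To obtain the WKM, the following procedure is used: “Specifically, we first steer the agent model to synthesize task knowledge from the comparison between expert and sampled trajectories. Then we prompt it to summarize state knowledge for each planning step from expert trajectories and combine the previous and next actions to build a state knowledge base. Lastly, we integrate the generated knowledge into expert trajectories and train a WKM.” This kind of design is becoming important.
The repository is said to be https://github.com/zjunlp/WKM, but it returns a 404 at the time of writing.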

SGA: Scientific Generative Agent

LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery [141.4]
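This paper proposes enhancing the knowledge-driven, abstract reasoning abilities of large language models with the computational power of simulations. We introduce the Scientific Generative Agent (SGA), a bilevel optimization framework. We conduct experiments demonstrating the framework's effectiveness in law discovery and molecular design.
Reference translation of the paper (metadata) (Thu, 16 May 2024 03:04:10 GMT)
A proposed framework for scientific discovery that combines physical simulation with LLMs. “In conclusion, we present Scientific Generative Agent, a bilevel optimization framework: LLMs serve as knowledgeable and adaptable thinkers, formulating scientific solutions like physics equations or molecule structures; concurrently, simulations operate as platforms for experimentation, offering observational feedback and optimizing continuous components like physical parameters.” In other words, the LLM takes on the role a human would play.
If combined with a video generation model such as Sora (as a physics or world simulator), might it become capable of self-contained deep thinking? At that point we would seem to be entering AGI territory.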
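
The bilevel structure in the quote can be sketched on a toy problem. This is only an illustration under assumptions, not the SGA code: a fixed dictionary of candidate equation forms stands in for the LLM's proposals (outer level), and a closed-form least-squares fit of one scale parameter stands in for the simulation-based optimization of continuous parameters (inner level). All names and data are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Synthetic observations generated from y = 3 * x^2 (assumed ground truth).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x**2 for x in xs]

# Outer level: discrete hypotheses. In SGA an LLM would propose these.
candidates: Dict[str, Callable[[float], float]] = {
    "linear": lambda x: x,
    "quadratic": lambda x: x**2,
    "cubic": lambda x: x**3,
}


def fit_scale(form: Callable[[float], float]) -> Tuple[float, float]:
    """Inner level: fit k in y = k * form(x) by least squares; return (k, loss)."""
    feats = [form(x) for x in xs]
    k = sum(f * y for f, y in zip(feats, ys)) / sum(f * f for f in feats)
    loss = sum((k * f - y) ** 2 for f, y in zip(feats, ys))
    return k, loss


# Evaluate every hypothesis with its best continuous parameter, keep the best.
results = {name: fit_scale(form) for name, form in candidates.items()}
best = min(results, key=lambda name: results[name][1])
best_k = results[best][0]
print(best, best_k)
```

In the full framework the losses would be fed back to the LLM so that it can propose refined hypotheses in the next round; this sketch runs a single round.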

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond [101.2]
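General world models represent a crucial pathway toward achieving artificial general intelligence (AGI). This survey comprehensively examines the latest developments in world models. We discuss the challenges and limitations of world models and consider future directions.
Reference translation of the paper (metadata) (Mon, 06 May 2024 14:37:07 GMT)
Opinions are divided on whether Sora can function as a world simulator; this is a survey of generative AI that could serve as a world simulator more broadly (autonomous driving, autonomous agents, and so on). The statement “we expect world models to possess the ability of counterfactual reasoning, whereby outcomes are inferred through rational imagining.” is exactly right. My impression is that this is still difficult today, but could the future in which it is achievable be close?
The repository is also a useful reference: GitHub – GigaAI-research/General-World-Models-Survey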

AgentKit: Flow Engineering with Graphs, not Coding

AgentKit: Flow Engineering with Graphs, not Coding [91.1]
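We propose AgentKit, an intuitive LLM prompting framework for multifunctional agents. AgentKit provides a unified framework for explicitly constructing a complex “thought process” from simple natural language prompts.
Reference translation of the paper (metadata) (Wed, 17 Apr 2024 15:40:45 GMT)
A framework for developing LLM-based agents. Many tools use LLMs by connecting blocks together, but this one is distinctive in being oriented toward agents and operating at a layer close to code (I am not sure how easy it is to use, but this level of abstraction seems well suited to development).
Repository: Holmeswww/AgentKit: An intuitive LLM prompting framework for multifunctional agents, by explicitly constructing a complex “thought process” from simple natural language prompts. (github.com). The license is CC-BY.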
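
To give a feel for building a “thought process” as a graph of prompts, here is a minimal sketch. It does not use the AgentKit API: `fake_llm`, `run_graph`, and the node names are hypothetical, and the only real library call is `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call.
    return f"<answer to: {prompt}>"


# Each node is a natural language prompt; deps maps a node to its predecessors.
prompts: Dict[str, str] = {
    "observe": "Summarize the current state.",
    "plan": "Propose the next step.",
    "act": "Choose one concrete action.",
}
deps: Dict[str, Set[str]] = {"observe": set(), "plan": {"observe"}, "act": {"plan"}}


def run_graph(
    prompts: Dict[str, str],
    deps: Dict[str, Set[str]],
    llm: Callable[[str], str],
) -> Dict[str, str]:
    """Evaluate nodes in dependency order, passing predecessor outputs as context."""
    outputs: Dict[str, str] = {}
    for node in TopologicalSorter(deps).static_order():
        context = " ".join(outputs[d] for d in sorted(deps[node]))
        outputs[node] = llm((context + " " + prompts[node]).strip())
    return outputs


outputs = run_graph(prompts, deps, fake_llm)
print(outputs["act"])
```

Changing the agent's behavior then means editing prompts or edges rather than rewriting control flow.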

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [56.8]
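We introduce AlphaLLM for the self-improvement of large language models. It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improvement loop. Experiments show that AlphaLLM significantly improves LLM performance without additional annotations.
Reference translation of the paper (metadata) (Thu, 18 Apr 2024 15:21:34 GMT)
Monte Carlo Tree Search + LLM. The idea that “we use the term option as a search node and propose option-level MCTS where each option represents a sequence of tokens, which can range from multiple tokens to several sentences.” is interesting and also contributes to the performance gains.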
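
The selection step of a tree search whose nodes are multi-token options can be sketched as follows. This uses the standard UCT formula as an assumption; the paper's actual search variant may differ, and `OptionNode`, `uct_score`, and `select` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class OptionNode:
    text: str                # an option: a span of tokens, possibly whole sentences
    visits: int = 0
    total_value: float = 0.0


def uct_score(node: OptionNode, parent_visits: int, c: float = 1.4) -> float:
    """Standard UCT: mean value plus an exploration bonus; unvisited nodes first."""
    if node.visits == 0:
        return math.inf
    mean = node.total_value / node.visits
    return mean + c * math.sqrt(math.log(parent_visits) / node.visits)


def select(children: List[OptionNode], parent_visits: int) -> OptionNode:
    """Pick the child option with the highest UCT score."""
    return max(children, key=lambda n: uct_score(n, parent_visits))


children = [
    OptionNode("Let x = 2.", visits=10, total_value=6.0),
    OptionNode("Assume x is odd, so x = 2k + 1.", visits=10, total_value=3.0),
    OptionNode("Try induction.", visits=0),
]
chosen = select(children, parent_visits=20)
print(chosen.text)
```

Searching over options rather than single tokens keeps the tree shallow, since one node can cover a whole reasoning step.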
