Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning [84.7] 大規模言語モデル(LLM)エージェントは、人間の計算データへの依存によって制約される。 我々は,外部データを持たない高性能エージェントを進化させる完全自律型フレームワークであるAgent0を紹介する。 Agent0は推論能力を大幅に向上させ、Qwen3-8B-Baseモデルを数学的推論で18%改善し、一般的な推論ベンチマークで24%改善した。 論文参考訳(メタデータ) (Thu, 20 Nov 2025 05:01:57 GMT)
「we initialize two functionally distinct agents: an execu- tor agent and a curriculum agent. These agents co-evolve through a symbiotic competition: the curriculum agent is trained using RL (Shao et al , 2024) to propose frontier tasks that precisely challenge the executor’s current capabilities, using the executor’s uncertainty (i.e., self-consistency across multiple answers) and its frequency of tool use as reward signals. Concurrently, the executor agent is trained via RL to successfully solve these tasks, optimizing on a filtered set of challenging problems generated by the frozen curriculum agent and using pseudo-labels derived from its own majority voting. Equipping the executor with a tool enhances its problem-solving abilities, which in turn com- pels the tool-equipped curriculum agent to generate more complex, tool-based curricula.」という複数エージェントを活用した共進化なフレームワーク。Agent構築においても近いアプローチが流行っているように思う。