September 23, 2025 – Introducing the latest arXiv papers

WebWeaver, WebResearcher

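Related to Tongyi DeepResearch: A New Era of Open-Source AI Researchers | Tongyi DeepResearch, the WebWeaver and WebResearcher papers have been published. The two are close to each other, but the team appears to be trying a variety of approaches.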

WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research [73.6]
This paper tackles open-ended deep research (OEDR), a complex challenge in which AI agents must synthesize vast web-scale information into insightful reports. It introduces WebWeaver, a novel dual-agent framework that emulates the human research process.
Reference translation of the paper (metadata) (Tue, 16 Sep 2025 17:57:21 GMT)
"In this paper, we introduced WebWeaver, a novel dual-agent framework designed to overcome the fundamental flaws of static, machine-like pipelines in open-ended deep research (OEDR). By emulating the human cognitive process that integrates the planner’s dynamic research cycle with the writer’s hierarchical retrieval and writing process, WebWeaver consistently outperforms both proprietary and open-source systems, establishing a new state-of-the-art."
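The planner/writer split described in the quote can be pictured with a toy sketch. This is only an illustration, not WebWeaver's implementation: the names `Section`, `plan`, and `write`, and the idea that each outline section stores the IDs of its supporting evidence, are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One outline entry (hypothetical structure, not taken from the paper)."""
    title: str
    evidence_ids: list[int] = field(default_factory=list)


def plan(findings: list[tuple[str, str]]) -> tuple[list[Section], dict[int, str]]:
    """Planner: fold each (topic, snippet) finding into a growing outline."""
    outline: list[Section] = []
    evidence: dict[int, str] = {}
    for evidence_id, (topic, snippet) in enumerate(findings):
        evidence[evidence_id] = snippet
        section = next((s for s in outline if s.title == topic), None)
        if section is None:
            # The outline is revised whenever a new topic shows up.
            section = Section(topic)
            outline.append(section)
        section.evidence_ids.append(evidence_id)
    return outline, evidence


def write(outline: list[Section], evidence: dict[int, str]) -> str:
    """Writer: draft section by section, pulling only that section's evidence."""
    parts = []
    for section in outline:
        body = " ".join(evidence[i] for i in section.evidence_ids)
        parts.append(f"{section.title}: {body}")
    return "\n".join(parts)
```

In this sketch the writer never sees the whole evidence pool at once; it looks up only the snippets attached to the section it is drafting, which is one plausible reading of "hierarchical retrieval".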

WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents [72.3]
WebResearcher is an iterative deep-research paradigm that reformulates deep research as a Markov Decision Process. It achieves state-of-the-art performance, surpassing frontier proprietary systems.
Reference translation of the paper (metadata) (Tue, 16 Sep 2025 17:57:17 GMT)
A framework built from three components: "(1) IterResearch, an iterative paradigm that reformulates deep research as a Markov Decision Process with periodic consolidation, overcoming the context suffocation and noise contamination of mono-contextual approaches; (2) WebFrontier, a scalable data synthesis engine that addresses training data scarcity through tool-augmented complexity escalation; and (3) a Research-Synthesis Framework that enables effective test-time scaling through parallel multi-agent exploration".
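As a rough picture of what "a Markov Decision Process with periodic consolidation" could mean in code, here is a minimal sketch. It assumes that the state carried between rounds is just the question plus a consolidated report, so each step depends only on the current state and not on the full interaction history. `Workspace`, `iterative_research`, `act`, and `consolidate` are hypothetical names for illustration, not the paper's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Workspace:
    """State carried between rounds (hypothetical structure)."""
    question: str
    report: str  # evolving summary that stands in for the raw history


def iterative_research(
    question: str,
    act: Callable[[Workspace], str],
    consolidate: Callable[[Workspace, str], str],
    rounds: int,
) -> Workspace:
    """Run a fixed number of act-then-consolidate rounds."""
    state = Workspace(question=question, report="")
    for _ in range(rounds):
        # The caller supplies the tool call, e.g. a search or page visit.
        observation = act(state)
        # Fold the new observation into the report.
        new_report = consolidate(state, observation)
        # Only the consolidated report moves forward; the raw observation is dropped.
        state = Workspace(question=question, report=new_report)
    return state
```

Because each round rebuilds the workspace from the report alone, the context does not grow with every tool call. That is the property the quote contrasts with "mono-contextual approaches", although the real system's consolidation step is of course far richer than this stub.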

A Survey of Reinforcement Learning for Large Reasoning Models [98.6]
This survey covers recent advances in reinforcement learning for reasoning with large language models. Further scaling of RL for LRMs faces challenges not only in computational resources but also in algorithm design, training data, and infrastructure.
Reference translation of the paper (metadata) (Wed, 10 Sep 2025 17:59:43 GMT)
A survey of reinforcement learning for LRMs. Interestingly, its stated purpose explicitly mentions ASI: "To this end, it is timely to revisit the development of this domain, reassess its trajectory, and explore strategies to enhance the scalability of RL toward Artificial SuperIntelligence (ASI). In particular, we examine research applying RL to LLMs and LRMs for reasoning abilities, especially since the release of DeepSeek-R1, including foundational components, core problems, training resources, and downstream applications, to identify future opportunities and directions for this rapidly evolving area."
Repository: GitHub – TsinghuaC3I/Awesome-RL-for-LRMs: A Survey of Reinforcement Learning for Large Reasoning Models

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data [119.8]
ScaleCUA is a step toward scaling open-source computer use data and foundation models. It provides a large-scale dataset spanning 6 operating systems and 3 task domains.
Reference translation of the paper (metadata) (Thu, 18 Sep 2025 17:59:22 GMT)
"In this work, we introduce ScaleCUA, a step toward scaling open-source CUAs. It offers a large-scale dataset spanning 6 operating systems and 3 task domains, via a closed-loop pipeline uniting automated agents with human experts. Trained on this scaled-up data, ScaleCUA can operate seamlessly across platforms. Specifically, it delivers strong gains over baselines (+26.6 on WebArena-Lite-v2, +10.7 on ScreenSpot-Pro) and sets new state-of-the-art results (94.4% on MMBench-GUI L1-Hard, 60.6% on OSWorld-G, 47.4% on WebArena-Lite-v2). These findings underscore the power of data-driven scaling for general-purpose cross-platform CUAs." A very orthodox, head-on way of improving performance.
Repository: GitHub – OpenGVLab/ScaleCUA: ScaleCUA is the open-sourced computer use agents that can operate on cross-platform environments (Windows, macOS, Ubuntu, Android).