DeepResearch – Page 2 – Introducing the Latest arXiv Papers

InfoAgent: Advancing Autonomous Information-Seeking Agents

InfoAgent: Advancing Autonomous Information-Seeking Agents [143.2]
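This paper introduces InfoAgent, a deep research agent powered by an innovative data synthesis pipeline and web search tools. With our method, InfoAgent achieves accuracy of 15.3% on BrowseComp, 29.2% on BrowseComp-ZH, and 40.4% on Xbench-DS.
Reference translation of the paper (metadata) (Mon, 29 Sep 2025 17:59:57 GMT)
Building a Deep Research agent. Based on Qwen3 14B and using synthetic data, post-training proceeds in two stages: "In the first stage, we perform supervised finetuning (SFT) as a cold start, in order to instill long-horizon search behavior into the model." and "In the second stage, we apply RL to refine its ability of reasoning-driven tool use."
The results show that synthetic data and post-training are effective, and the base model size is reasonably affordable. They suggest that developing SLMs of this kind may well become a trend.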

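Related to "Tongyi DeepResearch: A New Era of Open-Source AI Researchers | Tongyi DeepResearch", the WebWeaver and WebResearcher papers have come out. The two are close to each other, but the team appears to be trying a variety of approaches.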

WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research [73.6]
This paper tackles open-ended deep research (OEDR), a complex challenge in which AI agents must synthesize vast web-scale information into insightful reports. We introduce WebWeaver, a novel dual-agent framework that emulates the human research process.
Reference translation of the paper (metadata) (Tue, 16 Sep 2025 17:57:21 GMT)
"In this paper, we introduced WebWeaver, a novel dual-agent framework designed to overcome the fundamental flaws of static, machine-like pipelines in open-ended deep research (OEDR). By emulating the human cognitive process that integrates the planner’s dynamic research cycle with the writer’s hierarchical retrieval and writing process, WebWeaver consistently outperforms both proprietary and open-source systems, establishing a new state-of-the-art."

WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents [72.3]
WebResearcher is an iterative deep research paradigm that reformulates deep research as a Markov Decision Process. WebResearcher achieves state-of-the-art performance, surpassing frontier proprietary systems.
Reference translation of the paper (metadata) (Tue, 16 Sep 2025 17:57:17 GMT)
A framework made up of three components: "(1) IterResearch, an iterative paradigm that reformulates deep research as a Markov Decision Process with periodic consolidation, overcoming the context suffocation and noise contamination of mono-contextual approaches; (2) WebFrontier, a scalable data synthesis engine that addresses training data scarcity through tool-augmented complexity escalation; and (3) a Research-Synthesis Framework that enables effective test-time scaling through parallel multi-agent exploration"
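The "Markov Decision Process with periodic consolidation" idea in (1) can be illustrated with a minimal sketch. This is not the paper's implementation: `ResearchState`, `consolidate`, `iterate_research`, and `fake_search` are hypothetical names, and the truncation-based consolidation is only a stand-in for whatever summarization the actual system performs. The point it shows is that each round depends only on the question and a bounded report, not on the full interaction history.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ResearchState:
    """What is carried from one round to the next (the Markov state)."""
    question: str
    report: str  # consolidated findings so far; replaces the raw history


def consolidate(report: str, evidence: str, max_chars: int = 200) -> str:
    """Toy consolidation (assumption): merge, then keep the newest max_chars."""
    merged = (report + " " + evidence).strip()
    return merged[-max_chars:]


def iterate_research(
    question: str,
    queries: Iterable[str],
    search: Callable[[str], str],
    max_chars: int = 200,
) -> ResearchState:
    """Run one tool call per round, consolidating after every round."""
    state = ResearchState(question=question, report="")
    for query in queries:
        evidence = search(query)  # raw tool output is discarded after this round
        state = ResearchState(question, consolidate(state.report, evidence, max_chars))
    return state


def fake_search(query: str) -> str:
    """Hypothetical stand-in for a web search tool."""
    return f"[result for {query}]"


final = iterate_research("Who proposed X?", ["X origin", "X author"], fake_search)
print(final.report)
```

Because the carried state is capped at `max_chars`, the context does not grow with the number of rounds, which is the property the quoted passage contrasts with mono-contextual approaches.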

SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents [93.3]
This paper focuses on developing native autonomous single-agent models for deep research. Our best variant, SFR-DR-20B, reaches 28.7% on the Humanity's Last Exam benchmark.
Reference translation of the paper (metadata) (Mon, 08 Sep 2025 02:07:09 GMT)
A proposal for a framework that uses synthetic data to build Deep Research agents: "we propose a compact synthetic-data reinforcement learning recipe that adapts reasoning-optimized LLMs into native Autonomous Single-Agent systems for Deep Research. Applied to open-source backbones, our best variant attains 28.7% on Humanity’s Last Exam."

Memento: Fine-tuning LLM Agents without Fine-tuning LLMs [36.3]
This paper proposes a new learning paradigm for adaptive large language model (LLM) agents. The method enables low-cost continual adaptation through memory-based online reinforcement learning. We instantiate the agent model in a deep research setting as Memento, which reaches top-1 on GAIA validation.
Reference translation of the paper (metadata) (Mon, 25 Aug 2025 13:32:12 GMT)
A proposal for an agent framework with a memory mechanism: "Memento formalises deep research agents as a memory-based Markov Decision Process (MDP) and implements it within a planner–executor framework, leveraging an episodic case bank to record and retrieve trajectories for continual policy improvement."
Repository: GitHub – Agent-on-the-Fly/Memento: Official Code of Memento: Fine-tuning LLM Agents without Fine-tuning LLMs
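As a rough illustration of what "an episodic case bank to record and retrieve trajectories" could look like, here is a minimal sketch. It is not taken from the Memento repository: `Case`, `CaseBank`, and `jaccard` are hypothetical names, and word-overlap similarity is an assumption chosen for simplicity, not the retrieval method the paper uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Case:
    """One stored episode: the task, the plan that was tried, and its outcome."""
    task: str
    plan: str
    reward: float


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity (an assumed, simplistic similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


class CaseBank:
    """Append-only episodic memory with similarity-based retrieval."""

    def __init__(self) -> None:
        self.cases: List[Case] = []

    def write(self, case: Case) -> None:
        self.cases.append(case)

    def read(self, task: str, k: int = 1) -> List[Case]:
        # Rank stored cases by similarity to the new task and return the top k.
        ranked = sorted(self.cases, key=lambda c: jaccard(task, c.task), reverse=True)
        return ranked[:k]


bank = CaseBank()
bank.write(Case("find the capital of France", "search then verify", 1.0))
bank.write(Case("sum a column in a spreadsheet", "run code tool", 1.0))

best = bank.read("find the capital of Japan", k=1)[0]
print(best.plan)
```

In a planner-executor setup, the retrieved cases would be passed to the planner as context for the new task, so behavior can change as the bank grows while the LLM weights stay fixed.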

From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.7]
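Large language models (LLMs) equipped with reasoning and agentic capabilities are ushering in a new paradigm called Agentic Deep Research. We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn. We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches but is also poised to become the dominant paradigm for future information seeking.
Reference translation of the paper (metadata) (Thu, 26 Jun 2025 17:18:00 GMT)
A survey on DeepResearch. Papers are coming out at an amazing pace, and the surveys are arriving quickly too...
Repository: GitHub – DavidZWZ/Awesome-Deep-Research: [Up-to-date] Awesome Agentic Deep Research Resources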