SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents

This paper focuses on developing native autonomous single-agent models for deep research. Its best variant, SFR-DR-20B, reaches 28.7% on the Humanity's Last Exam benchmark.
Reference translation of the paper (metadata) (Mon, 08 Sep 2025 02:07:09 GMT)
"we propose a compact synthetic-data reinforcement learning recipe that adapts reasoning-optimized LLMs into native Autonomous Single-Agent systems for Deep Research. Applied to open-source backbones, our best variant attains 28.7% on Humanity's Last Exam." In short, the paper proposes a framework for building Deep Research agents that makes use of synthetic data.
