R-Zero: Self-Evolving Reasoning LLM from Zero Data

R-Zero: Self-Evolving Reasoning LLM from Zero Data [56.7]
自己進化型大規模言語モデル(LLM)は、自身の経験から自律的に生成、精製、学習することで、超知性へのスケーラブルなパスを提供する。このようなモデルを訓練するための既存の方法は、いまだに膨大な人為的なタスクやラベルに大きく依存している。 R-Zeroは、完全に自律的なフレームワークで、スクラッチから独自のトレーニングデータを生成する。
論文参考訳（メタデータ） (Thu, 07 Aug 2025 03:38:16 GMT)
「we propose R-Zero, a framework for training reasoning LLMs that can self-evolve from zero external data. In R-Zero, a single base model is initialized with two roles – a Challenger and a Solver that are independently optimized but co-evolve throughout the RL process.」、「Challenger is rewarded for proposing tasks near the edge of the Solver’s capability, and the Solver is rewarded for solving increasingly challenging tasks posed by the Challenger.」というGANっぽいフレームワーク。
リポジトリはChengsong-Huang/R-Zero: codes for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004)

コメントを残す

コメントを残す コメントをキャンセル