DeepCritic: Deliberate Critique with Large Language Models

DeepCritic: Deliberate Critique with Large Language Models [77.6]
We focus on studying and improving the mathematical critique ability of Large Language Models (LLMs). We develop a critique model based on Qwen2.5-7B-Instruct.
Reference translation of the paper (metadata) (Thu, 01 May 2025 17:03:17 GMT)
This paper proposes a model that performs deep critique. It is built in two stages: "In Stage 1, we first utilize Qwen2.5-72B-Instruct to generate an initial step-wise critique for each step in the solution, followed by an in-depth critique of the initial critique." and "In Stage 2, we perform RL to the SFT model on either existing human-annotated data or auto-labeled data via Monte Carlo sampling-based correctness estimation, to further stimulate the critique ability of the critic." Critic models are already known to help correct the outputs of other models, but the finding that "our 7B critique model is also capable of supervising and correcting the outputs of a 72B generator, demonstrating a potential of weak-to-strong supervision" is particularly interesting.
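The auto-labeled route in Stage 2 can be illustrated with a minimal sketch of Monte Carlo sampling-based correctness estimation. This assumes a common scheme and is not confirmed as DeepCritic's exact procedure: for each prefix of a step-wise solution, sample `k` continuations and take the fraction that reach the correct final answer, then label the first step whose estimate falls to `threshold` as the first erroneous step. The `rollout` callable, `k`, and `threshold` are hypothetical names introduced for illustration; in practice `rollout` would sample a completion from an LLM and check its final answer.

```python
from typing import Callable, List, Optional

# Hypothetical rollout: given a solution prefix (list of steps), sample one
# continuation and return True if it reaches the correct final answer.
Rollout = Callable[[List[str]], bool]


def estimate_step_correctness(steps: List[str], rollout: Rollout, k: int = 8) -> List[float]:
    """For each prefix steps[:i+1], estimate P(correct final answer) from k rollouts."""
    scores = []
    for i in range(len(steps)):
        prefix = steps[: i + 1]
        hits = sum(1 for _ in range(k) if rollout(prefix))
        scores.append(hits / k)
    return scores


def first_error_step(scores: List[float], threshold: float = 0.0) -> Optional[int]:
    """Return the 0-based index of the first step whose estimate is <= threshold,
    or None if every step stays above it (solution judged correct)."""
    for i, score in enumerate(scores):
        if score <= threshold:
            return i
    return None


if __name__ == "__main__":
    # Toy deterministic rollout: any prefix containing "bad" can never recover.
    steps = ["step 1", "step 2", "bad", "step 4"]
    scores = estimate_step_correctness(steps, lambda prefix: "bad" not in prefix, k=4)
    print(scores, first_error_step(scores))
```

With the toy rollout, the scores are `[1.0, 1.0, 0.0, 0.0]` and the third step (index 2) is labeled as the first error. With a real, stochastic generator, a threshold slightly above 0 trades label noise against missed errors.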
The repository is at GitHub – RUCBM/DeepCritic: Official repository for paper “DeepCritic: Deliberate Critique with Large Language Models”
