Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision

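We propose Compute as Teacher (CaT), which converts exploration into supervision. CaT synthesizes a single reference from a group of parallel rollouts and then optimizes toward it. As a test-time procedure, CaT improves Gemma 3 4B, Qwen 3 4B, and Llama 3.1 8B.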
Reference translation of the paper (metadata) (Wed, 17 Sep 2025 17:59:42 GMT)
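The flow described in the abstract (sample a group of parallel rollouts, then synthesize one reference from them) can be sketched roughly as follows. This is only an illustration under assumptions: `cat_reference`, `sample`, `synthesize`, and `num_rollouts` are hypothetical names, and `toy_majority_synthesize` is a majority-vote stand-in used solely to make the example runnable. The paper describes synthesizing a reference, which is not the same as picking the most common rollout.

```python
from collections import Counter
from typing import Callable, List


def cat_reference(
    prompt: str,
    sample: Callable[[str], str],
    synthesize: Callable[[str, List[str]], str],
    num_rollouts: int = 4,
) -> str:
    """Draw several rollouts for one prompt and reduce them to a single reference.

    `sample` and `synthesize` are hypothetical callables standing in for
    model calls; nothing here reflects the paper's actual implementation.
    """
    rollouts = [sample(prompt) for _ in range(num_rollouts)]
    return synthesize(prompt, rollouts)


def toy_majority_synthesize(prompt: str, rollouts: List[str]) -> str:
    """Stand-in synthesizer (assumption): return the most frequent rollout."""
    return Counter(rollouts).most_common(1)[0][0]
```

In a real setup, `synthesize` would presumably be another model call that reads all rollouts and writes a new answer, and the resulting reference would then serve as the target to optimize toward.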
It is interesting that the approach also works on tasks that are hard to verify: "(i) verifiable tasks use programmatic equivalence on final answers; (ii) non-verifiable tasks use self-proposed rubrics—binary, auditable criteria scored by an independent LLM judge, with reward given by the fraction satisfied." The paper also states that "CaT can be applied at test time for inference-time gains or inside RL (CaT-RL) to improve the policy." The effect is confirmed under reinforcement learning as well.
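The two reward rules quoted above can be illustrated with a minimal sketch. The function names, the whitespace and case normalization, and the `judge` callable are all assumptions made for illustration. In the paper the judge is an independent LLM, and the exact equivalence check is not specified in this post.

```python
from typing import Callable, Sequence


def normalize_answer(text: str) -> str:
    """Assumed normalization: trim, lowercase, collapse internal whitespace."""
    return " ".join(text.strip().lower().split())


def verifiable_reward(response_answer: str, reference_answer: str) -> float:
    """Reward 1.0 if the final answers match after normalization, else 0.0."""
    same = normalize_answer(response_answer) == normalize_answer(reference_answer)
    return 1.0 if same else 0.0


def rubric_reward(
    response: str,
    rubric: Sequence[str],
    judge: Callable[[str, str], bool],
) -> float:
    """Fraction of binary rubric criteria that the judge marks as satisfied.

    `judge(response, criterion)` is a hypothetical stand-in for the
    independent LLM judge; an empty rubric yields 0.0 by assumption.
    """
    if not rubric:
        return 0.0
    satisfied = sum(1 for criterion in rubric if judge(response, criterion))
    return satisfied / len(rubric)
```

For a quick check without a model, a keyword judge such as `lambda response, criterion: criterion.lower() in response.lower()` can stand in for the LLM judge.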
