s1: Simple test-time scaling – Introducing the Latest arXiv Papers

s1: Simple test-time scaling [148.4]
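Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance. The authors seek the simplest approach that achieves both test-time scaling and strong reasoning performance.
Reference translation of the paper (metadata) (Mon, 03 Feb 2025 16:31:30 GMT)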
The paper reports: "We show that SFT on only 1,000 examples suffices to build a competitive reasoning model matching o1-preview and produces a model that lies on the pareto frontier". It also states: "First, we curate a small dataset s1K of 1,000 questions paired with reasoning traces relying on three criteria we validate through ablations: difficulty, diversity, and quality. Second, we develop budget forcing to control test-time compute by forcefully terminating the model’s thinking process or lengthening it by appending “Wait” multiple times to the model’s generation when it tries to end." The use of "Wait" is the distinctive part (it brings to mind Think before you speak: Training Language Models With Pause Tokens – Introducing the Latest arXiv Papers).
The repository is GitHub – simplescaling/s1: s1: Simple test-time scaling
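The budget forcing idea quoted above can be sketched roughly as follows. This is only a minimal illustration, not the code from the s1 repository: the `generate` callback, the `toy_model` stand-in, the parameter names, and the list-of-strings "tokens" are all assumptions made for this example. The loop cuts the thinking off once the token budget is used up, and appends "Wait" a fixed number of times whenever the model tries to stop thinking on its own.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface (an assumption for this sketch): given the text
# so far and a token budget, return the newly generated thinking tokens and a
# flag that is True if the model tried to end its thinking on its own, or
# False if it was cut off by the budget.
GenerateFn = Callable[[str, int], Tuple[List[str], bool]]


def budget_forcing(
    generate: GenerateFn,
    prompt: str,
    max_thinking_tokens: int,
    num_waits: int = 0,
    wait_token: str = "Wait",
) -> List[str]:
    """Return thinking tokens produced under a simple budget-forcing loop."""
    tokens: List[str] = []
    waits_used = 0
    while len(tokens) < max_thinking_tokens:
        remaining = max_thinking_tokens - len(tokens)
        context = " ".join([prompt] + tokens)
        new_tokens, wants_to_stop = generate(context, remaining)
        # Never exceed the budget, even if the callback returns too much.
        tokens.extend(new_tokens[:remaining])
        if not wants_to_stop:
            # The budget ran out mid-thought: thinking is forcibly terminated.
            break
        if waits_used >= num_waits:
            # The model wanted to stop and no more extensions are requested.
            break
        # The model tried to stop: append "Wait" so that it keeps thinking.
        tokens.append(wait_token)
        waits_used += 1
    return tokens[:max_thinking_tokens]


def toy_model(context: str, max_tokens: int) -> Tuple[List[str], bool]:
    """Stand-in for a real LLM: always wants to emit three steps, then stop."""
    steps = ["step1", "step2", "step3"]
    produced = steps[:max_tokens]
    return produced, len(produced) == len(steps)


if __name__ == "__main__":
    # Tight budget: thinking is cut short.
    print(budget_forcing(toy_model, "Q:", max_thinking_tokens=2))
    # Generous budget with two forced extensions via "Wait".
    print(budget_forcing(toy_model, "Q:", max_thinking_tokens=100, num_waits=2))
```

With a real model, `generate` would wrap an actual decoding call and the counts would be in model tokens. The control flow stays the same: a cap on one side and "Wait" on the other.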
