Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet

Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet [93.0]
テストタイムスケーリングは、モデルが長い推論チェーンを生成することによって、推論時間計算を増加させる。本手法は,知識集約型タスクにおいて,高い事実的精度と低幻覚率が不可欠である場合において,まだ有効ではないことを示す。以上の結果から,テスト時間計算の増大は必ずしも精度の向上には至らず,多くの場合において幻覚の増大につながることが示唆された。
論文参考訳（メタデータ） (Mon, 08 Sep 2025 16:28:25 GMT)
「To summarize, while test-time scaling in reasoning models has led to strong performance in many domains, it is not yet effective for knowledge-intensive tasks. Increasing inference time does not consistently improve factual accuracy, and contrary to expectations, it can even increase hallucinations.」とのこと。LRMを使っていて感じていることと整合的。
リポジトリはGitHub – XuZhao0/tts-knowledge: Code and data for “Test-time scaling in reasoning models is not effective for knowledge-intensive tasks yet”

コメントを残す

コメントを残す コメントをキャンセル