The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity [16.3]
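Large reasoning models (LRMs) generate a detailed thinking process before providing an answer. We show that LRMs face a complete accuracy collapse beyond certain complexities. We also examine the reasoning traces in more depth and study the patterns of the solutions explored.
Reference translation of the paper (metadata) (Sat, 07 Jun 2025 22:42:29 GMT)
An analysis of LRMs. The authors state: "Through extensive experimentation across diverse puzzles, we show that frontier LRMs face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counter-intuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget."
An interesting experimental result. That said, my impression is that this kind of degradation has already been pointed out for things like the arithmetic ability of LLMs. Intuitively, it seems to me that current LLMs/LRMs cannot arrive at a meta-level solution method on their own, but with tool use such as code generation these problems are probably at a solvable level, so how to interpret the result is a difficult question.
"We identified three distinct reasoning regimes: standard LLMs outperform LRMs at low complexity, LRMs excel at moderate complexity, and both collapse at high complexity." This matches my current sense of things.
Opinions will differ, but it is interesting that, as shown below, a rebuttal has come from Anthropic's C. Opus.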
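To make the remark about tool use concrete, here is a minimal sketch using Tower of Hanoi, one of the puzzles evaluated in the paper. The function names `hanoi_moves` and `is_valid_solution` are my own illustrative choices, not anything taken from the paper. The point is only that a few lines of generated code can enumerate and check the full move list, whose length grows as 2^n - 1, whereas writing out every move as text becomes very long as n increases.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Yield (disk, from_peg, to_peg) moves solving an n-disk Tower of Hanoi."""
    if n == 0:
        return
    # Move the n-1 smaller disks out of the way, move disk n, then restack.
    yield from hanoi_moves(n - 1, source, spare, target)
    yield (n, source, target)
    yield from hanoi_moves(n - 1, spare, target, source)


def is_valid_solution(n, moves):
    """Replay the moves and check the rules and the final state."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for disk, src, dst in moves:
        # The moved disk must be on top of its source peg.
        if not pegs[src] or pegs[src][-1] != disk:
            return False
        # A disk may never be placed on a smaller one.
        if pegs[dst] and pegs[dst][-1] < disk:
            return False
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(n, 0, -1)) and not pegs["A"] and not pegs["B"]


for n in (3, 10, 15):
    moves = list(hanoi_moves(n))
    # Prints n, the number of moves (2**n - 1), and whether the replay succeeds.
    print(n, len(moves), is_valid_solution(n, moves))
```

This only illustrates why the interpretation is tricky: knowing the procedure and emitting every step by hand are different tasks, and the sketch says nothing about how any particular model actually behaves.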

Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity [0.0]
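Large reasoning models (LRMs) exhibit an "accuracy collapse" on planning puzzles beyond a certain complexity threshold. The comment demonstrates that these results primarily reflect constraints of the experimental design rather than fundamental reasoning failures.
Reference translation of the paper (metadata) (Tue, 10 Jun 2025 21:16:53 GMT)
The first author is Anthropic's C. Opus, and the Acknowledgments read: "We thank Ryan Greenblatt, o3, Gemini 2.5, and all of the people who pointed out the parentheses mismatch in an earlier draft for helpful comments".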
