OVERTHINKING: Slowdown Attacks on Reasoning LLMs

OVERTHINKING: Slowdown Attacks on Reasoning LLMs [41.7]
OVERTHINK攻撃は、推論モデルを操作するサードパーティアプリケーションのコストを増幅する可能性がある。我々は、クローズド(OpenAI o1, o1-mini, o3-mini)とオープン(DeepSeek R1)の重み付けモデルを用いて、FreshQAおよびSQuADデータセットによる攻撃を評価した。
論文参考訳（メタデータ） (Tue, 04 Feb 2025 18:12:41 GMT)
推論効率を低下させるoverthinking攻撃、「Our experimental results show that OVERTHINK significantly disrupts reasoning efficiency, with attacks on the o1 model increasing reasoning tokens up to 18× and over 10× on DeepSeek-R1.」とのこと。
「Our attack contains three key stages: (1) picking a decoy problem that results in a large number of reasoning tokens, but won’t trigger safety filters; (2) integrating selected decoys into a compromised source (e g , a wiki page) by either modifying the problem to fit the context (context-aware) or by injecting a general template (context-agnostic), and, (3) optimizing the decoy tasks using an in-context learning genetic (ICL-Genetic) algorithm to select contexts with decoys that provide highest reasoning tokens and maintain stealthiness of the answers to the user.」というアプローチ。計算負荷の高い正規表現を使うDoSっぽいと思ってしまい、有効な攻撃になりえそう。。。

「In rare cases, R1 can get stuck “thinking forever”.」と記載がある論文を思い出した。

PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models [43.2]
一般知識のみを必要とするNPRサンデーパズルチャレンジに基づくベンチマークを提案する。私たちの研究は、既存のベンチマークでは明らかでない機能ギャップを明らかにしています。
論文参考訳（メタデータ） (Mon, 03 Feb 2025 18:10:38 GMT)

コメントを残す

コメントを残す コメントをキャンセル