The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs

The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs [39.9]
DLLMのユニークな安全性の弱点を生かした、最初の系統的な研究および脱獄攻撃フレームワークであるDIJAを提案する。提案するDIJAは,dLLMのテキスト生成機構を利用した対向的インターリーブ・マスクテキストプロンプトを構築する。本研究は, 新たな言語モデルにおいて, 安全アライメントの再考の必要性を浮き彫りにするものである。
論文参考訳（メタデータ） (Tue, 15 Jul 2025 08:44:46 GMT)
dLLMに対する攻撃手法の提案。「By interleaving sets of [MASK] tokens after vanilla malicious prompt, as shown in Figure 2, a dLLM is coerced into generating harmful instructions purely to maintain contextual consistency. Moreover, in contrast to autoregressive LLMs, which generate tokens sequentially and can perform on-the-fly rejection of unsafe continuations, dLLMs decode masked tokens in parallel at each step, substantially limiting the model’s ability to conduct dynamic risk assessment or intervene during generation (e g , reject sampling for tokens corresponding to harmful contents). Consequently, defenses designed for left-to-right models break down, opening the door to powerful new jailbreak attacks.」とある通り、CausalLMとは別体系であるモデルの特徴を利用した攻撃手法となっていて、攻撃成功率も高い。
リポジトリはGitHub – ZichenWen1/DIJA: code for “The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs”

コメントを残す

月	火	水	木	金	土	日
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

コメントを残す コメントをキャンセル

コメントを残すコメントをキャンセル