Do Membership Inference Attacks Work on Large Language Models?

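Membership inference attacks (MIAs) attempt to predict whether a particular data point is a member of a target model's training data. We perform a large-scale evaluation of MIAs on language models trained on the Pile, with parameter counts ranging from 160M to 12B. We find that, in most settings across various LLM sizes and domains, MIAs barely outperform random guessing.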
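To make "barely outperform random guessing" concrete, here is a minimal sketch of one common baseline, a loss-threshold ("LOSS") attack, scored with ROC AUC. It assumes you already have per-example losses from the target model for known members and non-members. The function names and the toy loss values are hypothetical illustrations, not the paper's code or the mimir API. An AUC of 0.5 means the attack is no better than a coin flip.

```python
from typing import List, Sequence


def membership_scores(losses: Sequence[float]) -> List[float]:
    """LOSS attack (illustrative): a lower loss is treated as evidence of
    membership, so the membership score is simply the negated loss."""
    return [-x for x in losses]


def roc_auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic: the probability that a randomly
    chosen member scores higher than a randomly chosen non-member (ties count 1/2)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


if __name__ == "__main__":
    # Hypothetical per-example losses; in practice these come from the target LM.
    member_losses = [2.1, 2.3, 2.0, 2.4]
    nonmember_losses = [2.2, 2.35, 2.05, 2.5]
    auc = roc_auc(membership_scores(member_losses), membership_scores(nonmember_losses))
    print(round(auc, 3))  # 0.625
```

When the two loss distributions overlap heavily, as in this toy example, the AUC stays close to 0.5. That is the regime the paper reports for most of its settings.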
Paper / Reference translation (metadata) (Mon, 12 Feb 2024 17:52:05 GMT)
This report suggests that membership inference attacks on LLMs may not actually be effective. The authors are blunt: "We identify specific settings where LLMs have been shown to be vulnerable to membership inference and show that the apparent success in such settings can be attributed to a distribution shift, such as when members and non-members are drawn from the seemingly identical domain but with different temporal ranges." As the conclusion also points out, applying these attacks to anything without understanding these characteristics seems dangerous to me.
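The temporal-shift point can be illustrated with a "blind" baseline that never queries the model at all. The following is a minimal sketch under stated assumptions: the texts are invented, the members are assumed to be older than the non-members, and `latest_year` is a hypothetical heuristic rather than anything from the paper or mimir. If a model-free score like this separates the two sets well, a high AUC from a real MIA on the same split says little about memorization.

```python
import re
from typing import List, Sequence


def roc_auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (ties count 1/2)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def latest_year(text: str) -> int:
    """Hypothetical heuristic: largest year (1900-2099) mentioned in the text, or 0 if none."""
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", text)]
    return max(years, default=0)


def blind_scores(texts: Sequence[str]) -> List[float]:
    """Model-free 'attack': older-looking text is guessed to be a member,
    so the score is the negated latest year. No model access is used."""
    return [-float(latest_year(t)) for t in texts]


if __name__ == "__main__":
    # Invented examples: same domain, but members predate a training cutoff
    # and non-members were collected afterwards (a temporal shift).
    members = [
        "The 2019 season ended early.",
        "Results from the 2020 census were released.",
        "In 2018 the library added async support.",
    ]
    nonmembers = [
        "The 2023 update changed the default settings.",
        "A 2024 survey found similar trends.",
        "Prices rose again in 2023.",
    ]
    print(roc_auc(blind_scores(members), blind_scores(nonmembers)))  # 1.0
```

Here the "attack" reaches a perfect AUC purely from dates in the text. This is the kind of confound the authors argue can explain apparent MIA success when members and non-members come from different time ranges.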
Repository: iamgroot42/mimir: Python package for measuring memorization in LLMs (github.com)
