Attack – Page 2 – Introducing the Latest arXiv Papers

Unleashing Worms and Extracting Data: Escalating the Outcome of Attacks against RAG-based Inference in Scale and Severity Using Jailbreaking

Unleashing Worms and Extracting Data: Escalating the Outcome of Attacks against RAG-based Inference in Scale and Severity Using Jailbreaking [6.9]
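We show that, with the ability to jailbreak a GenAI model, attackers can escalate the outcome of attacks against RAG-based applications. In the first part of the paper, we show that attackers can escalate RAG membership inference attacks to RAG documents extraction attacks. In the second part, we show that attackers can escalate the scale of RAG data poisoning attacks from compromising a single application to compromising the entire GenAI ecosystem.
Paper reference translation (metadata) (Thu, 12 Sep 2024 13:50:22 GMT)
Attacks on RAG: escalating from RAG membership inference attacks and RAG entity extraction attacks to RAG documents extraction attacks.
The idea of "Adversarial Self-Replicating Prompts" is interesting.
The repository is GitHub – StavC/UnleashingWorms-ExtractingData: Unleashing Worms and Extracting Data: Escalating the Outcome of Attacks against RAG-based Inference in Scale and Severity Using Jailbreaking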

Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers

Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers [95.2]
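This paper investigates the transferability of such adversarial vulnerability from pre-trained ViT models to downstream tasks. We show that DTA achieves an attack success rate (ASR) exceeding 90%, far surpassing existing methods.
Paper reference translation (metadata) (Sat, 03 Aug 2024 08:07:03 GMT)
Proposes an attack method that targets downstream tasks. Downstream transfer attacks (DTAs) are reported to be effective. The observation that "We also found that emerging PETL methods like LoRA are more susceptible to transfer attacks crafted on the pre-trained model." is what one would expect, but because these are such useful methods, it is a headache.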

A Survey on Privacy Attacks Against Digital Twin Systems in AI-Robotics

A Survey on Privacy Attacks Against Digital Twin Systems in AI-Robotics [4.3]
Industry 4.0 has witnessed the rise of complex robots through the integration of artificial intelligence/machine learning (AI/ML) and digital twin (DT) technologies. This paper surveys privacy attacks targeting robots enabled by AI and DT models.
Paper reference translation (metadata) (Thu, 27 Jun 2024 00:59:20 GMT)
A survey of attacks focused on digital twins.
The assumed framework is: "Physical spaces comprise robotic sensors that collect data. Virtual space utilizes the data collected from physical space via a communication link between them. Predictions are generated by the AI models within virtual space, which are then analyzed before decisions are made by stakeholders."

Chain of Attack

Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM [27.0]
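Large language models (LLMs) have achieved remarkable performance on a variety of natural language processing tasks. CoA is a semantic-driven contextual multi-turn attack method that adaptively adjusts its attack policy. We show that CoA effectively exposes vulnerabilities of LLMs and outperforms existing attack methods.
Paper reference translation (metadata) (Thu, 09 May 2024 08:15:21 GMT)
A multi-turn attack method.
The repository is GitHub – YancyKahn/CoA: CoA: Context-Aware based Chain of Attack for Multi-Turn Dialogue LLM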

Against The Achilles’ Heel: A Survey on Red Teaming for Generative Models

Against The Achilles’ Heel: A Survey on Red Teaming for Generative Models [60.2]
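The field of red teaming is growing rapidly, which highlights the need for a comprehensive organization covering the entire pipeline. We surveyed more than 120 papers and introduced a fine-grained taxonomy of attack strategies grounded in the inherent capabilities of language models. We developed a searcher framework that unifies various automatic red teaming approaches.
Paper reference translation (metadata) (Sun, 31 Mar 2024 09:50:39 GMT)
A survey on red teaming, which matters for real-world deployment. The structure, which starts from "Figure 2: An overview of GenAI red teaming flow.", is easy to follow. It is also nice that the paper is CC-BY.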

Many-shot jailbreaking \ Anthropic
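We studied many-shot jailbreaking (MSJ) targeting helpful, harmless, and honest AI assistants. MSJ extends the concept of few-shot jailbreaking: the attacker prompts the model with a fictitious dialogue containing a series of queries that the model would normally refuse to answer.
As the statement "We found that the effectiveness of attacks, and of in-context learning more generally, could be characterized by simple power laws." suggests, even a very simple attack like this can be effective. With attack strategies evolving day by day, ensuring safety is very hard.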
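The power-law relationship in the quote can be illustrated with a small sketch. It assumes the simplest form y = C * n^(-alpha), where n is the number of in-context shots and y is some loss-like measure of attack effectiveness. The helper name fit_power_law and the data points are hypothetical and synthetic; they are not taken from the paper.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = C * x**(-alpha) in log-log space.

    Returns (C, alpha). Assumes all xs and ys are strictly positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    slope = sxy / sxx                      # slope of the line in log-log space
    intercept = mean_y - slope * mean_x    # log(C)
    return math.exp(intercept), -slope

# Synthetic data generated from a known power law (NOT measurements from the paper).
shots = [1, 2, 4, 8, 16, 32, 64, 128, 256]
loss = [5.0 * n ** -0.4 for n in shots]

C, alpha = fit_power_law(shots, loss)
print(f"C={C:.3f}, alpha={alpha:.3f}")
```

A power law appears as a straight line on a log-log plot, so ordinary linear regression on the logarithms is enough to recover the exponent in this idealized setting.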

Threats, Attacks, and Defenses in Machine Unlearning: A Survey

Threats, Attacks, and Defenses in Machine Unlearning: A Survey [15.1]
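Machine unlearning (MU) has attracted significant attention for its potential to improve AI safety. This survey seeks to fill the gap in the extensive research on threats, attacks, and defenses in machine unlearning.
Paper reference translation (metadata) (Wed, 20 Mar 2024 15:40:18 GMT)
A survey of attacks and defenses in the machine unlearning area.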

Understanding and Mitigating the Threat of Vec2Text to Dense Retrieval Systems

Understanding and Mitigating the Threat of Vec2Text to Dense Retrieval Systems [30.8]
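Vec2Text, a technique for inverting text embeddings, has raised serious privacy concerns within dense retrieval systems. This paper examines various aspects of embedding models that could affect the recoverability of text using Vec2Text. We then propose a modification of the embedding transformation that mitigates the risk of text recoverability while ensuring comparable ranking effectiveness.
Paper reference translation (metadata) (Tue, 20 Feb 2024 07:49:30 GMT)
Addresses the question, which occasionally comes up in practice, of whether "2vec"-style embeddings can be converted back to text, and proposes a method to prevent that. According to the paper, "Methods like Vec2Text, which can successfully reconstruct the original text from an embedding, could pose serious privacy risks, especially now embeddings are made publicly available via APIs (e.g., OpenAI or Cohere).", and since the authors were able to reproduce the results, it does appear to be a threat.
The repository is ielab/vec2text-dense_retriever-threat: Is Vec2Text Really a Threat to Dense Retrieval Systems? (github.com). The reproduction experiments are reportedly based on jxmorris12/vec2text: utilities for decoding deep representations (like sentence embeddings) back to text (github.com). The weights are also published: ielabgroup/vec2text_gtr-base-st_corrector · Hugging Face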

Do Membership Inference Attacks Work on Large Language Models?

Do Membership Inference Attacks Work on Large Language Models? [145.9]
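Membership inference attacks (MIAs) attempt to predict whether a particular data point is a member of the target model's training data. We perform a large-scale evaluation of MIAs over language models trained on the Pile, ranging from 160M to 12B parameters. We find that MIAs barely outperform random guessing in most settings across varying LLM sizes and domains.
Paper reference translation (metadata) (Mon, 12 Feb 2024 17:52:05 GMT)
A report suggesting that membership inference attacks against LLMs may not actually be effective. The paper is blunt: "We identify specific settings where LLMs have been shown to be vulnerable to membership inference and show that the apparent success in such settings can be attributed to a distribution shift, such as when members and non-members are drawn from the seemingly identical domain but with different temporal ranges." As the conclusion also notes, applying these attacks to something without understanding such properties is dangerous.
The repository is iamgroot42/mimir: Python package for measuring memorization in LLMs (github.com)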
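To make "barely outperform random guessing" concrete, here is a minimal sketch of a loss-threshold membership inference attack scored by AUC. It does not use the mimir package or its API. The helper names and the per-sample loss values are hypothetical, made up for illustration only.

```python
def loss_attack_scores(losses):
    """Loss-based MIA score: a lower loss means 'more likely a training member'."""
    return [-loss for loss in losses]

def auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member.

    Ties count as 0.5. A value of 0.5 corresponds to random guessing.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))

# Made-up per-sample losses of a target model (NOT results from the paper).
member_losses = [2.1, 2.4, 2.0, 2.6, 2.3]
nonmember_losses = [2.2, 2.5, 2.0, 2.7, 2.3]

result = auc(loss_attack_scores(member_losses),
             loss_attack_scores(nonmember_losses))
print(f"AUC = {result:.2f}")
```

When the member and non-member loss distributions overlap this heavily, the AUC stays close to 0.5. If the non-members were instead drawn from a shifted distribution (for example, text from a later time range), their losses would differ for reasons unrelated to membership, and the same attack would look far more successful than it really is.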

Prompt Injection Attacks and Defenses in LLM-Integrated Applications

Prompt Injection Attacks and Defenses in LLM-Integrated Applications [63.9]
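This paper proposes a framework that formalizes prompt injection attacks and their defenses. With our framework, new attacks can be designed by combining existing ones. We also propose a framework to systematize defenses against prompt injection attacks.
Paper reference translation (metadata) (Thu, 19 Oct 2023 15:12:09 GMT)
A report that organizes attacks on LLMs.
The repository is GitHub – liu00222/Open-Prompt-Injection: Prompt injection attacks and defenses in LLM-integrated applications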

Baseline Defenses for Adversarial Attacks Against Aligned Language Models

Baseline Defenses for Adversarial Attacks Against Aligned Language Models [109.8]
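We evaluate baseline defense strategies against leading adversarial attacks on large language models. We examine three types of defenses: detection (perplexity based), input preprocessing (paraphrase and retokenization), and adversarial training. Surprisingly, we find more success with filtering and preprocessing than would be expected from other domains.
Paper reference translation (metadata) (Fri, 1 Sep 2023 17:59:44 GMT)
Research on countermeasures against attacks on LLMs. It covers detection (perplexity based), input preprocessing (paraphrase and retokenization), and adversarial training.
The statements "Interestingly, in this initial analysis, we find much more success with filtering and preprocessing strategies than in the vision domain, and that adaptive attacks against such defenses are non-trivial." and "The domain of LLMs is appreciably different from “classical” problems in adversarial machine learning." are striking.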
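The idea behind perplexity-based detection can be sketched as follows. This is a toy illustration and not the paper's implementation: a real filter would compute perplexity with an actual language model, whereas this sketch substitutes a character-level unigram model with add-one smoothing. The function names, the reference corpus, and the threshold rule are all assumptions.

```python
import math
from collections import Counter

def train_char_model(corpus):
    """Character unigram model with add-one smoothing (a stand-in for a real LM)."""
    counts = Counter(corpus)
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot shared by all unseen characters

    def prob(ch):
        return (counts.get(ch, 0) + 1) / (total + vocab)

    return prob

def perplexity(text, prob):
    """exp of the average negative log-probability per character."""
    if not text:
        raise ValueError("text must be non-empty")
    nll = -sum(math.log(prob(ch)) for ch in text) / len(text)
    return math.exp(nll)

def is_suspicious(text, prob, threshold):
    """Flag inputs whose perplexity exceeds the threshold."""
    return perplexity(text, prob) > threshold

# Hypothetical reference text standing in for "normal" prompts.
reference = ("the quick brown fox jumps over the lazy dog. "
             "please write a short summary of the following text. ") * 20
prob = train_char_model(reference)

benign = "please summarize this article for me"
garbled = "}{>>|| ~~^^ ]]\\ @@##"  # stand-in for an optimizer-generated suffix

# Assumed threshold rule: twice the perplexity of a known-benign sample.
threshold = 2 * perplexity(benign, prob)
print(is_suspicious(benign, prob, threshold), is_suspicious(garbled, prob, threshold))
```

Strings that look nothing like the reference distribution receive a much higher perplexity than ordinary prompts, which is why a simple threshold can separate them. Where to set that threshold in practice is a trade-off between missed attacks and false positives on unusual but legitimate inputs.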
