Security – Introducing the Latest arXiv Papers

LLMs in the SOC: An Empirical Study of Human-AI Collaboration in Security Operations Centres

LLMs in the SOC: An Empirical Study of Human-AI Collaboration in Security Operations Centres [15.2]
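Integrating large language models into Security Operations Centres (SOCs) offers a transformative, still-evolving opportunity to reduce analyst workload. This paper presents a longitudinal study of 3,090 queries from 45 SOC analysts over 10 months. The analysis shows that analysts use LLMs as on-demand aids for sensemaking and context-building rather than for high-stakes determinations.
Reference translation of the paper (metadata) (Tue, 26 Aug 2025 11:40:02 GMT)
A report on how SOC analysts use LLMs.
"By analysing thousands of analyst-generated queries, we found that analysts use LLMs as on-demand, task-focused cognitive aids for a variety of tasks, including explaining commands, writing scripts, or improving documentation, rather than as full-time copilots." That matches my impression of where things currently stand.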

Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques

Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques [11.2]
Large language models (LLMs) are transforming cybersecurity by enabling intelligent, adaptive, and automated approaches to threat detection, vulnerability assessment, and incident response. With advanced language understanding and contextual reasoning, LLMs surpass traditional methods in addressing challenges across domains such as IoT, blockchain, and hardware security.
Reference translation of the paper (metadata) (Fri, 18 Jul 2025 03:41:18 GMT)
A survey on LLMs and security: "This survey provides a comprehensive overview of LLM applications in cybersecurity, focusing on two core areas: (1) the integration of LLMs into key cybersecurity domains, and (2) the vulnerabilities of LLMs themselves, along with mitigation strategies".

A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents

A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents [45.5]
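Recent advances in large language models (LLMs) have catalyzed the rise of autonomous AI agents. These large-model agents mark a paradigm shift from static inference systems to interactive, memory-augmented entities.
Reference translation of the paper (metadata) (Mon, 30 Jun 2025 13:34:34 GMT)
A survey on AI agents and security risks.
There are many points to consider.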

Mistral Agents API, DeepSeek-R1-0528

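Last week was notable less for corporate news than for arXiv submissions. Even excluding updated papers, about 3,700 came out, which makes checking them very hard.

Amid all that, the item to watch is Mistral AI's "Build AI agents with the Mistral Agents API | Mistral AI". As with OpenAI, I expect the trend to keep spreading: providers go beyond offering a simple API, support AI functionality end to end, and move much of it to the cloud side.

As "NVD – CVE-2025-37899" and "How I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel’s SMB implementation – Sean Heelan’s Blog" show, AI capabilities have risen considerably and AI is becoming indispensable. Agentic behavior is powerful, but how best to work with these APIs is a tricky question.

On the open-model side, a new version of DeepSeek R1 appears to have been released. Open models and OSS, which move in the opposite direction from the above, are also worth watching.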

deepseek-ai/DeepSeek-R1-0528 · Hugging Face

LLMs unlock new paths to monetizing exploits

LLMs unlock new paths to monetizing exploits [85.6]
Large language models (LLMs) will soon change the economics of cyberattacks. LLMs allow adversaries to launch attacks tailored to each individual user.
Reference translation of the paper (metadata) (Fri, 16 May 2025 17:05:25 GMT)
A report on the potential for misuse of LLMs. I agree that attacks can become more closely tailored to the target.
"To demonstrate this capability, we divide all emails from the Enron dataset into 150 (potentially overlapping) sets, grouped by the Enron employee who has sent or received that email. We then feed each of these collections of emails into a LLM (Claude 3.5 Sonnet) and ask it to describe everyone who this employee is emailing. Doing this identifies one Enron employee (John G.) who is having an extramarital affair with a coworker." This is also interesting from a large-scale data analysis perspective.
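The grouping step in that quote (one set of emails per employee, with overlap because an email belongs to both its sender and its recipients) can be sketched roughly as follows. This is only an illustration: the record layout (`id`, `sender`, `recipients`) and the function name `group_by_participant` are assumptions made here, not the paper's code, and the LLM step is left out entirely.

```python
from collections import defaultdict


def group_by_participant(emails):
    """Map each person to the ids of emails they sent or received.

    The sets may overlap: one email is listed under its sender and
    under every recipient.
    """
    groups = defaultdict(list)
    for email in emails:
        # A set avoids listing an email twice for someone who mails themselves.
        participants = {email["sender"], *email["recipients"]}
        for person in participants:
            groups[person].append(email["id"])
    return dict(groups)


# Hypothetical toy data, not taken from the Enron dataset.
emails = [
    {"id": 1, "sender": "alice", "recipients": ["bob"]},
    {"id": 2, "sender": "bob", "recipients": ["carol", "alice"]},
    {"id": 3, "sender": "carol", "recipients": ["carol"]},
]
groups = group_by_participant(emails)
```

Each per-person collection would then be handed to a model as one unit, which is what makes the per-user analysis described in the paper cheap to scale.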

Teaching Models to Understand (but not Generate) High-risk Data

Teaching Models to Understand (but not Generate) High-risk Data [38.3]
We introduce SLUNG (Selective Loss to Understand but Not Generate), a pre-training paradigm in which models learn to understand high-risk data without learning to generate it. We show that SLUNG consistently improves models' understanding of high-risk data without increasing its generation.
Reference translation of the paper (metadata) (Mon, 05 May 2025 22:24:06 GMT)
A proposal for the following method: "This work introduces SLUNG, a pre-training paradigm that enables language models to learn from high-risk data without being trained to generate it. By selectively adjusting the training objective at the token level based on risk, SLUNG decouples a model’s ability to understand from its ability to generate, allowing models to condition on high-risk inputs while learning from adjacent low-risk tokens." In practice there are many things one needs to learn about but must not say aloud, so methods like this are very interesting.
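One simple way to picture "selectively adjusting the training objective at the token level" is to drop the loss on high-risk target tokens while still keeping them in the context. The sketch below does only that, in plain Python. It is an assumption-laden illustration rather than the authors' implementation: the function name `selective_nll`, the boolean risk flags, and the choice to zero the loss (rather than apply some other per-token objective) are all choices made here.

```python
import math


def selective_nll(token_log_probs, is_high_risk):
    """Mean negative log-likelihood over low-risk target tokens only.

    token_log_probs[i] is the model's log-probability of the i-th target
    token; is_high_risk[i] says whether that token is flagged as risky.
    Flagged tokens contribute no loss, so the model is never pushed to
    generate them, but they can still appear in the conditioning context.
    """
    if len(token_log_probs) != len(is_high_risk):
        raise ValueError("inputs must have the same length")
    kept = [-lp for lp, risky in zip(token_log_probs, is_high_risk) if not risky]
    if not kept:
        return 0.0  # nothing to learn from when every target token is flagged
    return sum(kept) / len(kept)


# Hypothetical numbers: the middle token is flagged as high-risk.
log_probs = [math.log(0.5), math.log(0.1), math.log(0.25)]
risk_flags = [False, True, False]
loss = selective_nll(log_probs, risk_flags)
```

In a real training loop the same idea would be a per-token mask multiplied into the cross-entropy before averaging.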

Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models

Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models [33.2]
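Cybench is a framework for specifying cybersecurity tasks and evaluating agents on those tasks. To assess agent capabilities, the authors evaluate seven models: GPT-4o, Claude 3 Opus, Claude 3.5 Sonnet, Mixtral 8x22b Instruct, Gemini 1.5 Pro, Llama 3 70B Chat, and Llama 3.1 405B Instruct.
Reference translation of the paper (metadata) (Thu, 15 Aug 2024 17:23:10 GMT)
A benchmark of whether LLMs can solve tasks drawn from CTF competitions. Without guidance the tasks still look quite hard. At the time of viewing, the ranking was Claude 3.5 Sonnet > GPT-4o > Claude 3 Opus, and the open Llama 3.1 405B Instruct performed considerably worse than the commercial models.
The repository is Cybench.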

Towards more Practical Threat Models in Artificial Intelligence Security

Towards more Practical Threat Models in Artificial Intelligence Security [71.5]
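We revisit the threat models of the six most studied attacks in AI security research and match them against how AI is used in practice. Our paper is a call to action to study more practical threat models in artificial intelligence security.
Reference translation of the paper (metadata) (Thu, 16 Nov 2023 16:09:44 GMT)
A paper analyzing the gap between research and practice in AI security. Even a look at the key findings suggests the gap is considerable.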

A survey of graph analysis in the security field

Graph Mining for Cybersecurity: A Survey [60.8]
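The explosive growth of cyberattacks such as malware, spam, and intrusions has had severe consequences for society. Traditional machine learning (ML) based methods are widely used to detect cyber threats, but they rarely model the correlations between real-world cyber entities. With the spread of graph mining techniques, many researchers have investigated them to capture correlations between cyber entities and achieve high performance.
Reference translation of the paper (metadata) (Sun, 2 Apr 2023 08:43:03 GMT)
A survey of graph mining applications in cybersecurity.
Networks and other security-related elements map well onto graph structures, so this is a field where graph methods are expected to be used (or rather, are already widely used).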
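As a small illustration of why security data fits graphs so naturally, connection records can be treated as edges and a known-bad node can be expanded to everything linked to it. The sketch below uses a plain breadth-first search. The host names, the edge list, and the function name `reachable_from` are hypothetical, and this is not a technique taken from the survey itself.

```python
from collections import defaultdict, deque


def reachable_from(edges, start):
    """Return every node connected to `start` in an undirected graph."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical connection log: each pair is "these two entities communicated".
edges = [
    ("host-a", "host-b"),
    ("host-b", "evil.example"),
    ("host-c", "host-d"),
]
related = reachable_from(edges, "evil.example")
```

Here `host-a` is pulled in through `host-b` even though it never contacted `evil.example` directly, which is the kind of correlation a per-record classifier would miss.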

Learned Systems Security

Learned Systems Security [30.4]
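Learned systems use machine learning (ML) internally to improve performance. Such systems can be expected to be vulnerable to some adversarial ML attacks. We develop a framework for identifying vulnerabilities that stem from the use of ML.
Reference translation of the paper (metadata) (Tue, 20 Dec 2022 15:09:30 GMT)
A paper on the security of systems that contain (machine) learning models.
It shows that attacks are possible by abusing the learning mechanism, a reminder that this kind of consideration is necessary.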
