August 8, 2025 – Introducing the latest arXiv papers

Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques [11.2]
Large language models (LLMs) are transforming cybersecurity by enabling intelligent, adaptive, and automated approaches to threat detection, vulnerability assessment, and incident response. With advanced language understanding and contextual reasoning, LLMs surpass traditional methods in tackling challenges across domains such as IoT, blockchain, and hardware security.
Reference translation of the paper (metadata) (Fri, 18 Jul 2025 03:41:18 GMT)
A survey on LLMs and security: "This survey provides a comprehensive overview of LLM applications in cybersecurity, focusing on two core areas: (1) the integration of LLMs into key cybersecurity domains, and (2) the vulnerabilities of LLMs themselves, along with mitigation strategies."

UserBench: An Interactive Gym Environment for User-Centric Agents [110.8]
Agents based on large language models (LLMs) have made remarkable progress in reasoning and tool use, but their ability to proactively collaborate with users remains underdeveloped. We introduce UserBench, a user-centric benchmark designed to evaluate agents in multi-turn, preference-driven interactions.
Reference translation of the paper (metadata) (Tue, 29 Jul 2025 17:34:12 GMT)
A benchmark proposal with the following setup: "Revolving around these traits, we introduce UserBench, a user-centric environment designed to facilitate an agent’s ability to engage in meaningful, multi-turn interactions with users who exhibit these traits. In UserBench, simulated users provide initial vague task instruction (underspecification), gradually reveal preferences over time (incrementality), and often do so implicitly (indirectness). Agents must proactively clarify goals, interpret subtle cues, and adaptively reason through tool use to succeed." The scenarios cover travel planning, and the agent needs the ability to start from vague instructions and work toward a solution through dialogue.
The repository is SalesforceAIResearch/UserBench.
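
To make the three user traits quoted above more concrete, here is a minimal toy sketch of a simulated user that starts with a vague request (underspecification), reveals one preference per relevant question (incrementality), and phrases each preference as a hint rather than a direct requirement (indirectness). This is not the UserBench implementation or its API: the names `SimulatedUser`, `run_episode`, and the example preferences are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedUser:
    """Toy simulated user (hypothetical, not the UserBench API)."""

    vague_request: str                  # underspecified opening instruction
    hidden_preferences: dict[str, str]  # aspect -> implicit hint
    revealed: set = field(default_factory=set)

    def initial_message(self) -> str:
        # Underspecification: the opening turn says nothing about preferences.
        return self.vague_request

    def reply(self, asked_aspect: str) -> str:
        # Incrementality: at most one preference is revealed per question.
        hint = self.hidden_preferences.get(asked_aspect)
        if hint is None:
            return "I have no strong feelings about that."
        self.revealed.add(asked_aspect)
        # Indirectness: the hint implies the preference instead of stating it.
        return hint

    def all_revealed(self) -> bool:
        return self.revealed == set(self.hidden_preferences)


def run_episode(user: SimulatedUser, aspects_to_probe: list[str], max_turns: int = 5):
    """Scripted stand-in for an agent: ask about each aspect in turn."""
    transcript = [("user", user.initial_message())]
    for aspect in aspects_to_probe[:max_turns]:
        transcript.append(("agent", f"Could you tell me more about your {aspect} needs?"))
        transcript.append(("user", user.reply(aspect)))
        if user.all_revealed():
            break
    return transcript


if __name__ == "__main__":
    user = SimulatedUser(
        vague_request="I want to plan a trip.",
        hidden_preferences={
            "flight": "I always get restless on long layovers.",
            "hotel": "I usually work late, so I need it quiet.",
        },
    )
    for speaker, text in run_episode(user, ["flight", "car", "hotel"]):
        print(f"{speaker}: {text}")
```

In this sketch the "agent" is only a fixed list of questions. An actual agent would have to decide which aspects to ask about and infer from a hint such as the layover remark that direct flights are preferred, which is the capability the benchmark is described as targeting.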