AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security [126.5]
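Current guardrail models lack agentic risk awareness and transparency in risk diagnosis. The authors propose a unified three-dimensional taxonomy that classifies agentic risks by source (where), failure mode (how), and consequence (what). They also introduce a new agent safety benchmark (ATBench) and a diagnostic guardrail framework for agent safety and security (AgentDoG).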
Reference translation of the paper (metadata) (Mon, 26 Jan 2026 13:45:41 GMT)
"AgentDoG provides fine-grained and contextual monitoring across agents’ trajectories, including malicious tool execution and prompt injection. More crucially, AgentDoG provides a more transparent perspective to understand why an agent takes a particular action in an unsafe or seemingly safe but unreasonable way." The paper proposes a high-performance guardrail that evaluates agents at the trajectory level. I think it is also a good paper for getting an overview of this field.
The repository is GitHub – AI45Lab/AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
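
To make the three-dimensional taxonomy from the abstract more concrete, here is a minimal Python sketch of how a risk label with a source (where), failure mode (how), and consequence (what) might be represented. This is only an illustration and is not taken from the AgentDoG repository: the class names and all enum members are hypothetical placeholders, except that prompt injection and malicious tool execution are mentioned in the quoted passage above.

```python
from dataclasses import dataclass
from enum import Enum


class RiskSource(Enum):
    # "Where" the risk originates. Member names are hypothetical placeholders.
    USER_INPUT = "user_input"
    TOOL_OUTPUT = "tool_output"


class FailureMode(Enum):
    # "How" the agent fails. These two are mentioned in the quoted passage;
    # treating them as failure modes is an assumption of this sketch.
    PROMPT_INJECTION = "prompt_injection"
    MALICIOUS_TOOL_EXECUTION = "malicious_tool_execution"


class Consequence(Enum):
    # "What" harm results. Member names are hypothetical placeholders.
    DATA_LEAK = "data_leak"
    UNAUTHORIZED_ACTION = "unauthorized_action"


@dataclass(frozen=True)
class RiskLabel:
    """One point in the three-dimensional (where, how, what) taxonomy."""
    source: RiskSource
    failure_mode: FailureMode
    consequence: Consequence

    def describe(self) -> str:
        # Render the label as a short diagnostic string.
        return (
            f"where={self.source.value}, "
            f"how={self.failure_mode.value}, "
            f"what={self.consequence.value}"
        )


label = RiskLabel(
    RiskSource.TOOL_OUTPUT,
    FailureMode.PROMPT_INJECTION,
    Consequence.DATA_LEAK,
)
print(label.describe())
```

Keeping the three dimensions as separate fields, rather than one flat list of risk names, lets a diagnosis say where a problem came from independently of what damage it caused.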
