International AI Safety Report 2025: Second Key Update: Technical Safeguards and Risk Management
The second key update to the International AI Safety Report 2025 assesses new developments in general-purpose AI risk management over the past year, examining how researchers, public institutions, and AI developers are approaching the management of general-purpose AI risks. (Tue, 25 Nov 2025 03:12:56 GMT)
This is the latest edition of the AI Safety Report. The highlights are very informative, and the statement that "Open-weight models lag less than a year behind leading closed-weight models, shifting the risk landscape." seems particularly important.
On the attack side, the picture remains concerning: "tests show that sophisticated attackers can still bypass safeguards around half of the time when given 10 attempts." and "As few as 250 malicious documents inserted into training data can allow attackers to trigger undesired model behaviours with specific prompts. Some research shows that such data poisoning attacks require relatively few resources to carry out, regardless of model size." At the same time, the progress on governance is also noteworthy: "The number of AI companies with Frontier AI Safety Frameworks more than doubled in 2025: at least 12 companies now have such frameworks."
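As a back-of-the-envelope illustration (my own sketch, not from the report): if we assume the 10 attempts are independent with a constant per-attempt success rate p, then "around half of the time when given 10 attempts" implies a surprisingly low per-attempt rate. Likewise, a fixed budget of roughly 250 poisoned documents becomes a vanishingly small fraction of the training corpus as it grows, which is consistent with the claim that attack cost is roughly independent of model size.

```python
# Sketch only: assumes independent attempts with a constant success rate,
# which the report's figures do not necessarily imply.
# Solve 1 - (1 - p)**10 = 0.5 for the implied per-attempt bypass rate p.
p = 1 - 0.5 ** (1 / 10)
print(f"implied per-attempt bypass rate: {p:.3f}")

# A fixed poisoning budget of ~250 documents shrinks to a tiny fraction
# of the training data as the corpus grows.
for corpus_size in (1_000_000, 100_000_000, 10_000_000_000):
    print(f"{corpus_size:>14,} docs -> poison fraction {250 / corpus_size:.2e}")
```

Under this independence assumption, a cumulative success rate of 50% over 10 attempts corresponds to only about a 6.7% chance per attempt, which is a useful way to read the headline number.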