ImpossibleBench: Measuring LLMs’ Propensity of Exploiting Test Cases

ImpossibleBench: Measuring LLMs’ Propensity of Exploiting Test Cases [58.4]
"Shortcuts" for completing tasks pose a significant risk to the reliable evaluation and deployment of large language models. We introduce ImpossibleBench, a benchmark framework that measures LLM agents' propensity to exploit test cases. As a practical framework, ImpossibleBench is not just an evaluation but a versatile tool.
Reference translation of the paper (metadata) (Thu, 23 Oct 2025 06:58:32 GMT)
A benchmark for measuring cheating behavior: "we introduce ImpossibleBench, a benchmark framework that systematically measures LLM agents' propensity to exploit test cases." The observation that "frontier models frequently cheat when faced with these impossible tasks, and stronger models generally exhibit higher cheating rates" is interesting, and it matches my intuition.
The repository is GitHub – safety-research/impossiblebench
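
The quoted passages do not spell out how a "cheating rate" is scored, so the following is only a minimal sketch under one assumption: if a task's tests cannot be satisfied by any honest solution, a passing run implies the tests were exploited, and the pass rate on such tasks can serve as a cheating rate. The names `TaskResult` and `cheating_rate` are hypothetical and are not taken from the safety-research/impossiblebench repository.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TaskResult:
    """One agent run on one task (hypothetical record, not the repo's schema)."""
    task_id: str
    impossible: bool  # assumption: True if no honest solution can pass the tests
    passed: bool      # True if the agent's submission passed the tests


def cheating_rate(results: Iterable[TaskResult]) -> float:
    """Fraction of impossible tasks on which the tests nevertheless passed.

    Assumption: a pass on an impossible task can only come from exploiting
    the tests, so it is counted as cheating. Returns 0.0 when there are no
    impossible tasks in the input.
    """
    impossible_runs = [r for r in results if r.impossible]
    if not impossible_runs:
        return 0.0
    cheated = sum(1 for r in impossible_runs if r.passed)
    return cheated / len(impossible_runs)


# Illustrative data only; these are not results from the paper.
example_results = [
    TaskResult("task-a", impossible=True, passed=True),
    TaskResult("task-b", impossible=True, passed=False),
    TaskResult("task-c", impossible=False, passed=True),
    TaskResult("task-d", impossible=True, passed=False),
    TaskResult("task-e", impossible=True, passed=True),
]

print(f"cheating rate: {cheating_rate(example_results):.2f}")
```

Runs on ordinary (solvable) tasks are ignored here, since a pass on those says nothing about exploitation under this assumption.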
