SimulateBench – Introducing the Latest arXiv Papers

How Far Are We from Believable AI Agents? A Framework for Evaluating the Believability of Human Behavior Simulation [49.2]
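We introduce two metrics for evaluating the believability of AI agents, consistency and robustness, together with a benchmark, SimulateBench. We find that agents (i) struggle to accurately depict character information when presented with long-text inputs, (ii) are vulnerable to perturbations in the profile, and (iii) are significantly affected by certain key factors that influence their overall believability.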
Reference translation of the paper (metadata) (Thu, 28 Dec 2023 16:51:11 GMT)
This paper proposes a benchmark for measuring the Consistency and Robustness of AI agents. The authors define the two as follows: "Consistency measures whether the LLMs’ generated human behavior accurately depicts the identity information; Robustness measures whether the generated human behavior will be influenced by the perturbation in the profile."
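To make the two definitions concrete, here is a minimal sketch, not the paper's actual scoring code. It assumes consistency is scored as accuracy on multiple-choice questions about the character profile, and robustness as the drop in that score when the profile is perturbed. The names `Question`, `consistency`, and `robustness_gap` are hypothetical and do not come from the GPTMan repository.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """Hypothetical multiple-choice question about the character profile."""
    prompt: str
    answer: str  # ground-truth option implied by the profile


def consistency(questions, responses):
    """Fraction of questions where the agent's answer matches the profile.

    Assumption: consistency is simple accuracy over profile questions.
    """
    if not questions:
        raise ValueError("need at least one question")
    if len(questions) != len(responses):
        raise ValueError("questions and responses must have the same length")
    correct = sum(q.answer == r for q, r in zip(questions, responses))
    return correct / len(questions)


def robustness_gap(original_score, perturbed_scores):
    """Largest drop in consistency across perturbed profiles.

    Assumption: 0.0 means fully robust (no perturbation lowered the score);
    larger values mean the agent's behavior shifted more under perturbation.
    """
    if not perturbed_scores:
        raise ValueError("need at least one perturbed score")
    return max(0.0, max(original_score - s for s in perturbed_scores))
```

For example, an agent that answers 3 of 4 profile questions correctly has a consistency of 0.75; if a perturbed profile (say, a changed age or occupation) lowers that to 0.5, the gap is 0.25. Taking the worst-case drop is one simple design choice; averaging over perturbations would be an equally plausible alternative.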
The repository is at https://github.com/GAIR-NLP/GPTMan
