Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models

Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models [75.9]
SAGE(Sentient Agent as a Judge)は、大規模言語モデルの評価フレームワークである。 SAGEは人間のような感情の変化や内的思考をシミュレートするSentient Agentをインスタンス化する。 SAGEは、真に共感的で社会的に適応的な言語エージェントへの進捗を追跡するための、原則付き、スケーラブルで解釈可能なツールを提供する。
論文参考訳（メタデータ） (Thu, 01 May 2025 19:06:10 GMT)
「SAGE instantiates a Sentient Agent that simulates human- like emotional changes and inner thoughts during interaction, providing a more realistic evaluation of the tested model in multi-turn conversations. At every turn, the agent reasons about (i) how its emotion changes, (ii) how it feels, and (iii) how it should reply, yielding a numerical emotion trajectory and interpretable inner thoughts.」（SAGE=Sentient Agent as a Judge）という評価フレームワークの提案。「rankings produced by SAGE diverge markedly from Arena results, confirming that social cognition is orthogonal to generic helpfulness. 」とのこと。
リポジトリはdigitalhuman/SAGE at main · Tencent/digitalhuman · GitHub

コメントを残す

コメントを残す コメントをキャンセル