UserBench: An Interactive Gym Environment for User-Centric Agents

UserBench: An Interactive Gym Environment for User-Centric Agents [110.8]
LLM(Large Language Models)ベースのエージェントは、推論とツールの使用において、目覚ましい進歩を遂げてきたが、ユーザと積極的にコラボレーションする能力はまだ未熟である。マルチターン、選好駆動インタラクションにおいてエージェントを評価するために設計されたユーザ中心のベンチマークであるUserBenchを紹介する。
論文参考訳（メタデータ） (Tue, 29 Jul 2025 17:34:12 GMT)
「Revolving around these traits, we introduce UserBench, a user-centric environment designed to facilitate an agent’s ability to engage in meaningful, multi-turn interactions with users who exhibit these traits. In UserBench, simulated users provide initial vague task instruction (underspecification), gradu- ally reveal preferences over time (incrementality),and often do so implicitly (indirectness). Agents must proactively clarify goals, interpret subtle cues, and adaptively reason through tool use to succeed.」という設定のベンチマークの提案。対象は旅行シナリオで曖昧な指示から対話を元に対処していく能力が求められる。
リポジトリはSalesforceAIResearch/UserBench

コメントを残す

コメントを残す コメントをキャンセル