{"id":7444,"date":"2025-09-15T03:16:00","date_gmt":"2025-09-14T18:16:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=7444"},"modified":"2025-09-14T10:21:59","modified_gmt":"2025-09-14T01:21:59","slug":"humanagencybench-scalable-evaluation-of-human-agency-support-in-ai-assistants","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=7444","title":{"rendered":"HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants\u00a0<\/strong>[5.5]<br>We develop the concept of human agency by integrating philosophical and scientific theories of agency with AI-based evaluation methods. We developed HumanAgencyBench (HAB), a scalable and adaptive benchmark covering six dimensions of human agency, based on typical AI use cases.<br><a href=\"http:\/\/arxiv.org\/abs\/2509.08494v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2509.08494v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Wed, 10 Sep 2025 11:10:10 GMT)<\/li>\n\n\n\n<li>A benchmark of how AI agents handle human agency. Models can be evaluated across multiple categories (<a 
href=\"https:\/\/huggingface.co\/datasets\/Experimental-Orange\/HumanAgencyBench_Evaluation_Results#evaluated-dimensions\">Experimental-Orange\/HumanAgencyBench_Evaluation_Results \u00b7 Datasets at Hugging Face<\/a>). The statement \u201cThere is substantial variation across model developers\u2014with Anthropic\u2019s Claude models tending to most support human agency\u2014and across dimensions. We encourage further research into human agency as more human tasks and decisions are delegated to AI systems, ensuring humans maintain appropriate levels of control.\u201d suggests that behavior varies by model.<\/li>\n\n\n\n<li>The repository is <a href=\"https:\/\/github.com\/BenSturgeon\/HumanAgencyBench\">GitHub &#8211; BenSturgeon\/HumanAgencyBench: A code repository for the paper: &#8220;HUMANAGENCYBENCH: Scalable Evaluation of Human Agency Support in AI Assistants&#8221;<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[517],"class_list":["post-7444","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-517"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7444","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7444"}],"version-history":[{"count":1,"href":"http
s:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7444\/revisions"}],"predecessor-version":[{"id":7445,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7444\/revisions\/7445"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7444"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7444"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7444"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}