{"id":3994,"date":"2023-10-31T05:33:00","date_gmt":"2023-10-30T20:33:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=3994"},"modified":"2023-10-31T05:33:00","modified_gmt":"2023-10-30T20:33:00","slug":"english-benchmark-for-stress-testing-machine-tom","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=3994","title":{"rendered":"English benchmark for stress-testing machine ToM"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions&nbsp;<\/strong>[94.6]<br>Theory of mind evaluations currently focus on testing models using passive narratives that inherently lack interactivity. This paper introduces FANToM, a new benchmark designed to stress-test ToM in information-asymmetric conversational contexts.<br><a href=\"http:\/\/arxiv.org\/abs\/2310.15421v2\">Paper<\/a>&nbsp;&nbsp;<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2310.15421v2\">Reference translation (metadata)<\/a>&nbsp; &nbsp;(Wed, 25 Oct 2023 06:46:42 GMT)<\/li>\n\n\n\n<li>A paper on FANToM, a benchmark for Theory of Mind. Not that it matters, but building the acronym from English benchmark <strong>f<\/strong>or stress-testing m<strong>a<\/strong>chi<strong>n<\/strong>e <strong>ToM<\/strong> seems like a stretch...<\/li>\n\n\n\n<li>The paper states: \"We show that FANTOM is challenging for state-of-the-art LLMs, which perform significantly 
worse than humans even with chain-of-thought reasoning or fine-tuning.\" So it appears to be a difficult benchmark. The note \"We do not believe that current LLMs possess an actual ToM.\" is also interesting. LLM scores are markedly worse than human scores, and the project site states that \"LLMs do not have a coherent theory of mind\".<\/li>\n\n\n\n<li>The societal and ethical considerations section says: \"While the concept of ToM attempts to capture the ability to attribute mental states to oneself and others (Premack and Woodruff, 1978), it is important to clarify that AI models do not possess subjective consciousness or true understanding of intentions, beliefs, or desires. Our experiment results also demonstrate that current large language models do not exhibit any coherent ToM reasoning; instead, they primarily rely on word correlations.\" The most likely explanation, then, is that word correlations alone merely make it look as though something is there. (There is also the question of how humans fare, so this is an area where the discussion could easily diverge.)<\/li>\n\n\n\n<li>The project site is <a href=\"https:\/\/hyunw.kim\/fantom\/\">FANToM: A New Benchmark for Machine ToM in Interactions 
(hyunw.kim)<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[223,404],"class_list":["post-3994","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-llm","tag-theory-of-mind"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/3994","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3994"}],"version-history":[{"count":0,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/3994\/revisions"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3994"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3994"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3994"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}