{"id":7784,"date":"2025-11-20T06:15:00","date_gmt":"2025-11-19T21:15:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=7784"},"modified":"2025-11-15T20:20:43","modified_gmt":"2025-11-15T11:20:43","slug":"training-language-models-to-explain-their-own-computations","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=7784","title":{"rendered":"Training Language Models to Explain Their Own Computations"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>Training Language Models to Explain Their Own Computations<\/strong>[73.9]<br>This study examines how far an LM can exploit its privileged access to its own internals, and proposes a new method for explaining its behavior. Using existing interpretability techniques, LMs are trained to generate natural-language descriptions of (1) the information encoded by LM features, (2) the causal structure of the LM's internal activations, and (3) the influence of specific input tokens on the LM's outputs.<br><a href=\"http:\/\/arxiv.org\/abs\/2511.08579v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2511.08579v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Wed, 12 Nov 2025 02:05:44 GMT)<\/li>\n\n\n\n<li>A very interesting study: \u201cTaken together, these results suggest that even when language models cannot faithfully self-explain as a result of ordinary training, they can learn to do so through an objective that enforces consistency between their external explanations and their internal procedures. This reframes interpretation as not only an external analysis problem, but as a capability that can be trained into LMs themselves; by leveraging privileged access to internal computations, \u2018introspective interpretability\u2019 techniques offer an avenue towards scalable understanding of model behavior.\u201d<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[452],"class_list":["post-7784","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-xai"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7784","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7784"}],"version-history":[{"count":1,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7784\/revisions"}],"predecessor-version":[{"id":7785,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7784\/revisions\/7785"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7784"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7784"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7784"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}