{"id":8096,"date":"2026-01-23T05:43:00","date_gmt":"2026-01-22T20:43:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=8096"},"modified":"2026-01-18T13:46:52","modified_gmt":"2026-01-18T04:46:52","slug":"when-should-we-introduce-safety-interventions-during-pretraining","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=8096","title":{"rendered":"When Should We Introduce Safety Interventions During Pretraining?"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>When Should We Introduce Safety Interventions During Pretraining?\u00a0<\/strong>[100.4]<br>Prior work has shown that pretraining interventions, such as those targeting how harmful content is expressed, substantially improve the safety of the resulting models. Introducing interventions generally yields more robust models without an increase in over-refusal rates. We also see clear benefits for the steerability of models toward safer generations.<br><a href=\"http:\/\/arxiv.org\/abs\/2601.07087v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2601.07087v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Sun, 11 Jan 2026 22:38:17 GMT)<\/li>\n\n\n\n<li>The paper states: \u201cOur experiments show that incorporating safety pretraining interventions indeed help, and the clearest result is that there is much improved robustness after benign finetuning when pretraining interventions are introduced earlier (e.g., at 0% or 20% of the pretraining 
tokens). This also manifests into impacts on the model\u2019s underlying representation geometry; incorporating interventions and metadata earlier in pretraining leads to greater separation of safe vs unsafe content.\u201d<\/li>\n\n\n\n<li>It is surprising that timing makes such a sizable difference.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[108,347],"class_list":["post-8096","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-defense","tag-safety"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/8096","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=8096"}],"version-history":[{"count":1,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/8096\/revisions"}],"predecessor-version":[{"id":8097,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/8096\/revisions\/8097"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=8096"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=8096"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?re
st_route=%2Fwp%2Fv2%2Ftags&post=8096"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}