{"id":7813,"date":"2025-12-02T04:58:00","date_gmt":"2025-12-01T19:58:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=7813"},"modified":"2025-11-23T09:03:16","modified_gmt":"2025-11-23T00:03:16","slug":"saferbench-a-comprehensive-benchmark-for-safety-assessment-in-large-reasoning-models","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=7813","title":{"rendered":"SafeRBench: A Comprehensive Benchmark for Safety Assessment in Large Reasoning Models\u00a0"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>SafeRBench: A Comprehensive Benchmark for Safety Assessment in Large Reasoning Models\u00a0<\/strong>[60.9]<br>We introduce SafeRBench, the first benchmark to evaluate LRM safety end to end. We pioneer the incorporation of risk categories and levels into input design. We introduce a micro-thought chunking mechanism to segment long reasoning traces into semantically coherent units.<br><a href=\"http:\/\/arxiv.org\/abs\/2511.15169v2\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2511.15169v2\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Thu, 20 Nov 2025 03:41:06 GMT)<\/li>\n\n\n\n<li>A safety benchmark evaluation targeting LRMs.<\/li>\n\n\n\n<li>The relationship with model size is interesting: \u201cFor small models (e.g., Qwen-3-0.6B), Thinking increases risk, consistent with prior observations that reasoning traces can introduce hazards. For mid-scale models, however, Thinking yields safer behavior\u2014lower risk and execution levels and higher refusal rates\u2014suggesting that structured reasoning can be leveraged to reduce exposure when model capacity is sufficient. At very large scale, this pattern reverses: the MoE-based Qwen-235B shows higher risk levels under Thinking, reflecting an \u2018always-help\u2019 tendency that makes unsafe responses more actionable. In short, reasoning improves safety up to a point; beyond that, greater capability without stronger alignment can raise exposure.\u201d<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[232,347,517],"class_list":["post-7813","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-lrm","tag-safety","tag-517"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7813","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7813"}],"version-history":[{"count":1,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7813\/revisions"}],"predecessor-version":[{"id":7814,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7813\/revisions\/7814"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7813"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7813"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7813"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}