{"id":7437,"date":"2025-09-23T06:35:00","date_gmt":"2025-09-22T21:35:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=7437"},"modified":"2025-09-14T09:37:28","modified_gmt":"2025-09-14T00:37:28","slug":"a-survey-of-reinforcement-learning-for-large-reasoning-models","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=7437","title":{"rendered":"A Survey of Reinforcement Learning for Large Reasoning Models"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>A Survey of Reinforcement Learning for Large Reasoning Models\u00a0<\/strong>[98.6]<br>On recent advances in reinforcement learning for reasoning with large language models. Further scaling of RL for LRMs faces challenges not only in computational resources but also in algorithm design, training data, and infrastructure.<br><a href=\"http:\/\/arxiv.org\/abs\/2509.08827v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2509.08827v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Wed, 10 Sep 2025 17:59:43 GMT)<\/li>\n\n\n\n<li>This is a survey of reinforcement learning for LRMs, and it is interesting that its stated aim goes as far as mentioning ASI: \u201cTo this end, it is timely to revisit the development of this domain, reassess its trajectory, and explore strategies to enhance the scalability of RL toward Artificial SuperIntelligence (ASI). 
In particular, we examine research applying RL to LLMs and LRMs for reasoning abilities, especially since the release of DeepSeek-R1, including foundational components, core problems, training resources, and downstream applications, to identify future opportunities and directions for this rapidly evolving area.\u201d<\/li>\n\n\n\n<li>Repository: <a href=\"https:\/\/github.com\/TsinghuaC3I\/Awesome-RL-for-LRMs\">GitHub &#8211; TsinghuaC3I\/Awesome-RL-for-LRMs: A Survey of Reinforcement Learning for Large Reasoning Models<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[232,387],"class_list":["post-7437","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-lrm","tag-survey"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7437","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7437"}],"version-history":[{"count":1,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7437\/revisions"}],"predecessor-version":[{"id":7438,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7437\/revisions\/7438"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent
=7437"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7437"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7437"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}