{"id":7721,"date":"2025-11-04T03:31:00","date_gmt":"2025-11-03T18:31:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=7721"},"modified":"2025-11-02T13:50:14","modified_gmt":"2025-11-02T04:50:14","slug":"co-evolving-latent-action-world-models-spice-self-play-in-corpus-environments-improves-reasoning-critique-rl-parrot","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=7721","title":{"rendered":"Co-Evolving Latent Action World Models, SPICE: Self-Play In Corpus Environments Improves Reasoning, Critique-RL, Parrot"},"content":{"rendered":"\n<p>Last week, several papers appeared that aim to improve performance by co-evolving two different components. GAN is the best-known framework of this kind, but the approach still shows up often in the LLM-based era and is very interesting.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Co-Evolving Latent Action World Models\u00a0<\/strong>[57.5]<br>Adapting a pre-trained video model into a controllable world model via latent actions is a promising step toward building generalist world models. This paper proposes CoLA-World, the first to realize this synergistic paradigm. 
The world model acts as a knowledgeable tutor, providing gradients to shape a high-quality LAM.<br><a href=\"http:\/\/arxiv.org\/abs\/2510.26433v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2510.26433v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Thu, 30 Oct 2025 12:28:40 GMT)<\/li>\n\n\n\n<li>\u300cWe propose CoLA-World, the first framework that successfully enables joint training of a latent action model with a pre-trained video-generation-based world model.\u300d: the latent action model (LAM) and the world model are created together.<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>SPICE: Self-Play In Corpus Environments Improves Reasoning\u00a0<\/strong>[58.8]<br>SPICE is a reinforcement learning framework in which a single model acts in two roles. The Challenger mines documents from a large corpus to generate diverse reasoning tasks. The analysis shows how document grounding is a key ingredient in SPICE for continuously generating increasingly difficult goals.<br><a href=\"http:\/\/arxiv.org\/abs\/2510.24684v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2510.24684v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Tue, 28 Oct 
2025 17:46:16 GMT)<\/li>\n\n\n\n<li>\u300cSPICE is a self-play framework where a single LLM, \u03c0\u03b8, acts in two roles: a Challenger (role = C), which poses difficult questions, and a Reasoner (role = R), which tries to correctly answer such questions. The Challenger uses a raw document (which does not contain existing questions or labels) from a corpus to generate a (q, a\u2217) pair.\u300d: a reinforcement learning framework that uses a Challenger and a Reasoner.<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning\u00a0<\/strong>[89.6]<br>We propose Critique-RL, an online RL approach for developing critiquing language models without stronger supervision. The method builds on a two-stage paradigm in which the actor generates a response, the critic provides feedback, and the actor refines the response accordingly. In experiments across various tasks and models, Critique-RL delivers substantial performance improvements.<br><a href=\"http:\/\/arxiv.org\/abs\/2510.24320v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2510.24320v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Tue, 28 Oct 2025 11:37:01 GMT)<\/li>\n\n\n\n<li>\u300cIn stage I, it reinforces 
the discriminability of the critic with direct rule-based reward signals; in stage II, it introduces indirect rewards based on actor refinement to improve the critic\u2019s helpfulness, while maintaining its discriminability via appropriate regularization. Extensive experiments across various tasks and models show that Critique-RL delivers substantial performance improvements.\u300d: a two-stage scheme for strengthening the critic model (though, unlike the others, the Actor side is not updated).<\/li>\n\n\n\n<li>The repository is at <a href=\"https:\/\/github.com\/WooooDyy\/Critique-RL\">GitHub &#8211; WooooDyy\/Critique-RL<\/a><\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning\u00a0<\/strong>[69.0]<br>Natural language chain-of-thought (N-CoT) and Program chain-of-thought (P-CoT) have emerged as two primary paradigms for large language models (LLMs) to solve mathematical reasoning problems. We propose Parrot, a novel training pipeline for mathematical problems.<br><a href=\"http:\/\/arxiv.org\/abs\/2510.25310v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2510.25310v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Wed, 29 Oct 2025 09:23:17 GMT)<\/li>\n\n\n\n<li>Natural language chain-of-thought (N-CoT) and Program 
chain-of-thought (P-CoT) are both strengthened, premised on three subtasks: \u300cThe pipeline comprises three target-designed subtasks: Information Retrieval trains the model to concentrate on key information within problem. P-CoT Reasoning utilizes the information to generate variable well-defined code solutions. Paradigm Conversion enhances N-CoT with concise P-CoT and its intermediate outputs.\u300d<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Last week, several papers appeared that aim to improve performance by co-evolving two different components. GAN is the best-known framework of this kind, but the approach still shows up often in the LLM-based era and is very interesting.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[84,356,449],"class_list":["post-7721","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-critic","tag-self-x","tag-world-model"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7721","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7721"}],"version-
history":[{"count":2,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7721\/revisions"}],"predecessor-version":[{"id":7723,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7721\/revisions\/7723"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7721"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7721"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7721"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}