{"id":7446,"date":"2025-09-16T04:22:00","date_gmt":"2025-09-15T19:22:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=7446"},"modified":"2025-09-14T10:25:13","modified_gmt":"2025-09-14T01:25:13","slug":"language-self-play-for-data-free-training","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=7446","title":{"rendered":"Language Self-Play For Data-Free Training\u00a0"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>Language Self-Play For Data-Free Training\u00a0<\/strong>[37.2]<br>Large language models (LLMs) have advanced rapidly in recent years, driven by scale, high-quality training data, and reinforcement learning. However, this progress faces a fundamental bottleneck: its dependence on ever more training data. We propose a reinforcement learning method that removes this dependency by enabling models to improve without additional data.<br><a href=\"http:\/\/arxiv.org\/abs\/2509.07414v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2509.07414v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Tue, 09 Sep 2025 05:51:34 GMT)<\/li>\n\n\n\n<li>Proposes the Language Self-Play (LSP) framework: \u201cLanguage Self-Play agent operates under two modes: Challenger and Solver. Challenger generates instructions that Solver follows. While Solver learns to improve its responses to the prompts, Challenger learns to make them more difficult. Both modes are instantiated by one model and thus enable perpetual training on increasingly higher-quality self-generated data.\u201d<\/li>\n\n\n\n<li>Similar to <a href=\"https:\/\/devneko.jp\/wordpress\/?p=7256\">R-Zero: Self-Evolving Reasoning LLM from Zero Data \u2013 Introducing the Latest arXiv Papers<\/a>?<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[390],"class_list":["post-7446","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-synthetic-data"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7446","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7446"}],"version-history":[{"count":1,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7446\/revisions"}],"predecessor-version":[{"id":7447,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7446\/revisions\/7447"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7446"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7446"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7446"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}