{"id":7749,"date":"2025-11-10T04:07:00","date_gmt":"2025-11-09T19:07:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=7749"},"modified":"2025-11-09T09:13:46","modified_gmt":"2025-11-09T00:13:46","slug":"diffusion-language-models-are-super-data-learners","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=7749","title":{"rendered":"Diffusion Language Models are Super Data Learners"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>Diffusion Language Models are Super Data Learners\u00a0<\/strong>[61.7]<br>When unique data is limited, diffusion language models (DLMs) consistently surpass autoregressive (AR) models by training for more epochs. The gains are attributed to three compounding factors: (1) any-order modeling, (2) super-dense compute from iterative bidirectional denoising, and (3) Monte Carlo augmentation.<br><a href=\"http:\/\/arxiv.org\/abs\/2511.03276v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2511.03276v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Wed, 05 Nov 2025 08:17:42 GMT)<\/li>\n\n\n\n<li>The authors claim: &#8220;The main empirical finding is a Crossover: when total training tokens are fixed but the number of unique tokens is limited, DLMs consistently surpass equally sized AR counterparts. 
This crossover is not an isolated artifact\u2014it systematically shifts with core factors. With more unique data, it shifts later; with higher data quality, it shifts later; with larger models, the crossover arrives earlier; and it persists across dense and sparse (MoE) architectures (Figures 2, 3, 4). Under compute-bound settings with abundant unique data, AR recovers its edge by fitting the data more rapidly; but in data-bound regimes, which is our focus and, increasingly, the practical reality, DLM is the final winner.&#8221; This also appears consistent with the claim in <a href=\"https:\/\/devneko.jp\/wordpress\/?p=7206\">Diffusion Beats Autoregressive in Data-Constrained Settings\u00a0 \u2013 Introducing the Latest arXiv Papers<\/a>.<\/li>\n\n\n\n<li>The project site is <a href=\"https:\/\/jinjieni.notion.site\/Diffusion-Language-Models-are-Super-Data-Learners-239d8f03a866800ab196e49928c019ac\">Diffusion Language Models are Super Data Learners<\/a>, and the repository is <a href=\"https:\/\/github.com\/JinjieNi\/dlms-are-super-data-learners\">GitHub &#8211; JinjieNi\/dlms-are-super-data-learners: The official github repo for &#8220;Diffusion Language Models are Super Data Learners&#8221;.<\/a><\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The following paper by the same authors is also interesting.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Training Optimal Large Diffusion Language Models\u00a0<\/strong>[61.7]<br>We introduce Quokka, the first systematic scaling law for diffusion language models (DLMs). 
We hope these results provide short-term practical guidance for DLM training and long-term inspiration for the AI community as a whole.<br><a href=\"http:\/\/arxiv.org\/abs\/2510.03280v2\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2510.03280v2\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Wed, 05 Nov 2025 08:32:08 GMT)<\/li>\n\n\n\n<li>The repository is <a href=\"https:\/\/github.com\/JinjieNi\/Quokka\">GitHub &#8211; JinjieNi\/Quokka: The official github repo for &#8220;Training Optimal Large Diffusion Language Models&#8221;, the first-ever large-scale diffusion language models scaling law..<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>The following paper by the same authors is also interesting.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[114,214],"class_list":["post-7749","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-diffusion-model","tag-language-model"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7749","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wo
rdpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7749"}],"version-history":[{"count":1,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7749\/revisions"}],"predecessor-version":[{"id":7750,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7749\/revisions\/7750"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7749"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7749"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7749"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}