Diffusion Language Models are Super Data Learners

Diffusion Language Models are Super Data Learners [61.7]
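When unique data is limited, diffusion language models (DLMs) consistently surpass autoregressive (AR) models by training for more epochs. The authors attribute the gains to three compounding factors: (1) any-order modeling, (2) super-dense compute from iterative bidirectional denoising, and (3) built-in Monte Carlo augmentation.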
Reference translation of the paper (metadata) (Wed, 05 Nov 2025 08:17:42 GMT)
The paper claims: "The main empirical finding is a Crossover: when total training tokens are fixed but the number of unique tokens is limited, DLMs consistently surpass equally sized AR counterparts. This crossover is not an isolated artifact—it systematically shifts with core factors. With more unique data, it shifts later; with higher data quality, it shifts later; with larger models, the crossover arrives earlier; and it persists across dense and sparse (MoE) architectures (Figures 2, 3, 4). Under compute-bound settings with abundant unique data, AR recovers its edge by fitting the data more rapidly; but in data-bound regimes, which is our focus and, increasingly, the practical reality, DLM is the final winner." This also seems consistent with the claim of Diffusion Beats Autoregressive in Data-Constrained Settings, covered earlier on this site.
The project site is Diffusion Language Models are Super Data Learners, and the repository is GitHub – JinjieNi/dlms-are-super-data-learners: The official github repo for “Diffusion Language Models are Super Data Learners”.
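To make the "fixed total tokens, limited unique tokens" setting concrete, here is a minimal sketch of the arithmetic behind it: with a fixed training budget, fewer unique tokens means more passes (epochs) over the same data. The function name and all token counts are hypothetical, chosen for illustration only; they are not figures from the paper.

```python
def epochs_for_budget(total_tokens: int, unique_tokens: int) -> float:
    """Number of passes over the unique data implied by a fixed token budget."""
    if unique_tokens <= 0:
        raise ValueError("unique_tokens must be positive")
    return total_tokens / unique_tokens


# Hypothetical budget and dataset sizes (assumptions, not values from the paper).
TOTAL_TOKENS = 100_000_000_000
UNIQUE_TOKEN_OPTIONS = (1_000_000_000, 10_000_000_000, 100_000_000_000)

for unique in UNIQUE_TOKEN_OPTIONS:
    epochs = epochs_for_budget(TOTAL_TOKENS, unique)
    # Fewer unique tokens under the same budget means more repetition of the data.
    print(f"{unique:>15,d} unique tokens -> {epochs:6.1f} epochs")
```

The data-bound regime discussed in the quote corresponds to the rows with many epochs; the compute-bound regime with abundant unique data corresponds to roughly one epoch.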

The following paper by the same authors is also interesting.

Training Optimal Large Diffusion Language Models [61.7]
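The paper introduces Quokka, the first systematic scaling law for diffusion language models (DLMs). The authors hope the results will bring short-term practical guidance for DLM training and long-term inspiration for the whole AI community.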
Reference translation of the paper (metadata) (Wed, 05 Nov 2025 08:32:08 GMT)
The repository is GitHub – JinjieNi/Quokka: The official github repo for “Training Optimal Large Diffusion Language Models”, the first-ever large-scale diffusion language models scaling law.
