Diffusion Model – arXiv最新論文の紹介 (Introducing the Latest arXiv Papers)

Diffusion Language Models are Super Data Learners

Diffusion Language Models are Super Data Learners [61.7]
When unique data is limited, diffusion language models (DLMs) consistently surpass autoregressive (AR) models by training for more epochs. The gains are attributed to three compounding factors: (1) any-order modeling, (2) super-dense compute from iterative bidirectional denoising, and (3) built-in Monte Carlo augmentation.
Paper reference translation (metadata) (Wed, 05 Nov 2025 08:17:42 GMT)
The paper claims: “The main empirical finding is a Crossover: when total training tokens are fixed but the number of unique tokens is limited, DLMs consistently surpass equally sized AR counterparts. This crossover is not an isolated artifact—it systematically shifts with core factors. With more unique data, it shifts later; with higher data quality, it shifts later; with larger models, the crossover arrives earlier; and it persists across dense and sparse (MoE) architectures (Figures 2, 3, 4). Under compute-bound settings with abundant unique data, AR recovers its edge by fitting the data more rapidly; but in data-bound regimes, which is our focus and, increasingly, the practical reality, DLM is the final winner.” This also seems consistent with the claims covered in Diffusion Beats Autoregressive in Data-Constrained Settings – arXiv最新論文の紹介.
The project site is Diffusion Language Models are Super Data Learners, and the repository is GitHub – JinjieNi/dlms-are-super-data-learners: The official github repo for “Diffusion Language Models are Super Data Learners”.
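
To make the "more epochs on limited unique data" setting and the "Monte Carlo augmentation" factor concrete, here is a minimal sketch. It is not the authors' training code: the mask token, the uniform mask-ratio sampling, and the token budgets are all assumptions chosen for illustration. The point is only that a masked-diffusion objective shows the model a different corrupted view of the same sequence on every pass, while the AR next-token targets are identical each epoch.

```python
import random

MASK = "[MASK]"  # hypothetical mask token, used only for this illustration


def epochs_needed(total_tokens, unique_tokens):
    """Passes over the unique data implied by a fixed total token budget."""
    return total_tokens / unique_tokens


def corrupt(tokens, rng):
    """Masked-diffusion style corruption (assumed form): sample a mask ratio t,
    then mask each token independently with probability t."""
    t = rng.random()
    return [MASK if rng.random() < t else tok for tok in tokens]


def ar_targets(tokens):
    """Next-token (prefix, target) pairs; these are the same on every epoch."""
    return [(tuple(tokens[:i]), tokens[i]) for i in range(1, len(tokens))]


# Hypothetical budgets: 100B total training tokens over 1B unique tokens.
print("epochs:", epochs_needed(100e9, 1e9))

sequence = ["the", "cat", "sat", "on", "the", "mat"]
rng = random.Random(0)
for epoch in range(3):
    # Each pass over the same sequence yields a freshly sampled masked view.
    print(epoch, corrupt(sequence, rng))
```

Repeating the same data therefore does not mean repeating the same training signal for the diffusion objective, which is one plausible reading of why the crossover appears in data-bound regimes.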

The following paper by the same authors is also interesting.

Training Optimal Large Diffusion Language Models [61.7]
We introduce Quokka, the first systematic scaling law for diffusion language models (DLMs). We hope the results bring short-term practical guidance for DLM training and long-term inspiration for the AI community as a whole.
Paper reference translation (metadata) (Wed, 05 Nov 2025 08:32:08 GMT)
The repository is GitHub – JinjieNi/Quokka: The official github repo for “Training Optimal Large Diffusion Language Models”, the first-ever large-scale diffusion language models scaling law.

Qwen3-Max, K2-Instruct-0905, LongCat-Flash, Dream-Coder 7B, Kwai Keye-VL 1.5

There was again a lot of LLM/LRM news last week. Qwen3 Max, the largest configuration in the Qwen3 family, was released (Qwen on X: “Big news: Introducing Qwen3-Max-Preview (Instruct) — our biggest model yet, with over 1 trillion parameters! 🚀 Now available via Qwen Chat & Alibaba Cloud API. Benchmarks show it beats our previous best, Qwen3-235B-A22B-2507. Internal tests + early user feedback confirm: https://t.co/7vQTfHup1Z” / X, Models and pricing – Alibaba Cloud Model Studio – Alibaba Cloud Documentation Center), and Kimi K2 was updated (Kimi.ai on X: “Kimi K2-0905 update 🚀 – Enhanced coding capabilities, esp. front-end & tool-calling – Context length extended to 256k tokens – Improved integration with various agent scaffolds (e.g., Claude Code, Roo Code, etc) 🔗 Weights & code: https://t.co/83sQekosr9 💬 Chat with new Kimi https://t.co/mkOuBMwzpw” / X, moonshotai/Kimi-K2-Instruct-0905 · Hugging Face). Besides LongCat-Flash, smaller but distinctive models such as Dream-Coder 7B and Kwai Keye-VL 1.5 were also announced.

Adjacent areas also deserve attention, for example protocol proposals such as Introduction – Agent Client Protocol (GitHub – zed-industries/agent-client-protocol: A protocol for connecting any editor to any agent).

LongCat-Flash Technical Report [165.7]
LongCat-Flash is a 560-billion-parameter Mixture-of-Experts (MoE) language model designed for both computational efficiency and advanced agentic capabilities. Model training on more than 20 trillion tokens was completed within 30 days, and inference runs at over 100 tokens per second (TPS) at a cost of $0.70 per million output tokens.
Paper reference translation (metadata) (Mon, 01 Sep 2025 10:05:45 GMT)
A 560B MoE configuration. “As a non-thinking model, LongCat-Flash achieves performance comparable to state-of-the-art non-thinking models, including DeepSeek-V3.1 [DeepSeek-AI et al., 2025] and Kimi-K2 [Team et al., 2025], while using fewer parameters and offering faster inference speed. Specifically, LongCat-Flash scores 86.5 on ArenaHard-V2, 39.5 on TerminalBench, and 67.7 on τ²-Bench, demonstrating robust capabilities in general domains, coding, and agentic tool use.”
The repository is GitHub – meituan-longcat/LongCat-Flash-Chat
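
As a back-of-the-envelope check on the headline numbers, the arithmetic below converts them into more familiar units. It assumes the 0.70 figure means $0.70 per million output tokens, and it treats "more than 20 trillion tokens", "within 30 days", and "over 100 TPS" as exact values, so the results are rough bounds rather than measurements. The function names are hypothetical helpers.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def training_throughput(total_tokens, days):
    """Average training tokens processed per second over the whole run."""
    return total_tokens / (days * SECONDS_PER_DAY)


def output_cost_usd(tokens, usd_per_million):
    """Cost of generating `tokens` output tokens at a per-million-token price."""
    return tokens / 1_000_000 * usd_per_million


# 20 trillion tokens in 30 days (taken as exact for the estimate).
tokens_per_second = training_throughput(20e12, 30)

# One hour of a single stream decoding at 100 tokens per second.
one_hour_tokens = 100 * 3600
one_hour_cost = output_cost_usd(one_hour_tokens, 0.70)

print(f"average training throughput: {tokens_per_second:,.0f} tokens/s")
print(f"one hour at 100 TPS: {one_hour_tokens:,} tokens, ${one_hour_cost:.3f}")
```

Under these assumptions the run averages roughly 7.7 million training tokens per second, and an hour of continuous decoding on one stream costs about a quarter of a dollar.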

Dream-Coder 7B: An Open Diffusion Language Model for Code [99.1]
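We present Dream-Coder 7B, an open-source discrete diffusion language model for code generation that exhibits any-order generation capabilities. Unlike traditional autoregressive (AR) models, which decode strictly left to right, Dream-Coder 7B adaptively determines its decoding strategy based on the coding task.
Paper reference translation (metadata) (Mon, 01 Sep 2025 05:30:56 GMT)
A diffusion model strengthened for coding tasks.
The repository is GitHub – DreamLM/Dream-Coder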

Kwai Keye-VL 1.5 Technical Report [91.3]
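This report introduces Keye-VL-1.5, which addresses fundamental challenges in video understanding through three key innovations. First, it proposes a Slow-Fast video encoding strategy that dynamically allocates computational resources based on inter-frame similarity. Second, it proposes a four-stage pre-training methodology that systematically extends the model's context length from 8K to 128K. Third, it develops a comprehensive post-training pipeline focused on reasoning enhancement and human preference alignment.
Paper reference translation (metadata) (Mon, 01 Sep 2025 15:46:58 GMT)
A model that can handle video: “Keye-VL-1.5-8B establishes new state-of-the-art performance among models of similar scale, demonstrating superior results on video-centric benchmarks while maintaining competitive performance on general multimodal and reasoning tasks.”
The repository is GitHub – Kwai-Keye/Keye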
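
The Slow-Fast idea of allocating compute by inter-frame similarity can be sketched as follows. This is an illustration under assumptions, not the Keye-VL-1.5 implementation: frames are represented by small feature vectors, similarity is cosine similarity against the most recent "slow" frame, and the 0.95 threshold is a hypothetical value. Frames that differ enough from the last slow frame get the expensive slow pathway; near-duplicates get the cheap fast pathway.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_pathways(frames, threshold=0.95):
    """Label each frame 'slow' (visually new, more compute) or 'fast'
    (similar to the last slow frame, less compute). The threshold is assumed."""
    labels = []
    last_slow = None
    for frame in frames:
        if last_slow is not None and cosine(frame, last_slow) >= threshold:
            labels.append("fast")
        else:
            labels.append("slow")
            last_slow = frame  # this frame becomes the new reference
    return labels


# Toy features: two nearly static shots separated by a scene change.
frames = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]]
print(assign_pathways(frames))
```

In a real system the labels would then drive how many visual tokens or how much resolution each frame receives; that budget split is not modeled here.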

Command A Reasoning, DeepSeek V3.1, Gemma 3 270M, Nemotron Nano 2, Dream 7B

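There really is a lot going on around LLMs/LRMs. Last week's big news was the release of Cohere’s Command A Reasoning Model | Cohere (the model is Cohere’s Command A Reasoning Model | Cohere, CC-BY-NC) and the release of DeepSeek V3.1 (DeepSeek-V3.1 Release | DeepSeek API Docs; the model is deepseek-ai/DeepSeek-V3.1 · Hugging Face). It matters a great deal that frontier or near-frontier models are being released openly. In addition, a technical report has been published for Intern-S1.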

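Among small models, Gemma 3 270M (Introducing Gemma 3 270M: The compact model for hyper-efficient AI – Google Developers Blog; the model is google/gemma-3-270m · Hugging Face) is interesting for being ultra-small. Its raw performance is questionable, but it looks usable in some situations, for example after post-training for a specialized task. NVIDIA's Nemotron Nano 2 is also worth watching (9B despite the name Nano).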

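From Huawei came the paper for the diffusion-based Dream 7B. Its performance is strong: it surpasses LLaDA and appears to hold its own against autoregressive models of the same scale.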

Intern-S1: A Scientific Multimodal Foundation Model [185.4]
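Intern-S1 is a specialized generalist equipped with general understanding and reasoning capabilities. Intern-S1 undergoes offline and then online reinforcement learning (RL) in InternBootCamp. Intern-S1 shows competitive performance on general reasoning tasks among open-source models.
Paper reference translation (metadata) (Thu, 21 Aug 2025 17:58:00 GMT)
The technical report for the model covered in Qwen3-Coder, Intern-S1, Step-Audio2, TeleChat2 – arXiv最新論文の紹介.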

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model [176.4]
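Nemotron-Nano-9B-v2 is a hybrid Mamba-Transformer language model designed to increase throughput for reasoning workloads. It builds on the Nemotron-H architecture, in which the majority of the self-attention layers of the common Transformer architecture are replaced with Mamba-2 layers.
Paper reference translation (metadata) (Thu, 21 Aug 2025 04:18:04 GMT)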
nvidia/NVIDIA-Nemotron-Nano-9B-v2 · Hugging Face

Dream 7B: Diffusion Large Language Models [85.3]
We introduce Dream 7B, the most powerful open diffusion large language model to date. Our model consistently outperforms existing diffusion language models on general, mathematical, and coding tasks.
Paper reference translation (metadata) (Thu, 21 Aug 2025 12:09:58 GMT)
According to the paper, “Dream 7B achieves competitive performance with Qwen 2.5 on standard benchmarks (general language understanding, mathematical reasoning, and code generation) while exhibiting superior planning abilities and novel inference flexibility features that naturally emerge from the diffusion modeling paradigm.”
The repository is GitHub – DreamLM/Dream: Dream 7B, a large diffusion language model, and the model is Dream 7B – a Dream-org Collection

Diffusion Models for Time Series Forecasting: A Survey

Diffusion Models for Time Series Forecasting: A Survey [14.3]
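Diffusion models, initially developed for image synthesis, demonstrate remarkable generative capabilities. Recently, their application has expanded to time series forecasting (TSF), with promising results. This survey details recent progress and future prospects of diffusion models in TSF and serves as a reference for researchers in the field.
Paper reference translation (metadata) (Sat, 19 Jul 2025 07:04:04 GMT)
A survey on applying diffusion models to time series forecasting.
The repository is https://github.com/synlp/TSF-Diff-Review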

Diffusion Beats Autoregressive in Data-Constrained Settings

Diffusion Beats Autoregressive in Data-Constrained Settings [46.1]
Autoregressive (AR) models have long dominated the landscape of large language models. Recently, diffusion-based language models have emerged as a promising alternative, though their advantages over AR models remain underexplored.
Paper reference translation (metadata) (Mon, 21 Jul 2025 17:59:57 GMT)
The paper observes: “In this paper, we systematically study masked diffusion models in data-constrained settings—where training involves repeated passes over limited data—and find that they significantly outperform AR models when compute is abundant but data is scarce. Diffusion models make better use of repeated data, achieving lower validation loss and superior downstream performance.” Intuitively, too, this seems right.
The repository is Diffusion Beats Autoregressive in Data-Constrained Settings

The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs

The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs [39.9]
We propose DIJA, the first systematic study and jailbreak attack framework that exploits the unique safety weaknesses of dLLMs. DIJA constructs adversarial interleaved mask-text prompts that exploit the text generation mechanisms of dLLMs. The work highlights the need to rethink safety alignment for this emerging class of language models.
Paper reference translation (metadata) (Tue, 15 Jul 2025 08:44:46 GMT)
A proposed attack method against dLLMs. As the paper puts it, “By interleaving sets of [MASK] tokens after vanilla malicious prompt, as shown in Figure 2, a dLLM is coerced into generating harmful instructions purely to maintain contextual consistency. Moreover, in contrast to autoregressive LLMs, which generate tokens sequentially and can perform on-the-fly rejection of unsafe continuations, dLLMs decode masked tokens in parallel at each step, substantially limiting the model’s ability to conduct dynamic risk assessment or intervene during generation (e.g., reject sampling for tokens corresponding to harmful contents). Consequently, defenses designed for left-to-right models break down, opening the door to powerful new jailbreak attacks.” The attack exploits characteristics of a model family that works differently from causal LMs, and its success rate is high.
The repository is GitHub – ZichenWen1/DIJA: code for “The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs”

Mercury: Ultra-Fast Language Models Based on Diffusion

Mercury: Ultra-Fast Language Models Based on Diffusion [58.5]
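We present Mercury, a new commercial large language model (LLM) based on diffusion. Mercury Coder comes in two sizes, Mini and Small. Based on independent evaluations, Mercury Coder Mini and Mercury Coder Small achieve state-of-the-art throughputs of 1109 tokens/sec and 737 tokens/sec, respectively.
Paper reference translation (metadata) (Tue, 17 Jun 2025 17:06:18 GMT)
The paper on Mercury, which was briefly mentioned in Continuous Diffusion Model for Language Modeling, Energy-Based Diffusion Language Models for Text Generation – arXiv最新論文の紹介.
The site is Inception Platform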

DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation

DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation [68.2]
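Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models. This work investigates their denoising processes and reinforcement learning methods. It provides deeper insight into the mechanics of dLLM generation and offers an effective, diffusion-native RL training framework.
Paper reference translation (metadata) (Thu, 26 Jun 2025 15:46:40 GMT)
A paper in which the behavioral differences from AR models are interesting. After noting that “Reinforcement learning (RL) and GRPO (Shao et al., 2024) have proven critical for enhancing AR models (Bercovich et al., 2025; Shao et al., 2025), but their application to dLLMs is less explored.”, it proposes Coupled-GRPO for diffusion models.
The repository is https://github.com/apple/ml-diffucoder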
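
One ingredient the paper associates with Coupled-GRPO is complementary mask noise for the sampled completions. The sketch below shows only that pairing idea, under assumptions: the 50% mask probability, the mask token, and the helper names are hypothetical, and the actual GRPO loss and timestep sampling are not modeled. The property illustrated is that across a coupled pair every token position is masked exactly once, so each token contributes to the estimate.

```python
import random


def complementary_masks(length, rng):
    """Sample one random mask and its complement (True means 'masked')."""
    mask_a = [rng.random() < 0.5 for _ in range(length)]
    mask_b = [not masked for masked in mask_a]
    return mask_a, mask_b


def apply_mask(tokens, mask, mask_token="[MASK]"):
    """Replace masked positions with a placeholder mask token."""
    return [mask_token if masked else tok for tok, masked in zip(tokens, mask)]


completion = ["def", "add", "(", "a", ",", "b", ")", ":"]
rng = random.Random(0)
mask_a, mask_b = complementary_masks(len(completion), rng)

# The two views hide disjoint sets of tokens that together cover the sequence.
print(apply_mask(completion, mask_a))
print(apply_mask(completion, mask_b))
```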

Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture

Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture [65.9]
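Masked diffusion models (MDMs) have emerged as an alternative to autoregressive (AR) models. AR models are often decoder-only, whereas MDMs are typically encoder-only. This work evaluates MDMs within a decoder-only framework and investigates the effect of architecture (decoder-only versus encoder-only) within MDMs.
Paper reference translation (metadata) (Tue, 24 Jun 2025 18:22:25 GMT)
A comparative evaluation of AutoRegressive (AR) models and Masked Diffusion Models (MDMs).
The repository is GitHub – scxue/AO-GPT-MDM: Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture. Training an MDM using GPT with this repo!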
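
The decoder-only versus encoder-only distinction comes down largely to the attention mask. As a minimal illustration (plain Python lists rather than any framework's tensor API, with hypothetical function names), a decoder-only model lets position i attend only to positions up to i, while an encoder-only model lets every position attend to every other.

```python
def causal_mask(n):
    """Decoder-only style: position i may attend to positions 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Encoder-only style: every position may attend to every position."""
    return [[True] * n for _ in range(n)]


def show(mask):
    """Render a mask as rows of 1 (visible) and 0 (hidden)."""
    return "\n".join(" ".join("1" if v else "0" for v in row) for row in mask)


print("causal:")
print(show(causal_mask(4)))
print("bidirectional:")
print(show(bidirectional_mask(4)))
```

Separating the training formulation (AR versus masked diffusion) from this architectural choice is what allows the paper to compare the two factors independently.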

Discrete Diffusion in Large Language and Multimodal Models: A Survey

Discrete Diffusion in Large Language and Multimodal Models: A Survey [56.3]
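This work provides a systematic survey of discrete diffusion language models (dLLMs) and discrete diffusion multimodal language models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm. The survey traces the historical development of dLLMs and dMLLMs, formalizes the underlying mathematical frameworks, and categorizes representative models.
Paper reference translation (metadata) (Mon, 16 Jun 2025 17:59:08 GMT)
A survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs).
The relationship to, and the differences from, the currently dominant autoregressive models are interesting.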
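
The multi-token parallel decoding paradigm can be illustrated with a toy loop. Everything model-related here is a stand-in: `toy_predict` is a hypothetical function that "predicts" the known target and assigns made-up confidences, and unmasking the most confident positions first is one common strategy assumed for the sketch, not something the survey prescribes. The contrast with AR decoding is that several positions are committed per step, in any order.

```python
MASK = None  # placeholder for a not-yet-decoded position


def toy_predict(tokens, target):
    """Hypothetical stand-in for a dLLM forward pass. Returns
    {position: (token, confidence)} for every masked position; confidence
    is higher next to already revealed tokens."""
    preds = {}
    for i, tok in enumerate(tokens):
        if tok is MASK:
            neighbors = [tokens[j] for j in (i - 1, i + 1) if 0 <= j < len(tokens)]
            revealed = sum(n is not MASK for n in neighbors)
            preds[i] = (target[i], 0.5 + 0.25 * revealed)
    return preds


def parallel_decode(target, tokens_per_step):
    """Start fully masked and commit the most confident predictions each step."""
    tokens = [MASK] * len(target)
    steps = 0
    while any(t is MASK for t in tokens):
        preds = toy_predict(tokens, target)
        best = sorted(preds, key=lambda i: preds[i][1], reverse=True)[:tokens_per_step]
        for i in best:
            tokens[i] = preds[i][0]
        steps += 1
    return tokens, steps


decoded, steps = parallel_decode(list("diffuse!"), tokens_per_step=3)
print("".join(decoded), "in", steps, "steps")
```

With one token per step the same loop takes as many steps as there are tokens, which is the sequential cost that parallel decoding aims to reduce.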
