TiDAR: Think in Diffusion, Talk in Autoregression [59.9] TiDARは、Diffusionでトークン(Thinking)をドラフトし、最終的な出力(Talking)をAutoRegressivelyにサンプリングするシーケンスレベルのハイブリッドアーキテクチャである。 TiDARはARモデルと品質ギャップを埋める最初のアーキテクチャであり、毎秒4.71倍から5.91倍のトークンを提供する。 論文参考訳(メタデータ) (Thu, 13 Nov 2025 01:18:11 GMT)
Diffusion modelとAuto regressiveのハイブリッド「We introduce TiDAR, a sequence-level hybrid architecture that drafts tokens (Thinking) in Diffusion and samples final outputs (Talking) AutoRegressively – all within a single forward pass using specially designed structured attention masks.」
「We extensively evaluate TiDAR against AR models, speculative decoding, and diffusion variants across generative and likelihood tasks at 1.5B and 8B scales. Thanks to the parallel drafting and sampling as well as exact KV cache support, TiDAR outperforms speculative decoding in measured throughput and surpasses diffusion models like Dream and Llada in both efficiency and quality. Most notably, TiDAR is the first architecture to close the quality gap with AR models while delivering 4.71× to 5.91× more tokens per second.」とスケールすることが確認できているのがすごい。