Yume-1.5: A Text-Controlled Interactive World Generation Model

Yume-1.5: A Text-Controlled Interactive World Generation Model [78.9]
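Yume-1.5 is a new framework designed to generate realistic, interactive, and continuous worlds from a single image or text prompt. It achieves this through a carefully designed framework that supports keyboard-based exploration of the generated worlds.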
Reference translation of the paper (metadata) (Fri, 26 Dec 2025 17:52:49 GMT)
The authors write: "we present Yume1.5, an interactive world generation model that enables infinite video generation from a single input image through autoregressive synthesis while supporting intuitive keyboard-based camera control." and "The key innovations of Yume1.5 include: (1) a joint temporal-spatial-channel modeling approach that enables efficient long video generation while maintaining temporal coherence; (2) an acceleration method that mitigates error accumulation during inference; and (3) text-controlled world event generation capability achieved through careful architectural design and mixed-dataset training." This is video generation research that leads toward world models. The naming is also fun: Yume ("dream") and Sekai ("world") (GitHub – Lixsp11/sekai-codebase: [NeurIPS 2025] The official repository of "Sekai: A Video Dataset towards World Exploration").
The repository is GitHub – stdstu12/YUME: The official code of Yume, and the model is stdstu123/Yume-5B-720P · Hugging Face.
