MMGR: Multi-Modal Generative Reasoning – Introducing the Latest arXiv Papers

MMGR: Multi-Modal Generative Reasoning [97.4]
This paper introduces MMGR, a foundational evaluation framework based on five reasoning abilities. MMGR evaluates generative reasoning across three domains: Abstract Reasoning, Embodied Navigation, and Physical Commonsense. It benchmarks leading video models (Veo-3, Sora-2, Wan-2.2) and image models (Nano-banana, Nano-banana Pro, GPT-4o-image, Qwen-image).
Reference translation of the paper (metadata) (Wed, 17 Dec 2025 18:42:37 GMT)
The authors write: "We argue that for video generation to evolve from mere image animation to genuine world modeling (Ha & Schmidhuber, 2018; LeCun, 2022), models must acquire foundational reasoning capabilities akin to human intuitive physics and cognition. Moving beyond superficial fidelity (Huang et al., 2024; Liu et al., 2024b), we propose a formal evaluation framework asking: Can a video model reason about the physical and logical constraints of the content it generates? Drawing on theories of core knowledge and cognitive development (Spelke & Kinzler, 2007; Lake et al., 2017), we posit that robust world simulation rests on five complementary pillars of reasoning:" The five pillars are as follows.
- Physical Reasoning
- Logical Reasoning
- 3D Spatial Reasoning
- 2D Spatial Reasoning
- Temporal Reasoning
The repository is Zefan-Cai/MMGR on GitHub.
