Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark

Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark [48.0]
ビデオ生成モデルは、Chain-of-Frames (CoF)推論を通じて、潜在的な世界シミュレータとして登場した。既存のベンチマークは、忠実さやアライメントに重点を置いており、CoFの推論を評価していない。我々は,認知科学と実世界のAI応用を基盤としたフレームワークであるGen-ViReを紹介する。
論文参考訳（メタデータ） (Mon, 17 Nov 2025 19:11:39 GMT)
ビデオ生成モデルを通じた因果関係の把握（world modelへの可能性）を評価するベンチマークの提案。「Gen-ViRe evaluates six core cognitive dimensions: (1) Perceptual, (2) Analogical, (3) Abstract, (4) Planning, (5) Spatial & Temporal, and (6) Algorithmic & Logical, with each dimension comprising four different sub-categories.」
「Sora-2 achieves the highest overall score (0.560), establishing the top tier with particularly strong performance in the most cognitively demanding domains: “Abstract Reasoning” (0.604), “Algorithmic & Logical” (0.472), and “Perceptual” (0.496). The second tier comprises three highly competitive models—Hailuo-2.3 (0.493), Wan-2.5 (0.490), and Veo-3.1 (0.486)—each exhibiting distinct specialized strengths. Hailuo-2.3 achieves the highest score in “Planning” (0.778), showcasing exceptional sequential decision-making capabilities, while Wan-2.5 leads in “Analogy” (0.500), excelling at analogical reasoning.」とモデルごとに特性がかなり異なるのが興味深い。
リポジトリはhttps://github.com/L-CodingSpace/GVR

コメントを残す

月	火	水	木	金	土	日
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

コメントを残す コメントをキャンセル

コメントを残すコメントをキャンセル