Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

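We empirically study whether video models are ready to serve as zero-shot reasoners, focusing on the popular Veo-3. We evaluate its reasoning behavior across 12 dimensions, including spatial, geometric, physical, temporal, and embodied logic.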
Paper reference translation (metadata) (Thu, 30 Oct 2025 17:59:55 GMT)
A separate paper, "Video models are zero-shot learners and reasoners – arXiv最新論文の紹介", makes the opposite claim, but it comes from a different team. This paper concludes: "Our findings reveal that while current video models demonstrate promising reasoning patterns on short-horizon spatial coherence, fine-grained grounding, and locally consistent dynamics, they remain limited in long-horizon causal reasoning, strict geometric constraints, and abstract logic. Overall, they are not yet reliable as standalone zero-shot reasoners, but exhibit encouraging signs as complementary visual engines alongside dedicated reasoning models." Even so, the results point to real potential.
The project site is Are Video Models Ready as Zero-Shot Reasoners?
