How Far is Video Generation from World Model: A Physical Law Perspective

How Far is Video Generation from World Model: A Physical Law Perspective [101.2]
OpenAIのSoraは、物理法則に準拠した世界モデルを開発するためのビデオ生成の可能性を強調している。しかし、ビデオ生成モデルが人間の先行しない視覚データから純粋にそのような法則を発見する能力に疑問を投げかけることができる。本研究は,3つの主要なシナリオ – 分布内,分布外,一般化 – について評価する。
論文参考訳（メタデータ） (Mon, 04 Nov 2024 18:53:05 GMT)
世界シミュレータとしても期待されるビデオ生成についての詳細な評価。「Further experiments reveal two key insights about the generalization mechanisms of these models: (1) the models fail to abstract general physical rules and instead exhibit “case-based” generalization behavior, i.e., mimicking the closest training example; (2) when generalizing to new cases, models are observed to prioritize different factors when referencing training data: color > size > velocity > shape.」とのことで、なかなか厳しい評価に思える。さらには「The findings indicate that scaling alone cannot address the OOD problem, although it does enhance performance in other scenarios.」とのことで、簡単な問題ではないことが分かる。
論文中にも「ニュートンが運動の3法則を定式化するのに何世紀もかかった」という記載と「一方で子供でも直観的な予測は可能」との記載があるが、この手の能力がAIに実現できるかはいろいろと興味深い。
プロジェクトサイトはHow Far is Video Generation from World Model: A Physical Law Perspective

コメントを残す

月	火	水	木	金	土	日
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28

コメントを残す コメントをキャンセル

コメントを残すコメントをキャンセル