Large Video Planner Enables Generalizable Robot Control

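General-purpose robots require decision-making models that generalize across diverse tasks and environments. Recent work builds robot foundation models by extending multimodal large language models (MLLMs) with action outputs, creating vision-language-action (VLA) systems. This paper explores an alternative paradigm that uses large-scale video pretraining as the primary modality for building robot foundation models.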
Reference translation of the paper abstract (metadata) (Wed, 17 Dec 2025 18:35:54 GMT)
"We present Large Video Planner (LVP), a 14-billion parameter video foundation model for embodiment planning. LVP generates videos as motion plans conditioned on one or a few scene frames and a text description of the task. We demonstrate that these generated motion plans can be successfully retargeted to dexterous robotic hands using open-source reconstruction and retargeting tools. Evaluations on third-party proposed tasks show evidence of task-level generalization, a capability limited in existing VLA models." In short, the paper proposes an action-planning model for robots that treats video as the key modality.
Judging by how related methods have been evolving, this looks like it could well be a promising approach.
