Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks

Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks [108.2]
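We present a comprehensive review of multimodal spatial reasoning tasks with large models. We review advances in embodied AI, including vision-language navigation and action models. We also consider emerging modalities, such as audio and egocentric video, that contribute to spatial understanding through new sensors.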
Reference translation of the paper (metadata) (Wed, 29 Oct 2025 17:55:43 GMT)
A survey of multimodal large language models (MLLMs).
The repository is GitHub – zhengxuJosh/Awesome-Multimodal-Spatial-Reasoning: This repository collects and organises state‑of‑the‑art papers on spatial reasoning for Multimodal Vision–Language Models (MVLMs).
