MuSLR: Multimodal Symbolic Logical Reasoning

MuSLR: Multimodal Symbolic Logical Reasoning [133.9]
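Multimodal logical reasoning is important in advanced applications such as autonomous driving and diagnosis. We introduce MuSLR, the first benchmark for multimodal symbolic logical reasoning grounded in formal logical rules. We propose LogiCAM, a modular framework that improves the Chain-of-Thought performance of GPT-4.1 by 14.13%.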
Reference translation of the paper (metadata) (Tue, 30 Sep 2025 06:42:20 GMT)
This paper builds MuSLR, a benchmark targeting multimodal symbolic logical reasoning. It also proposes LogiCAM, a modular framework, as a baseline. The benchmark appears to be difficult even for current frontier models.
On how to improve, the following remark is interesting: "First, integrating dedicated symbolic modules is essential: the LogiCAM outperforms base VLMs precisely because it extracts multimodalities based on logic and embeds explicit symbolic reasoning steps. Second, existing VLMs struggle to align and fuse visual and textual information when performing formal logic; Future work should explore tighter multimodal integration, such as cross-modal architectures trained with logic-grounded objectives, to bridge this gap." It suggests that current models struggle with formal processing.
The repository is MuSLR: Multimodal Symbolic Logical Reasoning
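To make "explicit symbolic reasoning steps" concrete, here is a minimal sketch of one such step: forward chaining with modus ponens over facts drawn from two modalities. This is not LogiCAM's implementation; the function `forward_chain`, the fact names, and the rules are all hypothetical and chosen only for illustration, on the assumption that an image and a text passage have already been converted into symbolic facts and rules.

```python
from typing import FrozenSet, List, Set, Tuple

# A rule is (premises, conclusion): if every premise holds, the conclusion holds.
Rule = Tuple[FrozenSet[str], str]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Apply modus ponens repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


# Hypothetical fact assumed to be extracted from an image.
visual_facts = {"light_is_red"}

# Hypothetical rules assumed to be extracted from accompanying text.
textual_rules: List[Rule] = [
    (frozenset({"light_is_red"}), "must_stop"),
    (frozenset({"must_stop", "pedestrian_crossing"}), "yield_to_pedestrian"),
]

result = forward_chain(visual_facts, textual_rules)
print(sorted(result))
```

In this toy case the second rule never fires, because `pedestrian_crossing` was not among the extracted facts. That is the kind of alignment between visual and textual information the quoted passage says existing VLMs struggle with.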
