Knowledge-Aware Reasoning over Multimodal Semi-structured Tables

Knowledge-Aware Reasoning over Multimodal Semi-structured Tables [85.2]
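This study examines whether current AI models can perform knowledge-aware reasoning over multimodal structured data. To this end, it introduces MMTabQA, a new dataset designed for this purpose. The experiments highlight significant challenges current AI models face in effectively integrating and interpreting multiple text and image inputs.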
Reference translation of the paper (metadata) (Sun, 25 Aug 2024 15:17:43 GMT)
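This paper proposes a multimodal QA dataset. The authors say the data will be released, but there does not yet appear to be a link to a repository or similar.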
The paper reports that "Closed-source models like GPT-4o and Gemini1.5 Flash outperform open-source models in multimodal tasks due to advanced training techniques and better integration of visual and textual data." It also notes that "In text-only tasks, the performance gap between open-source and closed-source models narrows significantly, with open-source models like Llama-3 providing competitive results." For now, then, open models appear to struggle with multimodal tasks.
