The authors report that "Closed-source models like GPT-4o and Gemini1.5 Flash outperform open-source models in multimodal tasks due to advanced training techniques and better integration of visual and textual data." and that "In text-only tasks, the performance gap between open-source and closed-source models narrows significantly, with open-source models like Llama-3 providing competitive results." So for now, open models appear to be struggling on multimodal tasks.