MM-IFEngine: Towards Multimodal Instruction Following

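The paper proposes MM-IFEngine, a pipeline for generating high-quality image-instruction pairs. The resulting MM-IFInstruct-23k dataset is suited to SFT (Supervised Fine-Tuning) and is extended into MM-IFDPO-23k for DPO (Direct Preference Optimization). The paper also introduces MM-IFEval, a challenging and diverse multimodal instruction-following benchmark.
Reference translation of the paper (metadata) (Thu, 10 Apr 2025 17:59:12 GMT)
The paper proposes a benchmark for "the instruction-following ability of Multimodal Large Language Models" together with models built on publicly available base models. The strength of commercial models stands out. The finding that "DPO using MM-IFDPO-23k significantly surpasses SFT on MMIFInstruct-23k" is also interesting.
The repository is on GitHub: SYuan03/MM-IFEngine (MM-IFEngine: Towards Multimodal Instruction Following)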
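For readers less familiar with the SFT/DPO distinction mentioned above, here is a minimal sketch of how the two objectives differ for a single example. It is not taken from the MM-IFEngine repository: the function names, the record fields (`image`, `instruction`, `chosen`, `rejected`), and the value `beta=0.1` are assumptions made for illustration, and the log-probabilities are made-up numbers. The DPO formula itself is the standard one: the negative log-sigmoid of the scaled difference between the policy's and the reference model's log-probability margins.

```python
import math

# Hypothetical record shapes (field names are assumptions, not the dataset schema).
sft_record = {"image": "img_001.jpg", "instruction": "Answer in exactly two sentences.",
              "response": "..."}
dpo_record = {"image": "img_001.jpg", "instruction": "Answer in exactly two sentences.",
              "chosen": "...", "rejected": "..."}


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sft_loss(logp_response: float) -> float:
    """SFT: negative log-likelihood of the single reference response."""
    return -logp_response


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the trained policy and the frozen reference model.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


# Made-up log-probabilities, purely for illustration.
loss_no_preference = dpo_loss(-11.0, -11.0, -11.0, -11.0)
loss_prefers_chosen = dpo_loss(-10.0, -12.0, -11.0, -11.0)
```

The point of the sketch is that SFT only pulls the model toward one response, whereas DPO also pushes it away from a rejected response relative to a reference model. That is one plausible reason a preference dataset can help with instruction following, though the sketch says nothing about the paper's actual results.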
