HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation

HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation [38.6]
我々は32Kの実世界の画像質問対の総合的なベンチマークであるHumaniBenchを紹介する。 HumaniBenchは、公正性、倫理、理解、推論、言語の傾き、共感、堅牢性を含む7つのHuman Centered AI(HCAI)の原則を評価している。
論文参考訳（メタデータ） (Fri, 16 May 2025 17:09:44 GMT)
「HumaniBench probes seven HCAI principles—fairness, ethics, understanding, reasoning, language inclusivity, empathy, robustness—through seven diverse tasks that mix open- and closed-ended visual question answering (VQA), multilingual QA, visual grounding, empathetic captioning, and robustness tests.」というベンチマーク。商用モデルが優れた結果を出しているが、個別要素ではオープンなモデルが高スコアの場合もある。
プロジェクトサイトはHumaniBench: A Human-Centric Benchmark for Large Multimodal Models Evaluation

コメントを残す

コメントを残す コメントをキャンセル