OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning [72.6]
テキスト認識のための大規模バイリンガルテキスト中心ベンチマークであるOCRBench v2を紹介する。その結果,22 LMM中20 LMMは50点未満(合計100点)で,5種類の制限があることがわかった。
論文参考訳（メタデータ） (Tue, 31 Dec 2024 07:32:35 GMT)
MLLMを対象としたOCRベンチマーク、「After carefully benchmarking state-of-the-art LMMs on OCRBench v2, we find that 36 out of 38 LMMs score below 50 (100 in total) and suffer from five-type limitations, including less frequently encountered text recognition, finegrained perception, layout perception, complex element parsing, and logical reasoning.」とのこと。
リポジトリはhttps://github.com/YuliangLiu/MultimodalOCR

コメントを残す

コメントを残す コメントをキャンセル